Refactor to breakout config from rest of code #289

Merged: 40 commits, merged Sep 8, 2024
Changes from 37 commits

Commits
c9124b2
Extracted retrieve method
whitead Jun 20, 2024
b7574d0
Need to decide how to DRY this
whitead Jun 21, 2024
48efa06
Merge branch 'main' into issue-283
whitead Jun 25, 2024
b5a5b81
Merge branch 'september-2024-release' into issue-283
whitead Aug 29, 2024
a32fe4f
First draft of mapping
whitead Aug 30, 2024
6343cfd
Switched to new map function in gather evidence
whitead Aug 30, 2024
3ff2c4a
Reran pre-commit
whitead Aug 30, 2024
daecc4b
Merge branch 'september-2024-release' into issue-283
whitead Aug 30, 2024
02ba597
Stashing progress
whitead Aug 31, 2024
c3bd35c
Stashing progress
whitead Sep 1, 2024
d3706f4
Finished refactor
whitead Sep 3, 2024
d0fbdf5
Fixed ruff errors
whitead Sep 3, 2024
ad3a2da
Fixed all type hinting
whitead Sep 3, 2024
5aeb270
Made it possible to load named configs
whitead Sep 3, 2024
2038c37
Making progress on tests
whitead Sep 3, 2024
c2fe115
halfway through tests
whitead Sep 3, 2024
bc01d59
Added back all unit tests
whitead Sep 4, 2024
cf9f670
Fixed linting errors
whitead Sep 4, 2024
07d3cf8
Reenable CI
whitead Sep 4, 2024
20dd67e
Got indexes working again
whitead Sep 4, 2024
b183e2c
Stashing progress on agent rewrite
whitead Sep 5, 2024
3eb8e90
Agent tests finally pass
whitead Sep 5, 2024
c0783d3
Finished agent tests
whitead Sep 5, 2024
b2a9c8b
More work on tests
whitead Sep 5, 2024
e69ce1e
Merge branch september-2024-release into issue-283
mskarlin Sep 5, 2024
b9b66c9
removed unused imports and remove python label from docstring to avoi…
mskarlin Sep 5, 2024
ca66124
Rewrote CLI to use settings objects
whitead Sep 6, 2024
ac7833f
Stashing progress
whitead Sep 6, 2024
33a0d79
Moving to `uv` for installation/CI, parallel `pre-commit` in CI (#316)
jamesbraza Sep 6, 2024
b68fd91
Fixing `pybtex` import by requiring `setuptools` (#318)
jamesbraza Sep 6, 2024
0ed9c69
Fixing test installation in CI by specifying missing dependencies (#319)
jamesbraza Sep 6, 2024
9ec067e
Got CLI to work nicely
whitead Sep 6, 2024
2dcb659
Merge branch 'issue-283' of github.com:whitead/paper-qa into issue-283
whitead Sep 6, 2024
b8998c6
LiteLLM integration (#315)
mskarlin Sep 7, 2024
5ed57b4
Can now save and load settings
whitead Sep 7, 2024
e2aa2d0
Merge branch 'issue-283' of github.com:whitead/paper-qa into issue-283
whitead Sep 7, 2024
51499ad
Removed old CLI tests
whitead Sep 7, 2024
c3c314e
Fixed logging and tests
whitead Sep 8, 2024
2ac3f1e
Addressed some PR comments
whitead Sep 8, 2024
bc91c03
More PR Comments
whitead Sep 8, 2024
18 changes: 6 additions & 12 deletions .github/workflows/build.yml
@@ -2,25 +2,19 @@ name: publish

 on:
   release:
-    types:
-      - created
+    types: [created]
   workflow_dispatch:

 jobs:
   publish:
     runs-on: ubuntu-latest
-
     steps:
       - uses: actions/checkout@v4
-      - uses: actions/setup-python@v5
-        with:
-          python-version: 3.11
-          cache: pip
-      - run: pip install .[agents,google,dev,llms]
+      - name: Set up uv
+        run: curl -LsSf https://astral.sh/uv/install.sh | sh
+      - run: uv sync
       - name: Build a binary wheel and a source tarball
-        run: |
-          python -m build --sdist --wheel --outdir dist/ .
-      - name: Publish distribution 📦 to PyPI
-        uses: pypa/gh-action-pypi-publish@release/v1
+        run: uv run python -m build --sdist --wheel --outdir dist/ .
+      - uses: pypa/gh-action-pypi-publish@release/v1
         with:
           password: ${{ secrets.PYPI_API_TOKEN }}
34 changes: 20 additions & 14 deletions .github/workflows/tests.yml
@@ -4,34 +4,40 @@ on:
   push:
     branches: [main]
   pull_request:
-    branches:
-      - main
-      - "**release**"

 jobs:
-  test:
+  pre-commit:
     runs-on: ubuntu-latest
+    if: github.event_name == 'pull_request' # pre-commit-ci/lite-action only runs here
     strategy:
       matrix:
-        python-version: ["3.11"]
-
+        python-version: ["3.10", "3.12"] # Our min and max supported Python versions
     steps:
       - uses: actions/checkout@v4
-      - name: Set up Python ${{ matrix.python-version }}
-        uses: actions/setup-python@v5
+      - uses: actions/setup-python@v5
         with:
           python-version: ${{ matrix.python-version }}
-          cache: pip
-      - run: pip install .[agents,google,dev,llms]
-      - name: Check pre-commit
-        run: pre-commit run --all-files || ( git status --short ; git diff ; exit 1 )
+      - uses: pre-commit/[email protected]
+      - uses: pre-commit-ci/[email protected]
+        if: always()
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.10", "3.12"] # Our min and max supported Python versions
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up uv
+        run: |-
+          curl -LsSf https://astral.sh/uv/install.sh | sh
+          uv python pin ${{ matrix.python-version }}
+      - run: uv sync --python-preference=only-managed
       - uses: google-github-actions/auth@v2
         with:
          credentials_json: ${{ secrets.GOOGLE_CREDENTIALS }}
-      - name: Run Test
+      - run: uv run pytest
         env:
           OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
           ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
           SEMANTIC_SCHOLAR_API_KEY: ${{ secrets.SEMANTIC_SCHOLAR_API_KEY }}
           CROSSREF_API_KEY: ${{ secrets.CROSSREF_API_KEY }}
-        run: pytest
14 changes: 7 additions & 7 deletions .gitignore
@@ -100,6 +100,7 @@ fabric.properties
!.vscode/launch.json
!.vscode/extensions.json
!.vscode/*.code-snippets
.vscode

# Local History for Visual Studio Code
.history/
@@ -114,7 +115,6 @@ fabric.properties
# Icon must end with two \r
Icon[\r]


# Thumbnails
._*

@@ -294,12 +294,12 @@ cython_debug/
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
.idea/

*.ipynb
env
# Version files made by setuptools_scm
**/version.py

# Matching pyproject.toml
paperqa/version.py
tests/example*
# Tests
tests/*txt
tests/*html
tests/test_index/*
42 changes: 25 additions & 17 deletions .pre-commit-config.yaml
@@ -23,28 +23,16 @@ repos:
       - id: mixed-line-ending
       - id: trailing-whitespace
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.6.2
+    rev: v0.6.4
     hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]
   - repo: https://github.com/psf/black-pre-commit-mirror
-    rev: 24.4.2
+    rev: 24.8.0
     hooks:
       - id: black
-  - repo: https://github.com/pre-commit/mirrors-mypy
-    rev: v1.10.1
-    hooks:
-      - id: mypy
-        args: [--pretty, --ignore-missing-imports]
-        additional_dependencies:
-          - numpy
-          - openai>=1 # Match pyproject.toml
-          - pydantic~=2.0 # Match pyproject.toml
-          - types-requests
-          - types-setuptools
-          - types-PyYAML
   - repo: https://github.com/rbubley/mirrors-prettier
-    rev: v3.3.2
+    rev: v3.3.3
     hooks:
       - id: prettier
   - repo: https://github.com/pappasam/toml-sort
@@ -63,12 +51,32 @@ repos:
             tests/stub_data.*
           )$
   - repo: https://github.com/abravalheri/validate-pyproject
-    rev: v0.18
+    rev: v0.19
     hooks:
       - id: validate-pyproject
         additional_dependencies:
           - "validate-pyproject-schema-store[all]>=2024.06.24" # Pin for Ruff's FURB154
+  - repo: https://github.com/astral-sh/uv-pre-commit
+    rev: 0.4.6
+    hooks:
+      - id: uv-lock
   - repo: https://github.com/adamchainz/blacken-docs
-    rev: v1.12.1
+    rev: 1.18.0
     hooks:
       - id: blacken-docs
+  - repo: https://github.com/pre-commit/mirrors-mypy
+    rev: v1.11.2
+    hooks:
+      - id: mypy
+        args: [--pretty, --ignore-missing-imports]
+        additional_dependencies:
+          - aiohttp
+          - httpx
+          - numpy
+          - openai>=1 # Match pyproject.toml
+          - pydantic~=2.0 # Match pyproject.toml
+          - tenacity
+          - tiktoken>=0.4.0 # Match pyproject.toml
+          - types-requests
+          - types-setuptools
+          - types-PyYAML
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
+3.12
10 changes: 1 addition & 9 deletions README.md
@@ -173,7 +173,6 @@ local_client = AsyncOpenAI(

 docs = Docs(
     client=local_client,
-    docs_index=NumpyVectorStore(embedding_model=LlamaEmbeddingModel()),
     texts_index=NumpyVectorStore(embedding_model=LlamaEmbeddingModel()),
     llm_model=OpenAILLMModel(
         config=dict(
@@ -201,15 +200,12 @@ docs = Docs(embedding="text-embedding-3-large")
 - `"hybrid-<model_name>"` i.e. `"hybrid-text-embedding-3-small"` to use a hybrid sparse keyword (based on a token modulo embedding) and dense vector embedding, any OpenAI or VoyageAI model can be used in the dense model name
 - `"sparse"` to use a sparse keyword embedding only

-For deeper embedding customization, embedding models and vector stores can be built separately and passed into the `Docs` object. Embedding models are used to create both paper-qa's index of document citation embedding vectors (`docs_index` argument) as well as the full-text embedding vectors (`texts_index` argument). They can both be specified as arguments when you create a new `Docs` object. You can use use any embedding model which implements paper-qa's `EmbeddingModel` class. For example, to use `text-embedding-3-large`:
+For deeper embedding customization, embedding models and vector stores can be built separately and passed into the `Docs` object. Embedding models are used to create paper-qa's index of the full-text embedding vectors (`texts_index` argument). They can both be specified as arguments when you create a new `Docs` object. You can use use any embedding model which implements paper-qa's `EmbeddingModel` class. For example, to use `text-embedding-3-large`:

 ```python
 from paperqa import Docs, NumpyVectorStore, OpenAIEmbeddingModel

 docs = Docs(
-    docs_index=NumpyVectorStore(
-        embedding_model=OpenAIEmbeddingModel(name="text-embedding-3-large")
-    ),
     texts_index=NumpyVectorStore(
         embedding_model=OpenAIEmbeddingModel(name="text-embedding-3-large")
     ),
@@ -224,7 +220,6 @@ from langchain_openai import OpenAIEmbeddings
 from paperqa import Docs, LangchainVectorStore

 docs = Docs(
-    docs_index=LangchainVectorStore(cls=FAISS, embedding_model=OpenAIEmbeddings()),
     texts_index=LangchainVectorStore(cls=FAISS, embedding_model=OpenAIEmbeddings()),
 )
 ```
@@ -243,7 +238,6 @@ local_client = AsyncOpenAI(

 docs = Docs(
     client=local_client,
-    docs_index=NumpyVectorStore(embedding_model=SentenceTransformerEmbeddingModel()),
     texts_index=NumpyVectorStore(embedding_model=SentenceTransformerEmbeddingModel()),
     llm_model=OpenAILLMModel(
         config=dict(
@@ -260,7 +254,6 @@ from paperqa import Docs, HybridEmbeddingModel, SparseEmbeddingModel, NumpyVecto

 model = HybridEmbeddingModel(models=[OpenAIEmbeddingModel(), SparseEmbeddingModel()])
 docs = Docs(
-    docs_index=NumpyVectorStore(embedding_model=model),
     texts_index=NumpyVectorStore(embedding_model=model),
 )
 ```
@@ -318,7 +311,6 @@ from langchain_openai import OpenAIEmbeddings

 docs = Docs(
     texts_index=LangchainVectorStore(cls=FAISS, embedding_model=OpenAIEmbeddings()),
-    docs_index=LangchainVectorStore(cls=FAISS, embedding_model=OpenAIEmbeddings()),
 )
 ```

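The embedding shortcut strings documented in the README hunk above all feed the same `embedding` argument shown in its hunk header (`docs = Docs(embedding="text-embedding-3-large")`). A minimal sketch of the hybrid and sparse variants, assuming only that `Docs` accepts those documented strings:

```python
from paperqa import Docs

# Hybrid: sparse keyword embedding combined with a dense OpenAI model,
# i.e. the `"hybrid-<model_name>"` form described in the README above.
hybrid_docs = Docs(embedding="hybrid-text-embedding-3-small")

# Sparse keyword embedding only.
sparse_docs = Docs(embedding="sparse")
```
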
30 changes: 9 additions & 21 deletions paperqa/__init__.py
@@ -1,29 +1,22 @@
from .docs import Answer, Context, Doc, Docs, PromptCollection, Text, print_callback
from .config import Settings, get_settings
from .docs import Answer, Docs, print_callback
from .llms import (
    AnthropicLLMModel,
    EmbeddingModel,
    HybridEmbeddingModel,
    LangchainEmbeddingModel,
    LangchainLLMModel,
    LangchainVectorStore,
    LlamaEmbeddingModel,
    LiteLLMEmbeddingModel,
    LiteLLMModel,
    LLMModel,
    LLMResult,
    NumpyVectorStore,
    OpenAIEmbeddingModel,
    OpenAILLMModel,
    SentenceTransformerEmbeddingModel,
    SparseEmbeddingModel,
    embedding_model_factory,
    llm_model_factory,
    vector_store_factory,
)
from .types import DocDetails
from .types import Context, Doc, DocDetails, Text
from .version import __version__

__all__ = [
    "Answer",
    "AnthropicLLMModel",
    "Context",
    "Doc",
    "DocDetails",
@@ -32,20 +25,15 @@
"HybridEmbeddingModel",
"LLMModel",
"LLMResult",
"LangchainEmbeddingModel",
"LangchainLLMModel",
"LangchainVectorStore",
"LlamaEmbeddingModel",
"LiteLLMEmbeddingModel",
"LiteLLMModel",
"NumpyVectorStore",
"OpenAIEmbeddingModel",
"OpenAILLMModel",
"PromptCollection",
"SentenceTransformerEmbeddingModel",
"Settings",
"SparseEmbeddingModel",
"Text",
"__version__",
"embedding_model_factory",
"llm_model_factory",
"get_settings",
"print_callback",
"vector_store_factory",
]
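
The new `from .config import Settings, get_settings` export is the core of this PR's config breakout. A minimal sketch of the new surface, assuming only the `Settings` and `get_settings` names exported above (their exact signatures are not shown in this diff):

```python
from paperqa import Settings, get_settings

# Assumption: calling get_settings() with no arguments resolves a default
# Settings instance; the commit "Made it possible to load named configs"
# suggests named configurations can also be loaded, but that call
# signature is not shown in this diff.
settings = get_settings()
assert isinstance(settings, Settings)
print(settings)
```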