Recursive directory indexing; catch indexing errors #312

Closed
wants to merge 13 commits into from

Conversation

sidnarayanan
Contributor

Opening this PR into test-md5 to make the diff easier to read. Will rebase + point to main once #311 is merged.

  • Added get_directory_index(recursive: bool=True) to recursively index a directory
  • If S2 (Semantic Scholar) has no journal data, it can either omit the field or return None. Extended the handling to cover both cases
  • Raise an ImpossibleParsingError if no text is parsed from a PDF

Also lots of whitespace changes from ruff and black - they're conflicting with each other in some places.
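
For readers skimming the thread, here is a minimal sketch of the three behaviors listed above. It is not the code from this PR: `list_indexable_files`, `extract_journal`, `require_text`, and `index_directory` are hypothetical names, the `ImpossibleParsingError` class is a local stand-in for the project's own, the `{"journal": {"name": ...}}` record shape is an assumption, and a plain-text reader stands in for the PDF parser.

```python
from pathlib import Path
from typing import Any, Callable, Optional


class ImpossibleParsingError(Exception):
    """Local stand-in for the error named in the PR description."""


def list_indexable_files(directory: Path, recursive: bool = True) -> list[Path]:
    """Collect files, descending into subdirectories only when recursive is True."""
    pattern = "**/*" if recursive else "*"
    return sorted(p for p in directory.glob(pattern) if p.is_file())


def extract_journal(paper: dict[str, Any]) -> Optional[str]:
    """Return the journal name, or None when the field is absent or set to None."""
    # Assumed record shape: {"journal": {"name": "..."}}.
    journal = paper.get("journal") or {}
    return journal.get("name")


def require_text(text: str, source: Path) -> str:
    """Raise instead of silently indexing a document that yielded no text."""
    if not text.strip():
        raise ImpossibleParsingError(f"No text was parsed from {source}.")
    return text


def index_directory(
    directory: Path,
    recursive: bool = True,
    read_text: Callable[[Path], str] = lambda p: p.read_text(errors="ignore"),
) -> tuple[dict[str, str], list[Path]]:
    """Index every file, collecting parse failures instead of aborting the run."""
    index: dict[str, str] = {}
    failed: list[Path] = []
    for path in list_indexable_files(directory, recursive=recursive):
        try:
            index[str(path)] = require_text(read_text(path), path)
        except ImpossibleParsingError:
            # One unreadable file should not stop the rest of the index build.
            failed.append(path)
    return index, failed
```

Returning the failed paths alongside the index is one possible design choice; it lets the caller decide whether to log, retry, or ignore files that produced no text.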

mskarlin and others added 6 commits August 29, 2024 11:24
* add agentic workflow module and cli

* update pyproject.toml and split google into its own import

* remove unused import

* update default search chain

* Update __init__.py

* Update search.py

* move reqs into pyproject.toml, addressing PR comments

* rename search_chain to openai_get_search_query

* remove get_llm_name

* move table_formatter to helpers.py

* Update paperqa/agents/main.py

Co-authored-by: James Braza <[email protected]>

* Update paperqa/agents/main.py

Co-authored-by: James Braza <[email protected]>

* Update paperqa/agents/main.py

Co-authored-by: James Braza <[email protected]>

* Update paperqa/agents/docs.py

Co-authored-by: James Braza <[email protected]>

* Update paperqa/types.py

Co-authored-by: James Braza <[email protected]>

* rename compute_cost to compute_total_model_token_cost

* remove stream_answer

* rename to stub_manifest, and use Path for all paths

* Update paperqa/llms.py

Co-authored-by: James Braza <[email protected]>

* move SKIP_AGENT_TESTS = False

* nix _ = assignments

* add test comments

* types in conftest.py

* split libs into llms

* link openai chat timeout to query.timeout

* Update paperqa/agents/__init__.py

Co-authored-by: James Braza <[email protected]>

* logging revamp and renaming

* Update tests/test_cli.py

Co-authored-by: James Braza <[email protected]>

* Update tests/test_cli.py

Co-authored-by: James Braza <[email protected]>

* move vertex import to func call, add docstring to SupportsPickle

* docstring

* remove _ =

* remove bool return type from set

* update gitignore

* add config attribute to base LLMModel class

* replace get_current_settings -> get_settings

* replace get_current_settings -> get_settings

* PR simplifications

* remove all stream_* functions

* avoid modifying the root logger

* re-organize logger import location

* move hashlib into utils

* refactor strip_answer into Answer object

* label circular imports

* ensure absolute paths are used in index name

* limit select to be used only when DOI is not present in crossref

* Update paperqa/agents/search.py

Co-authored-by: James Braza <[email protected]>

* Update paperqa/agents/search.py

Co-authored-by: James Braza <[email protected]>

* Update paperqa/agents/search.py

Co-authored-by: James Braza <[email protected]>

* Update paperqa/agents/search.py

Co-authored-by: James Braza <[email protected]>

* Update paperqa/agents/models.py

Co-authored-by: James Braza <[email protected]>

* reconfigure logging to not prevent propagation

* remove newlines in the current year

* use required fields as a subset

* replace . with Path.cwd()

---------

Co-authored-by: James Braza <[email protected]>
@whitead
Collaborator

whitead commented Sep 9, 2024

Is this relevant @sidnarayanan ? The base PR is dead

@sidnarayanan
Contributor Author

Oh if #311 is merged, this should be good to go. I can rebase off main. @mskarlin what do you think?

@sidnarayanan sidnarayanan changed the base branch from test-md5 to main September 9, 2024 23:18
@sidnarayanan
Contributor Author

Oh boy, this has really drifted from main. I'm going to close this PR and open a new one

@jamesbraza jamesbraza deleted the recurse-directories branch September 10, 2024 16:00
@sidnarayanan sidnarayanan restored the recurse-directories branch September 10, 2024 16:43
@jamesbraza jamesbraza deleted the recurse-directories branch October 7, 2024 01:27

4 participants