
Agentic workflows, locally indexed search, and CLI #309


Merged: mskarlin merged 61 commits into september-2024-release from agentic-workflows on Aug 29, 2024

Conversation

mskarlin
Collaborator

@mskarlin mskarlin commented Aug 27, 2024

Description is still a WIP

This PR merges agentic workflows (via an OpenAI-based agent and a 'fake' agent) into a new branch to prepare for release. The agent lets PaperQA (PQA) use three tools: a "Search" tool backed by a local full-text index of files, a "Gather Evidence" tool, and a "Generate Answer" tool. The other new feature is a CLI that allows PaperQA to be used in its entirety from the command line. Papers are selected for addition to the Docs object via full-text search of indexed directories on your machine. After installing via pip install paper-qa[agents] (or, for a local checkout, pip install -e .[agents]), you can navigate to a directory containing some PDF/TXT/HTML documents and run:

pqa ask "what is pyroptosis?"

The first time you run this, an index is created; later runs reuse that same index. If you change PaperQA settings in a way that invalidates the current index (for example, changing the chunk size or the target directory), PaperQA automatically creates a new index for you.
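One way this kind of automatic invalidation could work is to derive the index name from a hash of the settings that affect its contents. The sketch below is only an illustration under that assumption: the function name index_name, the pqa_index_ prefix, and the setting keys are hypothetical and not taken from the PaperQA code.

```python
import hashlib
import json


def index_name(settings: dict) -> str:
    """Derive a stable index name from the settings that affect index contents.

    Hypothetical helper: PaperQA's real naming scheme may differ.
    """
    # Serialize with sorted keys so the same settings always hash identically.
    payload = json.dumps(settings, sort_keys=True).encode("utf-8")
    return "pqa_index_" + hashlib.sha256(payload).hexdigest()[:16]


# Assumed setting names, for illustration only.
base = {"chunk_size": 3000, "paper_directory": "papers"}
changed = {**base, "chunk_size": 5000}

# Identical settings map to the same name, so the existing index is reused.
assert index_name(base) == index_name(dict(base))
# A changed chunk size maps to a different name, which triggers a rebuild.
assert index_name(base) != index_name(changed)
```

With a scheme like this, no explicit "is the index stale?" check is needed: a lookup by name either finds a matching index or finds nothing and builds one.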

Options can be set as follows (here with a relative directory):

pqa set 'agent_tools.paper_directory' 'papers'

and then viewed in the same way:

pqa show 'agent_tools.paper_directory'

Or to see everything you've assigned:

pqa show all 
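The dotted keys in the commands above suggest that settings are nested by section. As a rough sketch of how such keys could map onto a nested dictionary, the helpers below (set_dotted and get_dotted, both hypothetical and not part of PaperQA) walk a dotted path:

```python
from typing import Any


def set_dotted(config: dict, key: str, value: Any) -> None:
    """Set a nested value addressed by a dotted key such as 'a.b'."""
    *parents, leaf = key.split(".")
    node = config
    for name in parents:
        # Create intermediate sections on demand.
        node = node.setdefault(name, {})
    node[leaf] = value


def get_dotted(config: dict, key: str) -> Any:
    """Look up a nested value by dotted key; raises KeyError if missing."""
    node: Any = config
    for name in key.split("."):
        node = node[name]
    return node


config: dict = {}
set_dotted(config, "agent_tools.paper_directory", "papers")
print(get_dotted(config, "agent_tools.paper_directory"))  # prints: papers
```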

All the settings available to pqa can be found in the QueryRequest object within agents/models/. Behind the scenes, PaperQA stores a configuration YAML in the ~/.pqa folder by default; this location can be changed by setting PQA_HOME.
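A minimal sketch of how the configuration folder could be resolved, assuming PQA_HOME stands in for the home directory that contains the .pqa folder (the description does not say whether PQA_HOME names the folder itself or its parent, so treat this as one possible reading). The function name pqa_config_dir is hypothetical.

```python
import os
from pathlib import Path


def pqa_config_dir() -> Path:
    """Return the folder holding PaperQA's configuration.

    Assumption: PQA_HOME, when set, replaces the user's home directory as
    the parent of the .pqa folder. The real behavior may differ.
    """
    override = os.environ.get("PQA_HOME")
    base = Path(override).expanduser() if override else Path.home()
    return base / ".pqa"


print(pqa_config_dir())
```

For example, under this reading, running with PQA_HOME=/data/shared would place the configuration in /data/shared/.pqa.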

Additional new stuff:

  • After using the client library more heavily to query across 100s of papers, I implemented some minor bugfixes, like stripping https://doi.org/ out of DOIs, a prefix that often shows up when DOIs are extracted by LLMs
  • Optional install for a 'google' environment that includes Vertex. It's a crazy large import that takes ~2-3 seconds on its own, so it's best left optional.
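The DOI cleanup mentioned in the first bullet could look roughly like the sketch below. The name clean_doi is hypothetical, the DOI used is a placeholder, and handling of http:// and dx.doi.org variants is an added assumption; the PR description only mentions the https://doi.org/ prefix.

```python
import re

# Matches a leading http(s)://doi.org/ or http(s)://dx.doi.org/ prefix.
# Only https://doi.org/ is named in the PR; the other variants are assumed.
_DOI_URL_PREFIX = re.compile(r"^https?://(?:dx\.)?doi\.org/", re.IGNORECASE)


def clean_doi(raw: str) -> str:
    """Strip surrounding whitespace and a leading doi.org URL prefix."""
    return _DOI_URL_PREFIX.sub("", raw.strip())


print(clean_doi("https://doi.org/10.1000/xyz123"))  # prints: 10.1000/xyz123
print(clean_doi("10.1000/xyz123"))  # prints: 10.1000/xyz123
```

Anchoring the pattern at the start of the string keeps a bare DOI unchanged, so the function is safe to apply to every extracted value.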

@mskarlin mskarlin added the enhancement New feature or request label Aug 27, 2024
@mskarlin mskarlin requested review from whitead and jamesbraza August 27, 2024 00:59
Collaborator

@jamesbraza jamesbraza left a comment

Huge battle @mskarlin here wow. Amazing work

Collaborator

@whitead whitead left a comment

Per our discussion - LGTM with removal of streaming functions

@mskarlin mskarlin merged commit 9cd976a into september-2024-release Aug 29, 2024
@mskarlin mskarlin deleted the agentic-workflows branch August 29, 2024 18:24
@whitead whitead mentioned this pull request Sep 11, 2024
3 participants