Skip to content

Removed "cannot answer" literals and added reset tool #698

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Nov 19, 2024

Conversation

jamesbraza
Copy link
Collaborator

@jamesbraza jamesbraza commented Nov 18, 2024

There many of places where we depended on the string literal "cannot answer" in the qa prompt, mainly the environment being done (prior to #684) or the answer being considered unsure.

This environment check of "cannot answer" also has some downsides:

  1. A coupling of environment functionality to a caller-specified qa prompt
    • We don't validate that "cannot answer" is present in the qa prompt
  2. Added statefulness to our environment (checks for a string literal in answer)

So, what is "unsure"? Really it should be:

  1. gen_answer tool call updates the answer
  2. Given the answer, the agent (and not the environment's gen_answer tool) decides if an answer was successful
  3. If not successful, agent keeps trying to get better evidence, until it gives up

To resolve this, we moved the unsure call directly to the complete tool. Now:

  • When unsure: agent just keeps running
  • When finally sure: agent calls complete(has_successful_answer=True)
  • If giving up: agent calls complete(has_successful_answer=False)

The "cannot answer" check was mostly easy to remove, other than the AnswerSettings.wipe_context_on_answer_failure, since we no longer have a way of checking unsure within gen_answer.

Since the agent controls unsureness now, we needed to make a new tool: reset, which basically performs the use case of wipe_context_on_answer_failure.


After this PR, we have:

  • Removed dependence on "cannot answer" string literal
  • Deprecates AnswerSettings.wipe_context_on_answer_failure
  • Agent defines unsureness, not the output of the environment's gen_answer tool
  • A "learnable" dimension, the agent controlling wiping contexts

@jamesbraza jamesbraza added the enhancement New feature or request label Nov 18, 2024
@jamesbraza jamesbraza self-assigned this Nov 18, 2024
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 18, 2024
@@ -49,7 +50,7 @@ def settings_to_tools(
embedding_model = embedding_model or settings.get_embedding_model()
tools: list[Tool] = []
for tool_type in (
(PaperSearch, GatherEvidence, GenerateAnswer, Complete)
(PaperSearch, GatherEvidence, GenerateAnswer, Reset, Complete)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to update the tool constructor in the server too.

cc. @nadolskit

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep agreed

qa_prompt = (
"Answer the question below with the context.\n\n"
"Context (with relevance scores):\n\n{context}\n\n----\n\n"
"Question: {question}\n\n"
"Write an answer based on the context. "
"If the context provides insufficient information reply "
'"I cannot answer."'
f'"{CANNOT_ANSWER_PHRASE}." '
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Nov 19, 2024
@jamesbraza jamesbraza force-pushed the removing-cannot-answer branch from 8b6aa61 to 2e3edb3 Compare November 19, 2024 18:19
@jamesbraza jamesbraza merged commit 201d364 into main Nov 19, 2024
5 checks passed
@jamesbraza jamesbraza deleted the removing-cannot-answer branch November 19, 2024 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants