Aviary agent max_timesteps and fixed test_gather_evidence_rejects_empty_docs #515

Merged: jamesbraza merged 3 commits into main from correct-failure-empty-docs on Oct 2, 2024

@jamesbraza (Collaborator) commented Oct 2, 2024

Since its addition in #309, test_gather_evidence_rejects_empty_docs has been failing for the wrong reason. It was supposed to fail due to empty docs, but it was actually failing due to a bad patch:

ERROR    paperqa.agents.main:main.py:237 Agent <aviary.tools.utils.ToolSelector object at 0x10d5a73b0> failed.
Traceback (most recent call last):
  File "/path/to/paper-qa/paperqa/agents/main.py", line 194, in run_aviary_agent
    obs, tools = await env.reset()
                 ^^^^^^^^^^^^^^^^^
  File "/path/to/paper-qa/paperqa/agents/env.py", line 136, in reset
    self.state, self.tools = self.make_initial_state_and_tools()
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/paper-qa/paperqa/agents/env.py", line 123, in make_initial_state_and_tools
    self.tools = settings_to_tools(
                 ^^^^^^^^^^^^^^^^^^
  File "/path/to/paper-qa/paperqa/agents/env.py", line 71, in settings_to_tools
    tool = Tool.from_function(
           ^^^^^^^^^^^^^^^^^^^
  File "/path/to/.venv/lib/python3.12/site-packages/aviary/tools/base.py", line 339, in from_function
    raise ValueError(f"Missing description for parameter {pname}.")
ValueError: Missing description for parameter args.

This PR:

  • Properly propagates the __doc__ onto the patched gen_answer, so Tool.from_function succeeds and the test fails as expected
  • Implements max_timesteps so the test doesn't cycle forever until it times out
  • Moves from AgentStatus.TIMEOUT to a more general AgentStatus.TRUNCATED to match aviary's general terminology
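
The first bullet can be illustrated with a minimal, self-contained sketch. It does not call aviary's real Tool.from_function: check_documented below is a hypothetical stand-in that only mimics the "missing description" check, and bad_patch / good_patch are illustrative names. functools.wraps is one way to carry __doc__ over to a replacement function, and may differ from how this PR does it.

```python
import functools
import inspect


def gen_answer(question: str) -> str:
    """Generate an answer.

    Args:
        question: Question to answer.
    """
    return f"answer to {question}"


def check_documented(fn) -> None:
    """Hypothetical stand-in for a docstring-driven tool builder."""
    doc = inspect.getdoc(fn) or ""
    for pname in inspect.signature(fn).parameters:
        if f"{pname}:" not in doc:
            raise ValueError(f"Missing description for parameter {pname}.")


# A bare replacement has no docstring and exposes (*args, **kwargs),
# so the checker sees an undocumented parameter named "args".
def bad_patch(*args, **kwargs):
    return gen_answer(*args, **kwargs)


# functools.wraps copies __doc__ and sets __wrapped__, so
# inspect.signature reports the original parameters.
@functools.wraps(gen_answer)
def good_patch(*args, **kwargs):
    return gen_answer(*args, **kwargs)
```

In this sketch, check_documented(bad_patch) raises the same kind of ValueError shown in the traceback above, while check_documented(good_patch) passes.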

@jamesbraza jamesbraza added the enhancement New feature or request label Oct 2, 2024
@jamesbraza jamesbraza self-assigned this Oct 2, 2024
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. bug Something isn't working labels Oct 2, 2024
@jamesbraza jamesbraza force-pushed the correct-failure-empty-docs branch from 4867fcf to 18842d0 Compare October 2, 2024 19:07
@jamesbraza jamesbraza force-pushed the correct-failure-empty-docs branch from 18842d0 to 7151952 Compare October 2, 2024 19:14
@mskarlin (Collaborator) left a comment

looks good to me -- couple q's on naming

@@ -140,6 +143,11 @@ async def run_fake_agent(
    ) = None,
    **env_kwargs,
) -> tuple[Answer, AgentStatus]:
    if query.settings.agent.max_timesteps is not None:

Just curious -- why are we calling the agent steps timesteps? I think of timesteps from physical models (e.g. molecular dynamics) where each iteration is a unit of time, like 4 picoseconds or something. This is more like actionsteps in my mind.

@jamesbraza (Collaborator, Author) commented

I hear what you're saying.

I think "step" has its own ambiguity. For example, one can wonder whether "step" means:

  • One step = agent selection + environment step, counted together
  • Two steps = agent selection and environment step, counted separately

I went with timestep to exactly match ldp.data_structures.Transition.timestep

Let me know if you think we should name it otherwise
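
To make the counting convention in this thread concrete, here is a rough sketch of a rollout loop under the first reading, where one timestep covers agent selection plus the environment step. Everything in it is an assumption for illustration: run_rollout, select_action, and env_step are hypothetical names, and the AgentStatus enum is a minimal stand-in rather than paper-qa's actual class.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class AgentStatus(Enum):  # hypothetical minimal stand-in
    SUCCESS = "success"
    TRUNCATED = "truncated"


def run_rollout(
    select_action: Callable[[int], int],
    env_step: Callable[[int], bool],
    max_timesteps: Optional[int] = None,
) -> Tuple[AgentStatus, int]:
    """Run until the environment reports done or the timestep cap is hit."""
    timestep = 0
    while True:
        # Check the cap before acting, so max_timesteps=0 performs no steps.
        if max_timesteps is not None and timestep >= max_timesteps:
            return AgentStatus.TRUNCATED, timestep
        action = select_action(timestep)  # agent selection
        done = env_step(action)  # environment step
        timestep += 1  # selection + step together count as one timestep
        if done:
            return AgentStatus.SUCCESS, timestep
```

With an environment that never finishes, run_rollout(lambda t: t, lambda a: False, max_timesteps=3) stops after three timesteps with AgentStatus.TRUNCATED instead of looping until an outer timeout fires.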

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Oct 2, 2024
@jamesbraza jamesbraza merged commit 4fc6138 into main Oct 2, 2024
5 checks passed
@jamesbraza jamesbraza deleted the correct-failure-empty-docs branch October 2, 2024 19:36
Development

Successfully merging this pull request may close these issues.

Moving timeout and max steps into PaperQAEnvironment itself