Aviary agent `max_timesteps` and fixed `test_gather_evidence_rejects_empty_docs` #515

jamesbraza · 2024-10-02T19:06:26Z

Since its addition in #309, `test_gather_evidence_rejects_empty_docs` has been failing for the wrong reason. It was supposed to fail due to empty docs, but it was actually failing due to a bad patch:

ERROR    paperqa.agents.main:main.py:237 Agent <aviary.tools.utils.ToolSelector object at 0x10d5a73b0> failed.
Traceback (most recent call last):
  File "/path/to/paper-qa/paperqa/agents/main.py", line 194, in run_aviary_agent
    obs, tools = await env.reset()
                 ^^^^^^^^^^^^^^^^^
  File "/path/to/paper-qa/paperqa/agents/env.py", line 136, in reset
    self.state, self.tools = self.make_initial_state_and_tools()
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/path/to/paper-qa/paperqa/agents/env.py", line 123, in make_initial_state_and_tools
    self.tools = settings_to_tools(
                 ^^^^^^^^^^^^^^^^^^
  File "/path/to/paper-qa/paperqa/agents/env.py", line 71, in settings_to_tools
    tool = Tool.from_function(
           ^^^^^^^^^^^^^^^^^^^
  File "/path/to/.venv/lib/python3.12/site-packages/aviary/tools/base.py", line 339, in from_function
    raise ValueError(f"Missing description for parameter {pname}.")
ValueError: Missing description for parameter args.

This PR:

- Properly propagates the `__doc__` onto the patched `gen_answer`, so `Tool.from_function` succeeds and the test fails as expected
- Implements `max_timesteps` so the test doesn't cycle forever until it times out
  - Which partly resolves "Moving timeout and max steps into PaperQAEnvironment itself" #478
- Moves from `AgentStatus.TIMEOUT` to a more general `AgentStatus.TRUNCATED` to match aviary's general terminology
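
To illustrate the first point, here is a minimal sketch of why a patched function without a docstring can trip a docstring-driven tool builder, and one way to propagate `__doc__` onto the patch. Everything here is an assumption for illustration: `describe_params` is a toy stand-in (not aviary's actual `Tool.from_function` logic), `gen_answer` is a simplified hypothetical function, and `functools.wraps` is just one way to copy the docstring, not necessarily what this PR does.

```python
import functools
import inspect


def gen_answer(question: str) -> str:
    """Generate an answer.

    Args:
        question: Question to answer.
    """
    return f"answer to {question}"


def describe_params(fn) -> dict:
    """Toy docstring-driven builder: require a description for every parameter."""
    doc = inspect.getdoc(fn) or ""
    described = {}
    for line in doc.splitlines():
        name, sep, desc = line.strip().partition(":")
        # Only keep "name: description" lines with a non-empty description
        if sep and name.isidentifier() and desc.strip():
            described[name] = desc.strip()
    for pname in inspect.signature(fn).parameters:
        if pname not in described:
            raise ValueError(f"Missing description for parameter {pname}.")
    return described


def make_failing_patch(propagate_doc: bool):
    """Build a replacement for gen_answer that always fails when called."""

    def failing_gen_answer(*args, **kwargs):
        raise RuntimeError("simulated failure")

    if propagate_doc:
        # Copies __doc__ (and sets __wrapped__, so inspect.signature
        # reports the original parameters instead of *args/**kwargs)
        return functools.wraps(gen_answer)(failing_gen_answer)
    return failing_gen_answer
```

With `propagate_doc=False`, `describe_params` raises the same kind of `ValueError` as in the traceback above; with `propagate_doc=True`, it succeeds, so a test using the patch can go on to fail for the intended reason.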

mskarlin

looks good to me -- couple q's on naming

mskarlin · 2024-10-02T19:22:26Z

paperqa/agents/main.py

@@ -140,6 +143,11 @@ async def run_fake_agent(
    ) = None,
    **env_kwargs,
 ) -> tuple[Answer, AgentStatus]:
+    if query.settings.agent.max_timesteps is not None:


jc -- why are we calling the agent steps timesteps? I think of timesteps from physical models (e.g. molecular dynamics) where each iteration is a unit of time, like 4 picoseconds or something. This is more like actionsteps in my mind.

I hear what you're saying.

I think "step" has ambiguities of its own. For example, one can wonder whether "step" means:

- One step = agent selection + environment step
- Two steps = agent selection + environment step

I went with timestep to exactly match `ldp.data_structures.Transition.timestep`.

Let me know if you think we should name it otherwise.
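
As a rough sketch of the counting convention under discussion, the loop below treats one timestep as one agent selection plus one environment step, and reports a truncated status once `max_timesteps` is reached. The `AgentStatus` enum, `CountingEnv`, and `rollout` here are hypothetical stand-ins written for this example, not the actual paperqa, aviary, or ldp implementations.

```python
import asyncio
from enum import Enum
from typing import Optional


class AgentStatus(str, Enum):
    """Hypothetical, reduced status enum for illustration only."""

    SUCCESS = "success"
    TRUNCATED = "truncated"


class CountingEnv:
    """Hypothetical environment that finishes after `done_after` steps, or never."""

    def __init__(self, done_after: Optional[int] = None):
        self.done_after = done_after
        self.steps_taken = 0

    async def step(self, action: str) -> bool:
        self.steps_taken += 1
        # Return a "done" flag; with done_after=None the episode never ends
        return self.done_after is not None and self.steps_taken >= self.done_after


async def rollout(env: CountingEnv, max_timesteps: Optional[int] = None) -> AgentStatus:
    timestep = 0
    while True:
        # Stop before starting a timestep beyond the configured budget
        if max_timesteps is not None and timestep >= max_timesteps:
            return AgentStatus.TRUNCATED
        action = "gather_evidence"  # placeholder for the agent's tool selection
        done = await env.step(action)
        if done:
            return AgentStatus.SUCCESS
        timestep += 1  # one timestep = agent selection + environment step
```

For example, `asyncio.run(rollout(CountingEnv(), max_timesteps=3))` stops after three environment steps instead of looping until an external timeout fires.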

jamesbraza added 2 commits October 1, 2024 18:21

Fixed the failure cause in test_gather_evidence_rejects_empty_docs

6413928

Added max timesteps to AgentSettings, and configured in agents and te…

01b4ece

…sting

jamesbraza added the enhancement New feature or request label Oct 2, 2024

jamesbraza requested review from whitead, mskarlin and nadolskit October 2, 2024 19:06

jamesbraza self-assigned this Oct 2, 2024

dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. bug Something isn't working labels Oct 2, 2024

jamesbraza force-pushed the correct-failure-empty-docs branch from 4867fcf to 18842d0 Compare October 2, 2024 19:07

Moved from TIMEOUT to TRUNCATED (more general)

7151952

jamesbraza force-pushed the correct-failure-empty-docs branch from 18842d0 to 7151952 Compare October 2, 2024 19:14

mskarlin approved these changes Oct 2, 2024

dosubot bot added the lgtm This PR has been approved by a maintainer label Oct 2, 2024

jamesbraza merged commit 4fc6138 into main Oct 2, 2024
5 checks passed

jamesbraza deleted the correct-failure-empty-docs branch October 2, 2024 19:36
