-
Notifications
You must be signed in to change notification settings - Fork 749
Aviary agent max_timesteps
and fixed test_gather_evidence_rejects_empty_docs
#515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4867fcf
to
18842d0
Compare
18842d0
to
7151952
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me -- couple q's on naming
@@ -140,6 +143,11 @@ async def run_fake_agent( | |||
) = None, | |||
**env_kwargs, | |||
) -> tuple[Answer, AgentStatus]: | |||
if query.settings.agent.max_timesteps is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
jc -- why are we calling the agent steps timesteps? I think of timesteps from physical models (i.e. molecular dynamics) where each iteration is a unit of time, like 4 picoseconds or something. This is more like actionsteps in my mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hear what you're saying.
I think one can also encounter vagaries with "step", for example one can wonder does "step" mean:
- One step = agent selection + environment step
- Two steps = agent selection + environment step
I went with timestep to exactly match ldp.data_structures.Transition.timestep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if you think we should name it otherwise
Since its addition in #309,
test_gather_evidence_rejects_empty_docs
was failing for the wrong reason. It was supposed to fail due to empty docs, but it was failing due to a badpatch
:This PR:
__doc__
onto the patchedgen_answer
, soTool.from_function
succeeds and the test fails as expectedmax_timesteps
so the test doesn't cycle forever until it times outPaperQAEnvironment
itself #478AgentStatus.TIMEOUT
to a more generalAgentStatus.TRUNCATED
to match aviary's general terminology