task_tuner.py
# -*- coding: utf-8 -*-
"""
.. _tuner:

Tuner
=================

AgentScope provides the ``tuner`` module for training agent applications with reinforcement learning (RL).
This tutorial shows how to use the ``tuner`` module to improve agent performance on specific tasks, covering:

- The core components of the ``tuner`` module
- The key code required for the tuning workflow
- How to configure and run the tuning process
Main Components
~~~~~~~~~~~~~~~~~~~

The ``tuner`` module introduces three core components essential for RL-based agent training:

- **Task Dataset**: A collection of tasks for training and evaluating the agent.
- **Workflow Function**: Encapsulates the agent logic to be tuned.
- **Judge Function**: Evaluates the agent's performance on a task and provides the reward signal for tuning.

In addition, ``tuner`` provides several configuration classes for customizing the tuning process, including:

- **TunerModelConfig**: Configuration of the model being tuned.
- **AlgorithmConfig**: Specifies the RL algorithm (e.g., GRPO, PPO) and its hyperparameters.
Implementation
~~~~~~~~~~~~~~~~~~~

This section demonstrates how to use ``tuner`` to train a simple math agent.

Task Dataset
--------------------

The task dataset contains the tasks used for training and evaluating your agent.
Your dataset should follow the Hugging Face `datasets <https://huggingface.co/docs/datasets/quickstart>`_ format, so that it can be loaded with ``datasets.load_dataset``. For example:
.. code-block:: text

    my_dataset/
    ├── train.jsonl  # training samples
    └── test.jsonl   # evaluation samples
Suppose your ``train.jsonl`` contains:

.. code-block:: json

    {"question": "What is 2 + 2?", "answer": "4"}
    {"question": "What is 4 + 4?", "answer": "8"}
Before starting tuning, you can verify that your dataset is loaded correctly with:

.. code-block:: python

    from agentscope.tuner import DatasetConfig

    dataset = DatasetConfig(path="my_dataset", split="train")
    dataset.preview(n=2)
    # Output the first two samples to verify correct loading
    # [
    #     {
    #         "question": "What is 2 + 2?",
    #         "answer": "4"
    #     },
    #     {
    #         "question": "What is 4 + 4?",
    #         "answer": "8"
    #     }
    # ]
Workflow Function
--------------------

The workflow function defines how the agent interacts with the environment and makes decisions.
All workflow functions should follow the input/output signature defined in ``agentscope.tuner.WorkflowType``.
Below is an example workflow function that uses a ReAct agent to answer math questions:
"""
from agentscope.agent import ReActAgent
from agentscope.formatter import OpenAIChatFormatter
from agentscope.message import Msg
from agentscope.model import ChatModelBase
from agentscope.tuner import WorkflowOutput


async def example_workflow_function(
    task: dict,
    model: ChatModelBase,
    auxiliary_models: dict[str, ChatModelBase] | None = None,
) -> WorkflowOutput:
    """An example workflow function for tuning.

    Args:
        task (`dict`): The task information.
        model (`ChatModelBase`): The chat model used by the agent.
        auxiliary_models (`dict[str, ChatModelBase] | None`): Additional
            chat models, generally used to simulate the behavior of other
            non-training agents in multi-agent scenarios.

    Returns:
        `WorkflowOutput`: The output generated by the workflow.
    """
    agent = ReActAgent(
        name="react_agent",
        sys_prompt="You are a helpful math problem solving agent.",
        model=model,
        formatter=OpenAIChatFormatter(),
    )
    response = await agent.reply(
        msg=Msg(
            "user",
            task["question"],  # extract the question from the task
            role="user",
        ),
    )
    return WorkflowOutput(  # return the response
        response=response,
    )
# %%
# You can directly run this workflow function with a task dictionary and a
# ``DashScopeChatModel`` / ``OpenAIChatModel`` to test its correctness before
# formal training. For example:

import asyncio
import os

from agentscope.model import DashScopeChatModel

task = {"question": "What is 123 plus 456?", "answer": "579"}
model = DashScopeChatModel(
    model_name="qwen-max",
    api_key=os.environ["DASHSCOPE_API_KEY"],
)
workflow_output = asyncio.run(example_workflow_function(task, model))
assert isinstance(
    workflow_output.response,
    Msg,
), "In this example, the response should be a Msg instance."
print("\nWorkflow response:", workflow_output.response.get_text_content())
# %%
#
# Judge Function
# --------------------
# The judge function evaluates the agent's performance on a given task and
# provides a reward signal for tuning.
# All judge functions should follow the input/output signature defined in
# ``agentscope.tuner.JudgeType``.
# Below is a simple judge function that compares the agent's response with the
# ground-truth answer:

from typing import Any

from agentscope.tuner import JudgeOutput


async def example_judge_function(
    task: dict,
    response: Any,
    auxiliary_models: dict[str, ChatModelBase] | None = None,
) -> JudgeOutput:
    """A very simple judge function, only for demonstration.

    Args:
        task (`dict`): The task information.
        response (`Any`): The response field from the WorkflowOutput.
        auxiliary_models (`dict[str, ChatModelBase] | None`): Additional
            chat models for LLM-as-a-judge purposes.

    Returns:
        `JudgeOutput`: The reward assigned by the judge.
    """
    ground_truth = task["answer"]
    reward = 1.0 if ground_truth in response.get_text_content() else 0.0
    return JudgeOutput(reward=reward)


judge_output = asyncio.run(
    example_judge_function(
        task,
        workflow_output.response,
    ),
)
print(f"Judge reward: {judge_output.reward}")
# %%
# The judge function can be tested locally in the same way before formal
# training, to ensure its logic is correct.
#
# .. tip::
#     You can leverage existing `MetricBase <https://github.com/agentscope-ai/agentscope/blob/main/src/agentscope/evaluate/_metric_base.py>`_
#     implementations in your judge function to compute more sophisticated
#     metrics and combine them into a composite reward.
#
# Configuration and Running
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Finally, you can configure and run the tuning process using the ``tuner`` module.
# Before starting, ensure that `Trinity-RFT <https://github.com/agentscope-ai/Trinity-RFT>`_ is installed in your environment, as it is required for tuning.
#
# Below is an example of configuring and starting the tuning process:
#
# .. note::
#     This example is for demonstration only. For a complete runnable example,
#     see `Tune ReActAgent <https://github.com/agentscope-ai/agentscope/tree/main/examples/tuner/model_tuning>`_.
#
# .. code-block:: python
#
#     from agentscope.tuner import (
#         tune,
#         AlgorithmConfig,
#         DatasetConfig,
#         TunerModelConfig,
#     )
#
#     # your workflow / judge functions here...
#
#     if __name__ == "__main__":
#         dataset = DatasetConfig(path="my_dataset", split="train")
#         model = TunerModelConfig(
#             model_path="Qwen/Qwen3-0.6B",
#             max_model_len=16384,
#         )
#         algorithm = AlgorithmConfig(
#             algorithm_type="multi_step_grpo",
#             group_size=8,
#             batch_size=32,
#             learning_rate=1e-6,
#         )
#         tune(
#             workflow_func=example_workflow_function,
#             judge_func=example_judge_function,
#             model=model,
#             train_dataset=dataset,
#             algorithm=algorithm,
#         )
#
# Here, ``DatasetConfig`` configures the training dataset, ``TunerModelConfig`` sets the parameters for the trainable model, and ``AlgorithmConfig`` specifies the reinforcement learning algorithm and its hyperparameters.
#
# .. tip::
#     The ``tune`` function is based on `Trinity-RFT <https://github.com/agentscope-ai/Trinity-RFT>`_ and internally converts its input parameters into a YAML configuration.
#     Advanced users can skip the ``model``, ``train_dataset``, and ``algorithm`` arguments and instead provide a YAML config file path via the ``config_path`` argument.
#     Using a configuration file is recommended for fine-grained control and for leveraging advanced Trinity-RFT features; see the Trinity-RFT `Configuration Guide <https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html>`_ for more options.
#
# Save the above code as ``main.py`` and run it with:
#
# .. code-block:: bash
#
#     ray start --head
#     python main.py
#
# Checkpoints and logs are automatically saved to the ``checkpoints/AgentScope`` directory under your workspace, with each run in a timestamped sub-directory. TensorBoard logs can be found in ``monitor/tensorboard`` within the checkpoint directory.
#
# .. code-block:: text
#
#     your_workspace/
#     └── checkpoints/
#         └── AgentScope/
#             └── Experiment-20260104185355/  # each run in a timestamped sub-directory
#                 ├── monitor/
#                 │   └── tensorboard/        # TensorBoard logs
#                 └── global_step_x/          # saved model checkpoint at step x
#
# .. tip::
#     For more tuning examples, refer to the `tuner directory <https://github.com/agentscope-ai/agentscope-samples/tree/main/tuner>`_ of the AgentScope-Samples repository.