This repository contains the code to:
- collect realistic Todoist request/response trajectories via AppWorld + OpenAI models,
- train a causal language model to predict the next Todoist API response given a history of req/res pairs,
- evaluate both trained models and ChatGPT-family models on the same trajectories.
It uses Hydra for configuration, Hugging Face Transformers for training (optionally with LoRA), and scripts for generation and evaluation.
cd /home/georgy/repos/appsim
conda env create -f env.yml -n appsim
conda activate appsim
pip install -r requirements.txt
pip install -e .

Environment:
- Set your OpenAI API key:
export OPENAI_API_KEY=...
Trajectories are plain-text files with alternating req: and res: lines:
req:create_project(name='Alpha', color='red', description='...', is_favorite=False)
res:{'message': 'Project created.', 'project_id': 280}
req:show_project(project_id=280)
res:{'name': 'Alpha', 'color': 'red', ... 'project_id': 280, ...}
Models are trained to predict each next res: given the history so far.
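To make the prediction setup concrete, here is a minimal sketch of how a trajectory could be split into (history, next response) examples. It is an illustration only: `parse_trajectory` and `build_examples` are hypothetical helpers, not functions from this repository, the sample is abridged from the lines above, and the actual training code may format prompts differently.

```python
from typing import List, Tuple


def parse_trajectory(text: str) -> List[Tuple[str, str]]:
    """Split trajectory text into (request, response) pairs.

    Hypothetical helper: assumes each req: line is followed by one res: line.
    """
    pairs = []
    pending_req = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("req:"):
            pending_req = line[len("req:"):]
        elif line.startswith("res:") and pending_req is not None:
            pairs.append((pending_req, line[len("res:"):]))
            pending_req = None
    return pairs


def build_examples(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Build (prompt, target) examples, one per response.

    The prompt is the full history so far plus the current request,
    ending with an open "res:" for the model to complete.
    """
    examples = []
    history = []
    for req, res in pairs:
        history.append(f"req:{req}")
        prompt = "\n".join(history) + "\nres:"
        examples.append((prompt, res))
        history.append(f"res:{res}")
    return examples


# Abridged sample in the trajectory format shown above.
sample = """req:create_project(name='Alpha', color='red')
res:{'message': 'Project created.', 'project_id': 280}
req:show_project(project_id=280)
res:{'name': 'Alpha', 'color': 'red', 'project_id': 280}"""

examples = build_examples(parse_trajectory(sample))
for prompt, target in examples:
    print(repr(prompt), "->", target)
```

Each trajectory with N request/response pairs yields N examples under this scheme, and later examples carry progressively longer histories.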
python scripts/collect_todoist_data.py \
--save_dir /abs/path/to/data/todoist/raw \
--num_trajectories 50 \
--trajectory_base_length 50 \
--max_appworld_retry 7 \
--start_trajectory_id 0 \
--chatgpt_model gpt-4o-mini \
--openai_api_key "$OPENAI_API_KEY"

Outputs (per --save_dir):
- trajectory_0.txt, trajectory_1.txt, ...
- stats/trajectory_0.json (aggregate counts)
- state/trajectory_0.json (final state snapshot)

python src/main \
experiment_name=my_run \
common.project_storage_base_path=/abs/path/for/outputs \
dataset.path=/abs/path/to/data/todoist

python scripts/trained_model_generate.py \
--trajectory_path /abs/path/to/data/todoist/test/trajectory_0.txt \
--checkpoint_path /abs/path/for/outputs/runs/<run_name>/checkpoint-<step> \
--temperature 0.1 \
--max_new_tokens 128 \
--output_dir output/

python scripts/chatgpt_generate.py \
--trajectory_path /abs/path/to/data/todoist/test/trajectory_0.txt \
--chatgpt_model gpt-4.1 \
--temperature 0.1 \
--output_dir output/ \
--openai_api_key "$OPENAI_API_KEY"

python scripts/evaluate_trajectory.py \
--generated_trajectory_path output/gpt-4.1-temp_0.1/trajectory_0.txt \
--gt_trajectory_path /abs/path/to/data/todoist/test/trajectory_0.txt
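
For intuition about what comparing a generated trajectory against the ground truth can look like, the sketch below computes a simple exact-match rate over res: lines. This is an assumption made for illustration, not necessarily the metric that scripts/evaluate_trajectory.py implements; `responses`, `normalize`, and `exact_match_rate` are hypothetical names.

```python
import ast


def responses(text):
    """Return the payload of every res: line, in order."""
    return [
        line[len("res:"):].strip()
        for line in text.splitlines()
        if line.startswith("res:")
    ]


def normalize(res):
    """Parse a Python-literal response so key order and spacing are ignored.

    Falls back to the raw string when the response is not a valid literal.
    """
    try:
        return ast.literal_eval(res)
    except (ValueError, SyntaxError, TypeError):
        return res


def exact_match_rate(generated_text, gt_text):
    """Fraction of ground-truth responses matched exactly, position by position.

    Assumed metric for illustration only.
    """
    gen = responses(generated_text)
    gt = responses(gt_text)
    if not gt:
        return 0.0
    hits = sum(1 for g, t in zip(gen, gt) if normalize(g) == normalize(t))
    return hits / len(gt)


gt = "req:a()\nres:{'x': 1, 'y': 2}\nreq:b()\nres:{'ok': True}"
gen = "req:a()\nres:{'y': 2, 'x': 1}\nreq:b()\nres:{'ok': False}"
print(exact_match_rate(gen, gt))
```

Parsing responses before comparing avoids penalizing a model for harmless differences such as dictionary key order.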