
fix: make SWEbench live progress script restart-safe #872

Draft
piojanu wants to merge 1 commit into main from fix/swebench-progress-dedup

piojanu (Contributor) commented Mar 19, 2026

Summary

  • The tasks.jsonl progress-tracking script counted raw lines, so after SLURM wall-time restarts (which append new entries for re-run tasks) it reported inflated totals (>500 for a 500-task benchmark)
  • Updated the script to deduplicate by task_id (last entry wins) so progress is accurate across multiple restarts
  • Added documentation explaining the append-only behavior of tasks.jsonl and when to use each progress file
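The deduplication described above can be sketched as follows. This is a minimal Python sketch, not the script from this PR: it assumes each line of tasks.jsonl is a JSON object with a `task_id` key, and the `status` field, its values, and the helper names `latest_by_task` and `summarize` are hypothetical. Skipping undecodable lines is also an assumption, meant to tolerate a line cut off when a job is killed mid-write.

```python
import json
from collections import Counter


def latest_by_task(path):
    """Return {task_id: record}, keeping only the last entry seen for each task_id."""
    latest = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # assumption: a partial line left by a killed job is skipped
            # Later entries (e.g. a retry after a restart) overwrite earlier ones.
            latest[record["task_id"]] = record
    return latest


def summarize(path, total=500):
    """Count unique tasks and tally their final status ("status" key is hypothetical)."""
    latest = latest_by_task(path)
    counts = Counter(r.get("status", "unknown") for r in latest.values())
    return len(latest), total, dict(counts)
```

Calling `summarize("tasks.jsonl")` returns the unique task count, the expected total, and a per-status tally. Because a dict keyed by `task_id` keeps only the latest record, an instance that errored before a restart and succeeded afterwards is counted once, as a success.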

Test plan

  • Run the SWEbench eval and check progress mid-run with the new script
  • Kill and restart the eval, then verify that the progress script still reports the correct unique count (≤500)
  • Confirm that previously errored instances that succeed on retry are reported as successes

🤖 Generated with Claude Code

The tasks.jsonl progress tracking script counted raw lines, giving
inflated totals (>500) after SLURM restarts. Deduplicate by task_id
(last entry wins) so progress is accurate across multiple restarts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Piotr Januszewski <pjanuszewski@nvidia.com>
copy-pr-bot commented Mar 19, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
