    Repositories list

    • harbor

      Public
      Harbor is a framework for running agent evaluations and creating and using RL environments.
      Python
      2303183386 Updated Jan 10, 2026
    • MDX
      4102 Updated Jan 9, 2026
    • 55601 Updated Jan 9, 2026
    • t-bench-docs

      Public
      TypeScript
      9423 Updated Jan 6, 2026
    • 0000 Updated Dec 28, 2025
    • A benchmark for LLMs on complicated tasks in the terminal
      Python
      4461.3k93164 Updated Dec 26, 2025
    • terminal-bench-experiments

      Public
      Jupyter Notebook
      4202 Updated Dec 26, 2025
    • terminal-bench-leaderboard

      Public
      A home for run logs for terminal-bench
      Shell
      221206 Updated Nov 14, 2025
    • terminal-bench-2

      Public
      Shell
      215159 Updated Nov 12, 2025
    • terminal-bench-lite

      Public
      Python
      0000 Updated Nov 2, 2025
    • 0000 Updated Oct 9, 2025
    • Python
      338024 Updated Sep 11, 2025