voicebench benchmarks end-to-end voice latency with Eliza using the TypeScript runtime.
For each runtime and each mode (`simple`, `non-simple`), the benchmark reports:

- transcription time (`TRANSCRIPTION` model)
- transcription accuracy against labels (when a dataset manifest includes expected text)
- response TTFT (time to first response token/chunk; falls back to response completion when streaming is unavailable)
- response total time
- speech-to-response-start (`transcriptionMs + responseTtftMs`)
- speech-to-voice-start (`transcriptionMs + responseTotalMs + firstSentenceTtsMs`) for cached and uncached first-sentence paths
- voice generation time (`TEXT_TO_SPEECH` model)
- voice first-token proxy (first-sentence synthesis) in two paths:
  - uncached first sentence
  - cached first sentence while synthesizing the remainder in parallel
- end-to-end time
- p95/p99 latency tails (transcription, response TTFT/total, TTS, voice TTFT, cached pipeline, end-to-end)
- in-context and out-context excerpts
- model input/output excerpts from trajectory logs (raw vs cleaned)
- thinking/XML tag detection counts on model raw output
- trajectory counts (provider accesses + LLM calls)
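The derived metrics above are simple sums of the per-stage timings, and the tails are plain percentiles over iterations. A minimal sketch (not the runner's actual code; the field names follow the metric list, the nearest-rank percentile choice is an assumption):

```typescript
// Per-iteration timings, named after the fields in the metric list above.
interface IterationTiming {
  transcriptionMs: number;
  responseTtftMs: number;
  responseTotalMs: number;
  firstSentenceTtsMs: number;
}

// speech-to-response-start = transcriptionMs + responseTtftMs
function speechToResponseStart(t: IterationTiming): number {
  return t.transcriptionMs + t.responseTtftMs;
}

// speech-to-voice-start = transcriptionMs + responseTotalMs + firstSentenceTtsMs
function speechToVoiceStart(t: IterationTiming): number {
  return t.transcriptionMs + t.responseTotalMs + t.firstSentenceTtsMs;
}

// Nearest-rank percentile, used here to illustrate the p95/p99 tails.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

For example, a 100 ms transcription plus a 50 ms response TTFT yields a 150 ms speech-to-response-start.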
Modes:

- `simple`: normal path, no benchmark context injected
- `non-simple`: injects `benchmarkContext` metadata so `CONTEXT_BENCH` forces the non-simple action loop
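The only difference between the two modes is whether the benchmark attaches `benchmarkContext` metadata to the message it sends. A hypothetical sketch (the message shape here is an assumption for illustration, not the runner's real type):

```typescript
// Assumed message shape -- the real runner's type may differ.
type BenchMessage = {
  text: string;
  metadata?: { benchmarkContext?: boolean };
};

// In non-simple mode, benchmarkContext metadata is attached so that
// CONTEXT_BENCH forces the non-simple action loop; simple mode sends
// the message unmodified.
function buildMessage(text: string, mode: "simple" | "non-simple"): BenchMessage {
  return mode === "non-simple"
    ? { text, metadata: { benchmarkContext: true } }
    : { text };
}
```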
Profiles:

- `groq`: Groq for transcription + response models + voice generation
- `elevenlabs`: Groq for response models, ElevenLabs for transcription + voice generation
- `mock`: deterministic in-process model, transcription, and TTS handlers for smoke tests without external credentials
Common:
- `VOICEBENCH_AUDIO_PATH` (optional; if unset, `run.sh` will try these defaults in order):
  - `benchmarks/voicebench/shared/audio/default.wav`
  - `examples/town/public/assets/background.mp3`
  - `agent-town/public/assets/background.mp3`
- `run.sh` resolves the selected path to an absolute path before invoking the TypeScript runner
- the `mock` profile additionally falls back to `benchmarks/voicebench/shared/mock-audio.txt`
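The fallback order above can be sketched as follows (the real logic lives in `run.sh`; this TypeScript version is illustrative, with the candidate paths copied from the list):

```typescript
import { existsSync } from "node:fs";
import { resolve } from "node:path";

// Default candidates, in the order run.sh tries them.
const DEFAULT_AUDIO_CANDIDATES = [
  "benchmarks/voicebench/shared/audio/default.wav",
  "examples/town/public/assets/background.mp3",
  "agent-town/public/assets/background.mp3",
];

function resolveAudioPath(env: Record<string, string | undefined>): string | undefined {
  // An explicit VOICEBENCH_AUDIO_PATH wins; otherwise try the defaults in order.
  const candidates = env.VOICEBENCH_AUDIO_PATH
    ? [env.VOICEBENCH_AUDIO_PATH]
    : DEFAULT_AUDIO_CANDIDATES;
  const found = candidates.find((p) => existsSync(p));
  // run.sh hands the runner an absolute path.
  return found ? resolve(found) : undefined;
}
```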
Mock profile:
- no external credentials
Groq profile:
- `GROQ_API_KEY`
- `GROQ_LARGE_MODEL` (optional; default: `openai/gpt-oss-120b`)
- `GROQ_SMALL_MODEL` (optional; default: `openai/gpt-oss-120b`)
- `GROQ_TRANSCRIPTION_MODEL` (optional; default: `whisper-large-v3-turbo`)
- `GROQ_TTS_MODEL` (optional; default: `canopylabs/orpheus-v1-english`)
- `GROQ_TTS_VOICE` (optional; default: `troy`)
- `GROQ_TTS_RESPONSE_FORMAT` (optional; default: `wav`)
ElevenLabs profile:
- `GROQ_API_KEY`
- `ELEVENLABS_API_KEY`
- `ELEVENLABS_MODEL_ID` (optional; default in `run.sh`: `eleven_flash_v2_5`)
- `ELEVENLABS_VOICE_ID` (optional; default in `run.sh`: `EXAVITQu4vr4xnSDxMaL`)
- `ELEVENLABS_OPTIMIZE_STREAMING_LATENCY` (optional; default in `run.sh`: `4`)
- `ELEVENLABS_OUTPUT_FORMAT` (optional; default in `run.sh`: `mp3_22050_32`)
```sh
cd benchmarks/voicebench
./run.sh --profile=mock --iterations=1 --dataset=fixtures/manifest-mock.json
./run.sh --profile=groq
./run.sh --profile=elevenlabs
```

Run the benchmark against a labeled dataset:
```sh
cd benchmarks/voicebench
./run.sh --profile=groq --dataset=fixtures/manifest-groq.json
./run.sh --profile=elevenlabs --dataset=fixtures/manifest-elevenlabs.json
```

Optional flags:
- `--iterations=N` (default from `shared/config.json`)
- `--ts-only` (no-op; only TypeScript runs); `--py-only` / `--rs-only` exit with an error
- `--output-dir=/absolute/or/relative/path`
- `--dataset=/path/to/manifest.json` (uses fixture samples instead of a single `VOICEBENCH_AUDIO_PATH`)
Results are written as JSON in `benchmarks/voicebench/results/`.
- Fixture prompts live in `benchmarks/voicebench/shared/fixture_prompts.jsonl`.
- Response verbosity is hard-capped via `responseMaxChars` in `benchmarks/voicebench/shared/config.json`.
- Fixture manifests include `samples[].id`, `samples[].text`, and `samples[].audioPath`.
- The TypeScript runner dynamically imports plugin packages from:
  - `plugins/plugin-groq`
  - `plugins/plugin-elevenlabs`
- If Bun reports missing plugin dependencies, install those plugin dependencies first.
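A fixture manifest using the fields listed above might look like this (the id, text, and path values are illustrative, not real fixtures from the repo):

```json
{
  "samples": [
    {
      "id": "sample-001",
      "text": "What's the weather like today?",
      "audioPath": "fixtures/audio/sample-001.wav"
    },
    {
      "id": "sample-002",
      "text": "Tell me a short story.",
      "audioPath": "fixtures/audio/sample-002.wav"
    }
  ]
}
```

When a manifest is passed via `--dataset`, each sample's `text` serves as the expected transcription label for the accuracy metric.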