Build voice assistants using VAPI with Speechmatics as the transcription provider.
- Creating VAPI assistants with Speechmatics transcriber
- Configuring transcription settings (language, operating point, diarization)
- Managing assistants via the VAPI Server SDK
- VAPI Account: Sign up at vapi.ai
- VAPI Private Key: Get from dashboard.vapi.ai
- Speechmatics API Key: Add to VAPI Dashboard under Provider Keys
- Docker (recommended) or Python 3.9+
cp .env.example .env
# Edit .env and add your VAPI_API_KEY (private key)
Using Docker (recommended):
docker compose build --no-cache
docker compose run --rm vapi python main.py
Using Python directly:
cd python
python -m venv .venv
source .venv/bin/activate        # macOS/Linux
# Windows: .venv\Scripts\activate
pip install -r requirements.txt
python main.py
After creating an assistant, test it in the VAPI Dashboard:
- Go to Assistants
- Select your assistant
- Click Test to start a voice call
# Create a new assistant with Speechmatics STT
docker compose run --rm vapi python main.py
# List all assistants
docker compose run --rm vapi python main.py list
# Get assistant details
docker compose run --rm vapi python main.py get <assistant-id>
# Delete an assistant
docker compose run --rm vapi python main.py delete <assistant-id>
flowchart LR
subgraph Client
USER[User Browser/Phone]
end
subgraph "VAPI Platform"
ROOM[VAPI Call]
STT[Speechmatics STT]
LLM[OpenAI LLM]
TTS[ElevenLabs TTS]
end
USER <-->|WebRTC| ROOM
ROOM --> STT
STT --> LLM
LLM --> TTS
TTS --> ROOM
- User speaks into browser or phone via VAPI Dashboard
- VAPI Call handles WebRTC connection and audio streaming
- Speechmatics STT transcribes speech to text with diarization
- OpenAI LLM generates intelligent responses
- ElevenLabs TTS converts text to natural speech
- Audio streams back to user in real-time
from vapi import Vapi
from vapi.types import SpeechmaticsTranscriber, OpenAiModel, ElevenLabsVoice
client = Vapi(token=vapi_token)
assistant = client.assistants.create(
name="Speechmatics Assistant",
transcriber=SpeechmaticsTranscriber(
provider="speechmatics",
model="default",
language="en",
operating_point="enhanced",
region="us",
enable_diarization=True,
max_speakers=2,
speaker_labels=["SuperAgent", "Client"],
enable_partials=True,
enable_punctuation=True,
enable_capitalization=True,
remove_disfluencies=True,
end_of_turn_sensitivity=0.5,
custom_vocabulary=[
{"content": "Speechmatics", "sounds_like": ["speech matics", "speech mattics"]},
{"content": "Vapi", "sounds_like": ["vappy", "vahpee", "vaypee", "v a p i", "vap ee"]},
],
),
model=OpenAiModel(
provider="openai",
model="gpt-4o-mini",
messages=[{"role": "system", "content": "You are a helpful voice assistant. Keep responses brief and conversational."}],
),
voice=ElevenLabsVoice(provider="11labs", voice_id="21m00Tcm4TlvDq8ikWAM"),
first_message="Hello! How can I help you today?",
end_call_message="Goodbye!",
)
| Parameter | Type | Description |
|---|---|---|
| provider | string | Must be "speechmatics" |
| model | string | "default" |
| language | string | ISO code: "en", "es", "fr", etc. (50+ languages) |
| operating_point | string | "standard" or "enhanced" (best accuracy) |
| region | string | "eu" or "us" (choose closest to users) |
| Parameter | Type | Description |
|---|---|---|
| enable_diarization | boolean | Enable speaker identification |
| max_speakers | number | Limits the number of distinct speakers identified |
| speaker_labels | array | Custom labels for speakers |
| Parameter | Type | Description |
|---|---|---|
| enable_partials | boolean | Stream partial/interim transcripts |
| max_delay | number | Max delay in ms (lower = faster, less accurate) |
| enable_punctuation | boolean | Auto-add punctuation |
| enable_capitalization | boolean | Auto-capitalize |
| remove_disfluencies | boolean | Remove "um", "uh", etc. |
| numeral_style | string | "written" or "spoken" |
| enable_entities | boolean | Entity recognition |
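The full configuration example further below already enables partials, punctuation, capitalization, and disfluency removal; numeral_style and enable_entities are not shown there. Here is a sketch of a formatting-focused transcriber with illustrative values (max_delay is omitted here; confirm its units against the VAPI schema before tuning it):

```python
from vapi.types import SpeechmaticsTranscriber

# Formatting-focused configuration (illustrative values; other options left at defaults)
transcriber = SpeechmaticsTranscriber(
    provider="speechmatics",
    model="default",
    language="en",
    enable_partials=True,         # stream interim transcripts for lower perceived latency
    enable_punctuation=True,      # auto-add punctuation
    enable_capitalization=True,   # auto-capitalize
    remove_disfluencies=True,     # drop "um", "uh", etc.
    numeral_style="written",      # "written" or "spoken" (see table above)
    enable_entities=True,         # entity recognition
)
```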
| Parameter | Type | Description |
|---|---|---|
| end_of_turn_sensitivity | number | Sensitivity for detecting end of speech (0.0-1.0) |
| minimum_speech_duration | number | Min speech duration threshold |
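end_of_turn_sensitivity is the main knob for how quickly the assistant decides the caller has finished speaking. A sketch of two tuning directions, assuming higher values end the turn more eagerly (minimum_speech_duration is not shown because this guide does not state its units):

```python
from vapi.types import SpeechmaticsTranscriber

# More "eager" turn-taking: the assistant responds sooner, but may cut callers off
eager = SpeechmaticsTranscriber(
    provider="speechmatics",
    model="default",
    language="en",
    end_of_turn_sensitivity=0.7,  # illustrative; assumes higher = detects end of turn sooner
)

# More "patient" turn-taking: waits longer before treating a pause as end of turn
patient = SpeechmaticsTranscriber(
    provider="speechmatics",
    model="default",
    language="en",
    end_of_turn_sensitivity=0.3,
)
```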
| Parameter | Type | Description |
|---|---|---|
| custom_vocabulary | array | List of custom words with optional sounds_like |
from vapi.types import SpeechmaticsTranscriber
transcriber = SpeechmaticsTranscriber(
provider="speechmatics",
model="default",
language="en",
operating_point="enhanced",
region="us",
enable_diarization=True,
max_speakers=2,
speaker_labels=["SuperAgent", "Client"],
enable_partials=True,
enable_punctuation=True,
enable_capitalization=True,
remove_disfluencies=True,
end_of_turn_sensitivity=0.5,
custom_vocabulary=[
{"content": "Speechmatics", "sounds_like": ["speech matics", "speech mattics"]},
{"content": "Vapi", "sounds_like": ["vappy", "vahpee", "vaypee", "v a p i", "vap ee"]},
],
)
- Speaker Diarization: Only transcriber on VAPI with speaker identification
- 55+ Languages: Broadest language coverage
- Sub-second Latency: Real-time transcription
- Enhanced Accuracy: Industry-leading accuracy
"Invalid API key"
- Verify VAPI_API_KEY in .env is your private key
- Ensure Speechmatics key is added in the VAPI Dashboard under Provider Keys
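To confirm the private key itself works, you can call the API directly with the Server SDK. A minimal sketch, assuming the SDK exposes client.assistants.list() (the same call main.py's list command presumably relies on):

```python
import os
from vapi import Vapi

# Sanity check: if the key is valid, listing assistants should succeed
client = Vapi(token=os.environ["VAPI_API_KEY"])
try:
    assistants = client.assistants.list()
    print(f"Key OK - {len(assistants)} assistant(s) found")
except Exception as exc:
    print(f"API call failed - check VAPI_API_KEY: {exc}")
```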
"Transcription not working"
- Check Speechmatics key has sufficient credits
- Verify language code is valid
Help us improve this guide:
- Found an issue? Report it
- Have suggestions? Open a discussion
Difficulty: Beginner
Languages: Python