Learn the difference between batch and real-time transcription modes and when to use each.
This example shows both modes side-by-side so you understand which to use for your use case.
- The difference between batch and real-time transcription
- When to use batch mode vs real-time mode
- How to implement both approaches
- Performance and cost tradeoffs
- Speechmatics API Key: Get one from portal.speechmatics.com
- Python 3.8+
- Microphone (for real-time example)
- PortAudio (for real-time example): Required on Mac/Linux — see Quick Start Step 1
Step 1: Install PortAudio (system dependency for real-time microphone)
Note
The real-time example uses PyAudio for microphone input, which requires the PortAudio system library. The batch example does not need this.
Windows: No extra steps needed (included with PyAudio wheel).
Mac:
brew install portaudioLinux (Debian/Ubuntu):
sudo apt-get install portaudio19-devStep 2: Create and activate a virtual environment
On Windows:
cd python
python -m venv .venv
.venv\Scripts\activateOn Mac/Linux:
cd python
python3 -m venv .venv
source .venv/bin/activateStep 3: Install dependencies and run
pip install -r requirements.txt
cp ../.env.example .env
# Edit .env and add your SPEECHMATICS_API_KEY
# Try batch mode
python batch_example.py
# Try real-time mode
python realtime_example.pyBest for:
- Pre-recorded audio/video files
- Processing large volumes of files
- When you can wait for results (minutes)
- Highest accuracy (post-processing applied)
- Cost-effective for large files
How it works:
- Upload entire audio file
- Server processes the complete file
- Wait for processing (async)
- Download complete transcript
Example use cases:
- Podcast transcription
- Meeting recordings
- Video subtitle generation
- Call center recordings
Best for:
- Live audio streams
- When you need immediate results (< 1 second)
- Interactive applications
- Microphone input
- Live captioning
How it works:
- Open WebSocket connection
- Stream audio chunks continuously
- Receive partial and final transcripts in real-time
- Close connection when done
Example use cases:
- Live captioning
- Voice assistants
- Phone calls
- Live events
Note
Stream duration limit: Real-time streams are limited to 48 hours maximum per connection. For continuous 24/7 applications, implement reconnection logic to start a new stream before reaching this limit.
import asyncio
import os
import time
from pathlib import Path
from dotenv import load_dotenv
from speechmatics.batch import AsyncClient, TranscriptionConfig, OperatingPoint, AuthenticationError
# Load environment variables
load_dotenv()
async def main():
api_key = os.getenv("SPEECHMATICS_API_KEY")
audio_file = Path(__file__).parent.parent / "assets" / "sample.wav"
# Display file information
file_size_bytes = audio_file.stat().st_size
file_size_mb = file_size_bytes / (1024 * 1024)
print(f"Processing file: {audio_file.name}")
print(f"File size: {file_size_mb:.1f} MB")
print()
print("[... processing ...]")
print()
try:
# Track processing time
start_time = time.time()
# Initialize batch client
async with AsyncClient(api_key=api_key) as client:
# Configure transcription
config = TranscriptionConfig(
language="en",
operating_point=OperatingPoint.ENHANCED,
)
# Transcribe with batch API
result = await client.transcribe(
str(audio_file),
transcription_config=config,
)
# Calculate actual processing time
end_time = time.time()
processing_time = end_time - start_time
minutes = int(processing_time // 60)
seconds = int(processing_time % 60)
print(f"Complete! Processing time: {minutes}m {seconds}s")
print()
# Extract and display transcript
transcript = result.transcript_text
print("Full transcript:")
print(f'"{transcript}"')
except AuthenticationError as e:
print(f"\nAuthentication Error: {e}")
print("Please check your API key is valid at portal.speechmatics.com")
if __name__ == "__main__":
asyncio.run(main())import asyncio
import os
from dotenv import load_dotenv
from speechmatics.rt import (
AsyncClient,
ServerMessageType,
TranscriptionConfig,
TranscriptResult,
OperatingPoint,
AudioFormat,
AudioEncoding,
Microphone,
AuthenticationError,
)
# Load environment variables
load_dotenv()
async def main():
api_key = os.getenv("SPEECHMATICS_API_KEY")
# Store transcript parts for final output
transcript_parts = []
# Configure audio format for microphone input
audio_format = AudioFormat(
encoding=AudioEncoding.PCM_S16LE,
chunk_size=4096,
sample_rate=16000,
)
# Configure transcription with partials enabled
transcription_config = TranscriptionConfig(
language="en",
enable_partials=True,
operating_point=OperatingPoint.ENHANCED,
)
# Initialize microphone
mic = Microphone(
sample_rate=audio_format.sample_rate,
chunk_size=audio_format.chunk_size,
)
# Start microphone capture
if not mic.start():
print("PyAudio not installed. Install: pip install pyaudio")
return
try:
# Initialize real-time client
async with AsyncClient(api_key=api_key) as client:
# Handle final transcripts
@client.on(ServerMessageType.ADD_TRANSCRIPT)
def handle_final_transcript(message):
result = TranscriptResult.from_message(message)
transcript = result.metadata.transcript
if transcript:
print(f"[final]: {transcript}")
transcript_parts.append(transcript)
# Handle partial transcripts (interim results)
@client.on(ServerMessageType.ADD_PARTIAL_TRANSCRIPT)
def handle_partial_transcript(message):
result = TranscriptResult.from_message(message)
transcript = result.metadata.transcript
if transcript:
print(f"[partial]: {transcript}")
try:
print("Connected! Start speaking (Ctrl+C to stop)...\n")
# Start transcription session
await client.start_session(
transcription_config=transcription_config,
audio_format=audio_format,
)
# Stream audio continuously
while True:
frame = await mic.read(audio_format.chunk_size)
await client.send_audio(frame)
except KeyboardInterrupt:
pass
finally:
# Clean up microphone
mic.stop()
print(f"\n\nFull transcript: {' '.join(transcript_parts)}")
except AuthenticationError as e:
print(f"\nAuthentication Error: {e}")
print("Please check your API key is valid at portal.speechmatics.com")
if __name__ == "__main__":
asyncio.run(main())| Feature | Batch | Real-time |
|---|---|---|
| Latency | Minutes | < 1 second |
| Accuracy | Highest | Very high |
| Cost | Lower | Higher |
| Use Case | Pre-recorded | Live streams |
| Max Duration | No limit | 48 hours per stream |
| Partial Results | No | Yes |
Important
Use Batch if:
- You have a complete audio file
- You can wait for results
- You want highest accuracy
- You're processing in bulk
Use Real-time if:
- You're streaming live audio
- You need immediate feedback
- You're building interactive apps
- You need partial results
Batch Mode:
- File-based transcription with AsyncClient
- Complete file processing with wait for results
- Enhanced accuracy with OperatingPoint configuration
- Processing time measurement
Real-time Mode:
- WebSocket streaming with live audio
- Partial and final transcript events
- Microphone integration with PyAudio
- Event-driven architecture with decorators
Comparison:
- Performance tradeoffs (latency vs accuracy)
- Use case decision matrix
- Side-by-side code examples
Processing file: sample.wav
File size: 0.7 MB
[... processing ...]
Complete! Processing time: 0m 5s
Full transcript:
"SPEAKER UU: Good morning, everyone. Let's begin today's meeting."
Connected! Start speaking (Ctrl+C to stop)...
[partial]: Good morning
[partial]: Good morning everyone
[partial]: Good morning. Everyone. Let's begin
[partial]: Good morning, everyone. Let's begin. Today's
[partial]: Good morning, everyone. Let's begin today's meeting
[partial]: Good morning, everyone. Let's begin today's meeting
[partial]: Good morning, everyone. Let's begin today's meeting
[partial]: Good morning, everyone. Let's begin today's meeting
[partial]: Good morning, everyone. Let's begin today's meeting
[partial]: Good morning, everyone. Let's begin today's meeting
[partial]: Good morning, everyone. Let's begin today's meeting
[final]: Good morning,
[partial]: everyone. Let's begin today's meeting.
[final]: everyone.
[partial]: Let's begin today's meeting.
[final]: Let's
[partial]: begin today's meeting.
[final]: begin
[partial]: today's meeting.
[final]: today's
[partial]: meeting.
[final]: meeting.
Full transcript: Good morning, everyone. Let's begin today's meeting.
- Configuration Guide - Learn all config options for both modes
- Audio Intelligence - Add sentiment and insights
- Turn Detection - Real-time turn detection for conversations
- Voice Agent Turn Detection - Advanced presets for voice agents
Real-time: "Failed building wheel for pyaudio" / "portaudio.h: No such file"
- Install the PortAudio system library first:
- Mac:
brew install portaudio - Linux (Debian/Ubuntu):
sudo apt-get install portaudio19-dev
- Mac:
- Then reinstall:
pip install pyaudio
Batch: "Processing timeout"
- Check file size (very large files take longer)
- Verify file format is supported
- Try polling for results instead of blocking
Real-time: "WebSocket connection failed"
- Verify your API key is valid
- Check network/firewall settings
- Ensure WebSocket connections are allowed
Real-time: "Audio chunks too fast/slow"
- Match audio streaming rate to real-time
- Use proper audio format (16kHz, mono, PCM)
Help us improve this guide:
- Found an issue? Report it
- Have suggestions? Open a discussion
Time to Complete: 10 minutes Difficulty: Beginner API Modes: Batch & Real-time