Migrating from Deepgram to Speechmatics

Migration · Feature Comparison · Code Examples

Switching from Deepgram? This guide shows you equivalent features and code patterns to help you migrate smoothly.

**Note:** Get $200 free credit with code `SWITCH200` when switching from Deepgram.


Feature Mapping

Core Configuration

| Feature | Deepgram | Speechmatics | Notes |
|---|---|---|---|
| Model Selection | `model="nova-3"` | `operating_point="enhanced"` | `"enhanced"` for best accuracy, `"standard"` for faster turnaround |
| Language | `language="en-US"` | `language="en"` | Speechmatics uses ISO 639-1 codes; no locale variants needed (handles all accents automatically). Mandarin uses `cmn` with `output_locale` for Simplified/Traditional formatting |
| Sample Rate | `sample_rate=16000` | `sample_rate=16000` | Same parameter, in `AudioFormat` |
| Encoding | `encoding="linear16"` | `encoding="pcm_s16le"` | Slightly different naming |
| Channels | `channels=1` | `diarization="channel"` + `AsyncMultiChannelClient` | Speechmatics uses separate streams per channel |
| API Key | `DEEPGRAM_API_KEY` | `SPEECHMATICS_API_KEY` | Environment variable naming |
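To make the mapping concrete, here is a typical Deepgram request (nova-3, en-US, 16 kHz linear16) alongside its Speechmatics counterpart, shown as plain dicts rather than SDK objects so each field from the table is visible. This is a sketch of the parameter correspondence, not the exact wire format:

```python
# Typical Deepgram options.
deepgram_request = {
    "model": "nova-3",
    "language": "en-US",
    "encoding": "linear16",
    "sample_rate": 16000,
}

# The equivalent values as they appear in Speechmatics configuration
# (the SDK wraps these in TranscriptionConfig / AudioFormat objects).
speechmatics_equivalent = {
    "transcription_config": {
        "operating_point": "enhanced",  # counterpart of model selection
        "language": "en",               # ISO 639-1, no locale variant needed
    },
    "audio_format": {
        "encoding": "pcm_s16le",        # linear16 -> pcm_s16le
        "sample_rate": 16000,           # same value, different home
    },
}
```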

Real-time Streaming & Voice Features

Speechmatics packages: `speechmatics-rt` for basic real-time streaming; `speechmatics-voice` for voice agent features (turn detection, segments, VAD events). The Voice SDK is built on top of the RT SDK.

| Feature | Deepgram | Speechmatics | Package | Notes |
|---|---|---|---|---|
| Interim Results | `interim_results=True` | `enable_partials=True` | rt, voice | Partial transcripts while processing |
| Endpointing | `endpointing=500` (ms) | `max_delay=0.5` (seconds) | rt, voice | Duration the engine waits to verify partial word accuracy before committing (0.7-4.0 s) |
| Max Delay Mode | Not available | `max_delay_mode="flexible"` or `"fixed"` | rt, voice | Flexible allows entity completion |
| Utterance End | `utterance_end_ms=1000` | `end_of_utterance_silence_trigger=1.0` | rt, voice | Reference silence duration (0-2 s); ADAPTIVE mode scales this based on speech patterns |
| Force End Utterance | `Finalize` message | `client.finalize(end_of_turn=True)` | voice | Manually trigger end of utterance |
| VAD Events | `vad_events=True` (Beta) | `AgentServerMessageType.SPEAKER_STARTED`<br>`AgentServerMessageType.SPEAKER_ENDED` | voice | Voice activity detection events |
| Diarization | `diarize=True` | `diarization="speaker"` | rt, voice | Speaker labeling |
| Speaker Config | Not available | `speaker_diarization_config=SpeakerDiarizationConfig(...)` | rt, voice | Fine-tune diarization |
| Known Speakers | Not available | `known_speakers=[SpeakerIdentifier(label, speaker_identifiers)]` | rt, voice | Pre-register speaker voices |
| Speaker Focus | Not available | `SpeakerFocusConfig(focus_speakers, ignore_speakers, focus_mode)` | voice | Focus on specific speakers; only focused speakers drive conversation flow |
| Multichannel | `multichannel=True` | `diarization="channel"` or `"channel_and_speaker"` | rt, voice | Channel-based diarization |
| Channel Labels | Not available | `channel_diarization_labels=["agent", "customer"]` | rt, voice | Label audio channels |
| Keywords/Keyterms | `keywords=["term"]`, `keyterm=["term"]` | `additional_vocab=[{"content": "term"}]` | rt, voice | Boost specific terms |
| Translation | Not available | `translation_config=TranslationConfig(target_languages=["es"])` | rt | Real-time translation |
| Audio Events | Not available | `audio_events_config=AudioEventsConfig(types=[...])` | rt | Detect laughter, applause, etc. |
| Domain | Not available | `domain="medical"` | rt, voice | Domain-optimized language pack |

Turn Detection (Voice SDK):

| Feature | Deepgram | Speechmatics | Notes |
|---|---|---|---|
| Fixed Delay | Via settings | `EndOfUtteranceMode.FIXED` | Waits exactly the configured silence duration every time |
| Adaptive Delay | Not available | `EndOfUtteranceMode.ADAPTIVE` | Scales wait time based on speech pace, filler words (um/uh), and punctuation |
| Smart Turn (ML) | Not available | `smart_turn_config=SmartTurnConfig(enabled=True)` | Uses an ML model to predict semantic turn completion (with ADAPTIVE mode) |
| External Control | Not available | `EndOfUtteranceMode.EXTERNAL` + `client.finalize(end_of_turn=True)` | Application controls turn endings (for Pipecat/LiveKit integration) |
| Silence Trigger | Via settings | `end_of_utterance_silence_trigger` | Reference duration (0-2 s); ADAPTIVE mode applies multipliers based on context |
| Presets | Not available | `preset="fast"`, `"fixed"`, `"adaptive"`, `"smart_turn"`, `"scribe"`, `"captions"`, `"external"` | Ready-to-use configurations optimized for specific use cases |
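As a sketch of how these options compose: you can start from a ready-made preset or set the turn-detection fields explicitly. Note that passing `preset` directly to `VoiceAgentConfig` is an assumption based on the Presets row above, and the numeric values are illustrative, not recommendations:

```python
from speechmatics.voice import VoiceAgentConfig, EndOfUtteranceMode, SmartTurnConfig

# Option 1: a ready-made preset (placement of the preset keyword is assumed
# from the Presets row above).
preset_config = VoiceAgentConfig(preset="adaptive")

# Option 2: explicit turn-detection settings (values are illustrative).
explicit_config = VoiceAgentConfig(
    language="en",
    end_of_utterance_mode=EndOfUtteranceMode.ADAPTIVE,  # scales with speech pace
    end_of_utterance_silence_trigger=0.7,               # reference silence (0-2 s)
    smart_turn_config=SmartTurnConfig(enabled=True),    # ML turn prediction
)
```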

Server Message Types:

| Deepgram Event | Speechmatics Event | Package | Notes |
|---|---|---|---|
| `EventType.MESSAGE` (`is_final=True`) | `ServerMessageType.ADD_TRANSCRIPT` | rt | Final transcript |
| `EventType.MESSAGE` (`is_final=False`) | `ServerMessageType.ADD_PARTIAL_TRANSCRIPT` | rt | Partial results |
| `EventType.MESSAGE` (UtteranceEnd) | `ServerMessageType.END_OF_UTTERANCE` | rt | End of utterance |
| `EventType.MESSAGE` (SpeechStarted) | `AgentServerMessageType.SPEAKER_STARTED` | voice | Speech detected |
| `EventType.MESSAGE` (Metadata) | `ServerMessageType.RECOGNITION_STARTED` | rt, voice | Session metadata |
| Not available | `AgentServerMessageType.SPEAKER_ENDED` | voice | Speech ended |
| Not available | `AgentServerMessageType.ADD_SEGMENT` | voice | Final segment |
| Not available | `AgentServerMessageType.ADD_PARTIAL_SEGMENT` | voice | Partial segment |
| Not available | `AgentServerMessageType.START_OF_TURN` | voice | Turn started |
| Not available | `AgentServerMessageType.END_OF_TURN` | voice | Turn completed |
| Not available | `AgentServerMessageType.END_OF_TURN_PREDICTION` | voice | Turn prediction timing |
| Not available | `ServerMessageType.ADD_TRANSLATION` | rt | Translation result |
| Not available | `ServerMessageType.AUDIO_EVENT_STARTED` / `ENDED` | rt | Audio events |
| Not available | `ServerMessageType.SPEAKERS_RESULT` | rt | Speaker identification |

Usage - Basic RT Streaming:

```python
from speechmatics.rt import AsyncClient, ServerMessageType, TranscriptionConfig, AudioFormat, AudioEncoding

async with AsyncClient(api_key="YOUR_KEY") as client:
    @client.on(ServerMessageType.ADD_TRANSCRIPT)
    def on_transcript(message):
        print(message['metadata']['transcript'])

    await client.transcribe(
        audio_file,
        transcription_config=TranscriptionConfig(language="en", diarization="speaker"),
        audio_format=AudioFormat(encoding=AudioEncoding.PCM_S16LE, sample_rate=16000)
    )
```

Usage - Voice SDK (Turn Detection):

```python
from speechmatics.voice import VoiceAgentClient, VoiceAgentConfig, EndOfUtteranceMode, AgentServerMessageType

config = VoiceAgentConfig(
    language="en",
    enable_diarization=True,
    end_of_utterance_mode=EndOfUtteranceMode.ADAPTIVE,
    end_of_utterance_silence_trigger=0.5
)

async with VoiceAgentClient(api_key="YOUR_KEY", config=config) as client:
    @client.on(AgentServerMessageType.ADD_SEGMENT)
    def on_segment(message):
        for segment in message['segments']:
            print(f"[{segment['speaker_id']}]: {segment['text']}")

    @client.on(AgentServerMessageType.END_OF_TURN)
    def on_turn_end(message):
        print("User finished speaking - ready for response")

    await client.send_audio(audio_chunk)
```

Batch Transcription Features

Speechmatics package: `speechmatics-batch`

| Feature | Deepgram | Speechmatics | Package | Notes |
|---|---|---|---|---|
| Diarization | `diarize=True`, `diarize_version="latest"` | `diarization="speaker"` | batch | Speaker identification |
| Multichannel | `multichannel=True` | `diarization="channel"` or `"channel_and_speaker"` | batch | Channel-based diarization |
| Sentiment | `sentiment=True` | `sentiment_analysis_config=SentimentAnalysisConfig()` | batch | Sentiment analysis |
| Topic Detection | `topics=True` | `topic_detection_config=TopicDetectionConfig(topics=[...])` | batch | Automatic topic extraction |
| Summarization | `summarize=True` | `summarization_config=SummarizationConfig(content_type, summary_length, summary_type)` | batch | AI-powered summaries |
| Intent Recognition | `intents=True` | Not available | - | Detect user intents |
| Entity Detection | `detect_entities=True` | `enable_entities=True` | batch | Detect named entities |
| Utterances | `utterances=True`, `utt_split=0.8` | Not available | - | Split into utterances |
| Paragraphs | `paragraphs=True` | Not available | - | Paragraph segmentation |
| Dictation | `dictation=True` | Not available | - | Dictation mode formatting |
| Measurements | `measurements=True` | `enable_entities=True` | batch | Format measurements (e.g., "10 km/s") |
| Auto Chapters | Not available | `auto_chapters_config=AutoChaptersConfig()` | batch | Automatic chapter generation |
| Audio Events | Not available | `audio_events_config=AudioEventsConfig(types=[...])` | batch | Detect laughter, applause, etc. |
| Translation | Not available | `translation_config=TranslationConfig(target_languages=["es", "fr"])` | batch | Translate transcript |
| Language ID | `detect_language=True` | `language_identification_config=LanguageIdentificationConfig(expected_languages=[...])` | batch | Identify spoken language |
| Domain | Not available | `domain="medical"` | batch | Domain-optimized language pack |
| Output Locale | Not available | `output_locale="en-US"` | batch | RFC 5646 locale for output |
| Output Format | `?format=srt` | `get_transcript(job_id, format_type=FormatType.SRT)` | batch | JSON, TXT, SRT formats |
| Webhooks | `callback="url"` | `notification_config=[NotificationConfig(url, contents, method)]` | batch | Job completion notifications |
| Job Tracking | `extra=KEY:VALUE` | `tracking=TrackingConfig(title, reference, tags)` | batch | Custom job metadata |
| Fetch from URL | `url=...` | `fetch_data=FetchData(url, auth_headers)` | batch | Transcribe from URL |

Usage:

```python
from speechmatics.batch import AsyncClient, JobConfig, JobType, TranscriptionConfig, SummarizationConfig

async with AsyncClient(api_key="YOUR_KEY") as client:
    config = JobConfig(
        type=JobType.TRANSCRIPTION,
        transcription_config=TranscriptionConfig(
            language="en",
            diarization="speaker",
            enable_entities=True
        ),
        summarization_config=SummarizationConfig(
            content_type="conversational",
            summary_length="brief"
        )
    )

    result = await client.transcribe("audio.wav", config=config)
    print(result.transcript_text)
    print(result.summary)
```

Output Formatting & Filtering

Speechmatics packages: `speechmatics-batch`, `speechmatics-rt`. Formatting features are available in both batch and real-time.

Note: Parameters like punctuation_overrides, transcript_filtering_config, and audio_filtering_config accept dict objects. The SDK passes these directly to the API - refer to API documentation for valid keys.

| Feature | Deepgram | Speechmatics | Package | Notes |
|---|---|---|---|---|
| Smart Formatting | `smart_format=True` | `enable_entities=True` | batch, rt | Dates, numbers, currencies, emails, etc. |
| Punctuation | `punctuate=True` | Enabled by default | batch, rt | Automatic punctuation |
| Punctuation Sensitivity | Not available | `punctuation_overrides={"sensitivity": 0.4}` | batch, rt | Control punctuation frequency (0-1) |
| Punctuation Marks | Not available | `punctuation_overrides={"permitted_marks": [".", ","]}` | batch, rt | Limit allowed punctuation marks |
| Output Locale | Not available | `output_locale="en-GB"` | batch, rt | Regional spelling (en-GB, en-US, en-AU) |
| Profanity | `profanity_filter=True` | Auto-tagged for en, it, es | batch, rt | Deepgram removes; Speechmatics tags as `$PROFANITY` |
| Disfluencies | `filler_words=True` (include) | `transcript_filtering_config={"remove_disfluencies": True}` | batch, rt | Deepgram includes by opt-in; Speechmatics auto-tags and optionally removes (EN only) |
| Word Replacement | `replace=["old:new"]` | `transcript_filtering_config={"replacements": [{"from": "old", "to": "new"}]}` | batch, rt | Find/replace with regex support |
| Redaction | `redact=["pci", "ssn", "numbers"]` | `transcript_filtering_config={"replacements": [...]}` | batch, rt | Use replacements to redact sensitive data |
| Audio Filtering | Not available | `audio_filtering_config={"volume_threshold": 3.4}` | batch, rt | Remove background speech by volume (0-100) |
| Custom Vocab | `keywords=["term"]`, `keyterm=["term"]` | `additional_vocab=[{"content": "term", "sounds_like": [...]}]` | batch, rt | Phonetic hints available |

Usage (Batch):

```python
from speechmatics.batch import AsyncClient, TranscriptionConfig

config = TranscriptionConfig(
    language="en",
    enable_entities=True,
    output_locale="en-GB",
    punctuation_overrides={"sensitivity": 0.4},
    transcript_filtering_config={"remove_disfluencies": True},
    additional_vocab=[
        {"content": "acetaminophen", "sounds_like": ["ah see tah min oh fen"]},
        {"content": "myocardial infarction", "sounds_like": ["my oh car dee al in fark shun"]}
    ]
)

async with AsyncClient(api_key="YOUR_KEY") as client:
    result = await client.transcribe("audio.wav", transcription_config=config)
    print(result.transcript_text)
```

Usage (Real-time):

```python
from speechmatics.rt import AsyncClient, TranscriptionConfig, AudioFormat, AudioEncoding

config = TranscriptionConfig(
    language="en",
    enable_entities=True,
    punctuation_overrides={"sensitivity": 0.4},
    transcript_filtering_config={"remove_disfluencies": True}
)

async with AsyncClient(api_key="YOUR_KEY") as client:
    await client.transcribe(
        audio_file,
        transcription_config=config,
        audio_format=AudioFormat(encoding=AudioEncoding.PCM_S16LE, sample_rate=16000)
    )
```

Text-to-Speech (TTS)

Speechmatics package: `speechmatics-tts`

| Feature | Deepgram | Speechmatics | Package | Notes |
|---|---|---|---|---|
| API Style | REST + WebSocket | REST | tts | Both support audio output |
| Voices (EN) | Multiple voices | 4 curated voices (sarah, theo, megan, jack) | tts | Different voice selection approaches |
| Output Formats | Multiple encodings | `wav_16000`, `pcm_16000` | tts | Standard formats supported |
| Sample Rate | Configurable | 16 kHz (optimized for speech) | tts | Speech-optimized defaults |
| Bit Rate | Configurable | Optimized defaults | tts | Quality settings |
| Streaming TTS | WebSocket | HTTP chunked streaming | tts | Both support streaming audio output |
| Callback | `callback="url"` | Not available | - | Webhook support |
| Model Opt-out | `mip_opt_out=True` | Options available post-preview | tts | Privacy controls |
| Request Tags | `tag=["label"]` | Via API headers | tts | Request identification |

Usage:

```python
# Deepgram TTS
from deepgram import DeepgramClient
client = DeepgramClient(api_key="YOUR_KEY")
with client.speak.v1.audio.generate(
    text="Hello world",
    model="aura-asteria-en",
    encoding="linear16",
    sample_rate=16000
) as response:
    audio_data = response.data
```

```python
# Speechmatics TTS
from speechmatics.tts import AsyncClient, Voice, OutputFormat
async with AsyncClient(api_key="YOUR_KEY") as client:
    response = await client.generate(
        text="Hello world",
        voice=Voice.SARAH,
        output_format=OutputFormat.WAV_16000
    )
    audio_data = await response.read()
```


Why Switch?

Superior Accuracy

| Metric | Speechmatics | Deepgram |
|---|---|---|
| Word Error Rate (WER) | 6.8% | 16.5% |
| Medical Keyword Recall | 96% | - |
| Noisy Environments | Excellent | Standard |
| Accent Recognition | Market-leading | Standard |
| Multi-speaker Accuracy | Market-leading | Standard |
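Rather than taking vendor-reported WER figures at face value, you can measure WER on your own audio during the migration test phase. A minimal, dependency-free word error rate implementation (word-level Levenshtein distance divided by reference length):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Edit distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the quick brown fox", "the quick brown fox"))  # 0.0
print(wer("the quick brown fox", "the quik brown"))       # 0.5 (1 sub + 1 del over 4 words)
```

Run the same files through both providers and compare scores against a human reference transcript; normalize casing and punctuation first for a fair comparison.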

More Languages

| Capability | Speechmatics | Deepgram |
|---|---|---|
| Languages Supported | 55+ | 30+ |
| Accuracy Consistency | Industry-leading across all | Varies by language |
| Bilingual Packs | Mandarin, Tamil, Malay, Tagalog + English | 10 European languages only |
| Real-time Translation | 30+ languages | Not available |
| Auto Language Detection | Supported | Supported |

Advanced Features

| Feature | Speechmatics | Deepgram |
|---|---|---|
| Domain-Specific Models | Medical, finance, and more | Limited |
| Custom Dictionary Size | 1,000 words included | 100 words |
| Speaker Diarization | Included | Extra charge |
| Speaker Identification | Known speaker pre-registration | Not available |
| Speaker Focus | Focus/ignore specific speakers | Not available |

Flexible Deployment Options

| Deployment | Speechmatics | Deepgram |
|---|---|---|
| SaaS/Cloud | Supported | Supported |
| On-Premises | Supported | Limited |
| On-Device | Supported | Not available |
| Air-Gapped | Supported | Not available |

Enterprise-Grade Security

  • ISO 27001 certified
  • GDPR compliant
  • HIPAA compliant

Industries & Use Cases

Speechmatics excels in:

  • Healthcare - 96% medical keyword recall with medical domain model
  • Contact Centers - Speaker ID, focus, and multi-speaker accuracy
  • Media & Captioning - High accuracy in noisy environments
  • Finance - Enterprise security with air-gapped deployment
  • Education - 55+ languages with consistent accuracy

Code Migration Examples

Batch Transcription

Deepgram:

```python
from deepgram import DeepgramClient, PrerecordedOptions

client = DeepgramClient(api_key="YOUR_API_KEY")

with open("audio.wav", "rb") as audio_file:
    response = client.listen.prerecorded.transcribe_file(
        audio_file,
        PrerecordedOptions(
            model="nova-3",
            smart_format=True,
            diarize=True
        )
    )

transcript = response.results.channels[0].alternatives[0].transcript
```

Speechmatics:

```python
import asyncio
from speechmatics.batch import AsyncClient, TranscriptionConfig

async def transcribe():
    async with AsyncClient(api_key="YOUR_API_KEY") as client:
        config = TranscriptionConfig(
            language="en",
            operating_point="enhanced",
            diarization="speaker",
            enable_entities=True
        )

        with open("audio.wav", "rb") as audio_file:
            result = await client.transcribe(audio_file, transcription_config=config)
            transcript = result.transcript_text

asyncio.run(transcribe())
```

What Changed:

  • Configuration is now in TranscriptionConfig object
  • Simpler result access with result.transcript_text
  • Async-first for better performance and resource management

Real-time Streaming

Deepgram:

```python
from deepgram import DeepgramClient, LiveOptions
from deepgram.core.events import EventType

client = DeepgramClient(api_key="YOUR_API_KEY")
connection = client.listen.live.v("1")

def on_message(self, result, **kwargs):
    # Check if this is a final transcript result
    if hasattr(result, 'is_final') and result.is_final:
        sentence = result.channel.alternatives[0].transcript
        if len(sentence) > 0:
            print(sentence)

connection.on(EventType.MESSAGE, on_message)
connection.start(LiveOptions(model="nova-3", language="en-US", diarize=True))
connection.send(audio_chunk)
connection.finish()
```

Speechmatics:

```python
import asyncio
from speechmatics.rt import AsyncClient, ServerMessageType, TranscriptResult, AudioFormat, AudioEncoding, TranscriptionConfig

async def stream_audio():
    async with AsyncClient(api_key="YOUR_API_KEY") as client:

        @client.on(ServerMessageType.ADD_TRANSCRIPT)
        def on_transcript(message):
            result = TranscriptResult.from_message(message)
            print(result.metadata.transcript)

        @client.on(ServerMessageType.ADD_PARTIAL_TRANSCRIPT)
        def on_partial(message):
            result = TranscriptResult.from_message(message)
            print(f"Partial: {result.metadata.transcript}")

        with open("audio.wav", "rb") as audio_file:
            await client.transcribe(
                audio_file,
                transcription_config=TranscriptionConfig(
                    language="en",
                    operating_point="enhanced",
                    diarization="speaker",
                    enable_partials=True
                ),
                audio_format=AudioFormat(
                    encoding=AudioEncoding.PCM_S16LE,
                    sample_rate=16000
                )
            )

asyncio.run(stream_audio())
```

What Changed:

  • Event-driven architecture with decorators
  • Structured message types via ServerMessageType enum
  • Better type safety with TranscriptResult objects
  • Separate events for final and partial transcripts

Speaker Diarization

Deepgram:

```python
options = PrerecordedOptions(
    model="nova-3",
    diarize=True,
    utterances=True
)

response = client.listen.prerecorded.transcribe_file(audio_file, options)

for word in response.results.channels[0].alternatives[0].words:
    print(f"Speaker {word.speaker}: {word.word}")
```

Speechmatics:

```python
config = TranscriptionConfig(
    language="en",
    diarization="speaker",
    # max_speakers is optional - see note below
)

result = await client.transcribe(audio_file, transcription_config=config)

for item in result.results:
    if item.type == "word":
        print(f"Speaker {item.attaches_to}: {item.alternatives[0].content}")
```

Advantages:

  • Higher accuracy in multi-speaker scenarios
  • Automatic speaker count detection
  • Fine-grained diarization controls via speaker_diarization_config

Note

**max_speakers**: When set, the system consolidates all detected speakers into the specified number of groups. For example, `max_speakers=2` with 4 actual speakers will merge them into just 2 speaker labels. Only use this when you're certain about the exact speaker count (e.g., a two-person interview). For most scenarios, omit this setting for automatic detection.
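A two-person interview is the one case where pinning the speaker count is safe. A sketch of that configuration (the dict shape passed to `speaker_diarization_config` is an assumption based on the feature table above; the SDK also exposes a `SpeakerDiarizationConfig` object):

```python
from speechmatics.batch import TranscriptionConfig

# Sketch: a two-person interview, where the speaker count is known for certain.
# The speaker_diarization_config shape is assumed from the feature table above.
config = TranscriptionConfig(
    language="en",
    diarization="speaker",
    speaker_diarization_config={"max_speakers": 2},  # omit entirely for auto-detection
)
```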


Speaker Focus (Voice SDK Only)

Speaker Focus allows you to designate primary speakers whose speech drives the conversation flow. This is useful for voice assistants where you want to focus on the user and ignore background speakers or the assistant's own voice.

Deepgram: Not available

Speechmatics (Voice SDK):

```python
from speechmatics.voice import VoiceAgentClient, VoiceAgentConfig, SpeakerFocusConfig, SpeakerFocusMode

config = VoiceAgentConfig(
    language="en",
    enable_diarization=True,
    speaker_config=SpeakerFocusConfig(
        focus_speakers=["S1"],              # Primary speaker(s) to focus on
        ignore_speakers=["__ASSISTANT__"],  # Speakers to completely exclude
        focus_mode=SpeakerFocusMode.RETAIN  # or IGNORE
    )
)

async with VoiceAgentClient(api_key="YOUR_KEY", config=config) as client:
    # Only S1 can drive conversation flow
    # Other speakers' words only appear alongside focused speaker's speech
    ...
```

Focus Mode Options:

| Mode | Behavior |
|---|---|
| `RETAIN` | Non-focused speakers' words are still emitted, but marked as passive. They only appear when a focused speaker is also speaking. |
| `IGNORE` | Non-focused speakers are completely excluded from output. |

Key Behavior: Only focused speakers can "drive" the conversation - their speech triggers VAD events, turn detection, and segment finalization. Non-focused speakers' words are processed but only emitted alongside active focused speaker content.


Custom Vocabulary

Deepgram:

```python
options = PrerecordedOptions(
    model="nova-3",
    keywords=["Speechmatics", "DeepSeek", "TechTerm:2"]  # keyword:boost
)
```

Speechmatics:

```python
config = TranscriptionConfig(
    language="en",
    additional_vocab=[
        {"content": "Speechmatics", "sounds_like": ["speech matics"]},
        {"content": "DeepSeek"},
        {"content": "TechTerm", "sounds_like": ["tek term", "tech term"]},
    ]
)
```

Features:

  • Phonetic alternatives with sounds_like for pronunciation variants
  • 1,000 words included (vs Deepgram's 100)
  • Better recognition of domain-specific terms

Content Filtering

Deepgram:

```python
options = PrerecordedOptions(
    model="nova-3",
    profanity_filter=True,  # Removes profanities
    filler_words=True,      # Includes filler words (excluded by default)
    replace=["SSN:REDACTED", "password:REDACTED"]
)
```

Speechmatics:

```python
# Profanity tagging is automatic for en, it, es
config = {
    "language": "en",
    "transcript_filtering_config": {
        "remove_disfluencies": True,  # Remove "um", "uh", etc.
        "replacements": [
            {"from": "SSN", "to": "REDACTED"},
            {"from": "password", "to": "REDACTED"}
        ]
    }
}
```

Key Differences:

  • Profanity: Deepgram removes, Speechmatics auto-tags (appears as $PROFANITY)
  • Disfluencies: Both support removal of filler words
  • Redaction: Both support word replacement
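If you are redacting many terms, building the replacements list by hand gets tedious. A small helper (hypothetical, not part of either SDK) that generates a `transcript_filtering_config` in the replacements shape shown above:

```python
def redaction_config(terms, replacement="[REDACTED]"):
    """Build a transcript_filtering_config dict that redacts the given terms,
    using the {"from": ..., "to": ...} replacements shape."""
    return {
        "replacements": [{"from": term, "to": replacement} for term in terms]
    }

cfg = redaction_config(["SSN", "password"])
# {'replacements': [{'from': 'SSN', 'to': '[REDACTED]'},
#                   {'from': 'password', 'to': '[REDACTED]'}]}
```

The resulting dict can be passed as `transcript_filtering_config` in either batch or real-time configuration.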

Response Structure

Deepgram Response

```json
{
  "metadata": {...},
  "results": {
    "channels": [{
      "alternatives": [{
        "transcript": "Full transcript text",
        "confidence": 0.98,
        "words": [
          {
            "word": "hello",
            "start": 0.0,
            "end": 0.5,
            "confidence": 0.99,
            "speaker": 0
          }
        ]
      }]
    }]
  }
}
```

Speechmatics Response

```json
{
  "transcript_text": "Full transcript text",
  "results": [
    {
      "type": "word",
      "start_time": 0.0,
      "end_time": 0.5,
      "alternatives": [
        {
          "content": "hello",
          "confidence": 0.99
        }
      ],
      "attaches_to": "speaker_1"
    }
  ],
  "metadata": {...}
}
```

Key Differences:

  • Speechmatics provides transcript_text at the top level for quick access
  • Results are flat arrays instead of nested channels
  • Speaker is referenced via attaches_to field
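Because the results array is flat, grouping it is a single pass. A sketch that collects words into per-speaker text, following the field layout shown in the Speechmatics response above:

```python
def by_speaker(results):
    """Group word items from the flat Speechmatics results array into
    per-speaker text, using the field layout shown above."""
    grouped = {}
    for item in results:
        if item["type"] != "word":
            continue  # skip punctuation and other non-word items
        speaker = item.get("attaches_to", "unknown")
        grouped.setdefault(speaker, []).append(item["alternatives"][0]["content"])
    return {speaker: " ".join(words) for speaker, words in grouped.items()}

results = [
    {"type": "word", "alternatives": [{"content": "hello"}], "attaches_to": "speaker_1"},
    {"type": "word", "alternatives": [{"content": "there"}], "attaches_to": "speaker_1"},
    {"type": "word", "alternatives": [{"content": "hi"}], "attaches_to": "speaker_2"},
]
print(by_speaker(results))  # {'speaker_1': 'hello there', 'speaker_2': 'hi'}
```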

Features Unique to Each Platform

Deepgram Only

  • Text-to-text search/keyword boosting

Speechmatics Only

  • Phonetic hints (sounds_like in additional_vocab)
  • Real-time translation (TranslationConfig)
  • Turn detection for voice agents (Voice SDK) with FIXED, ADAPTIVE, and EXTERNAL modes, plus Smart Turn ML
  • Comprehensive audio intelligence (sentiment + topics + summary together)
  • More granular speaker diarization controls (SpeakerDiarizationConfig)
  • Known speaker pre-registration (speaker_diarization_config.speakers)
  • Speaker Focus configuration - designate primary speakers, ignore others (e.g., assistant voice)
  • Voice SDK for conversational AI
  • Auto-disfluency tagging (automatic for English)
  • On-device and air-gapped deployment

Migration Checklist

Pre-Migration

  • Review feature mapping table above
  • Identify features you're currently using in Deepgram
  • Check language support for your use case
  • Sign up at portal.speechmatics.com
  • Get API key from portal
  • Apply code SWITCH200 for $200 free credit

Code Migration

  • Install SDK: pip install speechmatics-batch speechmatics-rt
  • Replace DEEPGRAM_API_KEY with SPEECHMATICS_API_KEY
  • Update imports from deepgram to speechmatics.batch or speechmatics.rt
  • Convert PrerecordedOptions/LiveOptions to TranscriptionConfig
  • Update event handlers (replace EventType with ServerMessageType)
  • Adjust result parsing (use result.transcript_text)
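The option-conversion step above can be partially automated. A hypothetical helper (not part of either SDK) that translates the most common Deepgram option names to their Speechmatics `TranscriptionConfig` equivalents per the mapping tables in this guide, and surfaces anything it cannot map for manual review:

```python
# Hypothetical migration helper. Each entry translates one Deepgram option
# into the equivalent Speechmatics TranscriptionConfig keyword arguments.
OPTION_MAP = {
    "language": lambda v: {"language": v.split("-")[0]},  # "en-US" -> "en"
    "diarize": lambda v: {"diarization": "speaker"} if v else {},
    "smart_format": lambda v: {"enable_entities": True} if v else {},
    "detect_entities": lambda v: {"enable_entities": True} if v else {},
    # "TechTerm:2" -> {"content": "TechTerm"} (boost weights have no direct equivalent)
    "keywords": lambda v: {"additional_vocab": [{"content": t.split(":")[0]} for t in v]},
}

def migrate_options(deepgram_options):
    """Return (speechmatics_config_kwargs, unmapped_options)."""
    config, unmapped = {}, {}
    for key, value in deepgram_options.items():
        if key in OPTION_MAP:
            config.update(OPTION_MAP[key](value))
        else:
            unmapped[key] = value  # review by hand against the tables above
    return config, unmapped

config, todo = migrate_options({"language": "en-US", "diarize": True, "model": "nova-3"})
# config -> {'language': 'en', 'diarization': 'speaker'}; todo -> {'model': 'nova-3'}
```

`model` is intentionally left unmapped here: Deepgram's model choice corresponds to Speechmatics' `operating_point`, which you should pick deliberately rather than translate mechanically.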

Testing

  • Test with same audio files used in Deepgram
  • Verify accuracy meets or exceeds previous results
  • Test error handling and retry logic
  • Performance testing for streaming use cases

Deployment

  • Update production environment variables
  • Deploy to staging environment
  • Monitor transcription quality
  • Verify usage metrics in portal

Common Gotchas

1. Async/Await Pattern

Speechmatics SDK is async-first:

```python
import asyncio
from speechmatics.batch import AsyncClient

async def main():
    async with AsyncClient(api_key="YOUR_API_KEY") as client:
        result = await client.transcribe(audio_file, transcription_config=config)
        print(result.transcript_text)

asyncio.run(main())
```

2. Response Structure

```python
# Deepgram
text = response.results.channels[0].alternatives[0].transcript

# Speechmatics - simpler
text = result.transcript_text
```

3. Event Types (Streaming)

```python
# Deepgram - uses generic MESSAGE event, check is_final for final vs partial
connection.on(EventType.MESSAGE, on_message)

# Speechmatics - separate events for final and partial
@client.on(ServerMessageType.ADD_TRANSCRIPT)
def on_transcript(message):
    ...
```

4. Audio Format

```python
# Deepgram - in options
options = LiveOptions(encoding="linear16", sample_rate=16000)

# Speechmatics - separate object
audio_format = AudioFormat(encoding=AudioEncoding.PCM_S16LE, sample_rate=16000)
```

5. Language Codes - No Locales Required

```python
# Deepgram - requires locale variants
options = PrerecordedOptions(language="en-US")  # or "en-GB", "en-AU"

# Speechmatics - just the language code, handles all accents automatically
config = TranscriptionConfig(language="en")  # Works for US, UK, AU, etc.

# Mandarin uses output_locale for character formatting
config = TranscriptionConfig(
    language="cmn",
    output_locale="cmn-Hans"  # Simplified Chinese (or "cmn-Hant" for Traditional)
)
```

Speechmatics' models are trained on diverse accents and don't require locale specification. Use output_locale for region-specific formatting (e.g., "en-GB" vs "en-US" spelling, or "cmn-Hans" vs "cmn-Hant" for Mandarin characters).


Complete Before/After Example

Before (Deepgram)

```python
from deepgram import DeepgramClient, PrerecordedOptions
import os

def transcribe_audio():
    client = DeepgramClient(api_key=os.getenv("DEEPGRAM_API_KEY"))

    with open("audio.wav", "rb") as audio_file:
        response = client.listen.prerecorded.transcribe_file(
            audio_file,
            PrerecordedOptions(
                model="nova-3",
                smart_format=True,
                diarize=True,
                language="en-US",
                keywords=["ProductName", "TechTerm"]
            )
        )

    return response.results.channels[0].alternatives[0].transcript

print(transcribe_audio())
```

After (Speechmatics)

```python
import asyncio
import os
from speechmatics.batch import AsyncClient, TranscriptionConfig

async def transcribe_audio():
    async with AsyncClient(api_key=os.getenv("SPEECHMATICS_API_KEY")) as client:
        config = TranscriptionConfig(
            language="en",
            operating_point="enhanced",
            diarization="speaker",
            enable_entities=True,
            additional_vocab=[
                {"content": "ProductName"},
                {"content": "TechTerm"}
            ]
        )

        with open("audio.wav", "rb") as audio_file:
            result = await client.transcribe(audio_file, transcription_config=config)
            return result.transcript_text

print(asyncio.run(transcribe_audio()))
```



**Time to Migrate:** 30-60 minutes · **Difficulty:** Intermediate · **Languages:** Python
Back to Academy Home