Commit ebf6395

[Feat] Add Eleven Labs - Speech To Text Support on LiteLLM (#12119)

* add ELEVENLABS as a provider
* add deepgram to main.py
* add ElevenLabsException
* add ElevenLabsAudioTranscriptionConfig
* add transform_audio_transcription_response
* TestElevenLabsAudioTranscription
* add elevenlabs/scribe_v1 to model cost map
* add ElevenLabsAudioTranscriptionConfig
* add AudioTranscriptionRequestData
* add ElevenLabs transform
* use AudioTranscriptionRequestData
* refactoring fixes
* add ProcessedAudioFile util for reading audio files
* test_elevenlabs_diarize_parameter_passthrough
* docs eleven labs
* docs fixes
* fix code qa checks
* fixes - audio transcription
* ui - add ElevenLabs logo
* add elevenlabs logo
* docs - ElevenLabs
* test fix elevenlabs

1 parent 041db02 commit ebf6395

24 files changed: +1109 −131 lines changed

Lines changed: 231 additions & 0 deletions

@@ -0,0 +1,231 @@

import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# ElevenLabs

ElevenLabs provides high-quality AI voice technology, including speech-to-text through its transcription API.

| Property | Details |
|----------|---------|
| Description | ElevenLabs offers advanced AI voice technology with speech-to-text transcription that supports multiple languages and speaker diarization. |
| Provider Route on LiteLLM | `elevenlabs/` |
| Provider Doc | [ElevenLabs API ↗](https://elevenlabs.io/docs/api-reference) |
| Supported Endpoints | `/audio/transcriptions` |

## Quick Start

### LiteLLM Python SDK

<Tabs>
<TabItem value="basic" label="Basic Usage">

```python showLineNumbers title="Basic audio transcription with ElevenLabs"
import litellm

# Transcribe audio file
with open("audio.mp3", "rb") as audio_file:
    response = litellm.transcription(
        model="elevenlabs/scribe_v1",
        file=audio_file,
        api_key="your-elevenlabs-api-key"  # or set ELEVENLABS_API_KEY env var
    )

print(response.text)
```

</TabItem>

<TabItem value="advanced" label="Advanced Features">

```python showLineNumbers title="Audio transcription with advanced features"
import litellm

# Transcribe with speaker diarization and a language hint
with open("audio.wav", "rb") as audio_file:
    response = litellm.transcription(
        model="elevenlabs/scribe_v1",
        file=audio_file,
        language="en",    # Language hint (maps to language_code)
        temperature=0.3,  # Control randomness in transcription
        diarize=True,     # Enable speaker diarization
        api_key="your-elevenlabs-api-key"
    )

print(f"Transcription: {response.text}")
print(f"Language: {response.language}")

# Access word-level timestamps if available
if hasattr(response, 'words') and response.words:
    for word_info in response.words:
        print(f"Word: {word_info['word']}, Start: {word_info['start']}, End: {word_info['end']}")
```

</TabItem>

<TabItem value="async" label="Async Usage">

```python showLineNumbers title="Async audio transcription"
import litellm
import asyncio

async def transcribe_audio():
    with open("audio.mp3", "rb") as audio_file:
        response = await litellm.atranscription(
            model="elevenlabs/scribe_v1",
            file=audio_file,
            api_key="your-elevenlabs-api-key"
        )
    return response.text

# Run async transcription
result = asyncio.run(transcribe_audio())
print(result)
```

</TabItem>
</Tabs>

### LiteLLM Proxy

#### 1. Configure your proxy

<Tabs>
<TabItem value="config-yaml" label="config.yaml">

```yaml showLineNumbers title="ElevenLabs configuration in config.yaml"
model_list:
  - model_name: elevenlabs-transcription
    litellm_params:
      model: elevenlabs/scribe_v1
      api_key: os.environ/ELEVENLABS_API_KEY

general_settings:
  master_key: your-master-key
```

</TabItem>

<TabItem value="env-vars" label="Environment Variables">

```bash showLineNumbers title="Required environment variables"
export ELEVENLABS_API_KEY="your-elevenlabs-api-key"
export LITELLM_MASTER_KEY="your-master-key"
```

</TabItem>
</Tabs>

#### 2. Start the proxy

```bash showLineNumbers title="Start LiteLLM proxy server"
litellm --config config.yaml

# Proxy will be available at http://localhost:4000
```

#### 3. Make transcription requests

<Tabs>
<TabItem value="curl" label="Curl">

```bash showLineNumbers title="Audio transcription with curl"
# Note: don't set Content-Type manually; curl generates the correct
# multipart/form-data header (with boundary) for -F requests
curl http://localhost:4000/v1/audio/transcriptions \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -F file="@audio.mp3" \
  -F model="elevenlabs-transcription" \
  -F language="en" \
  -F temperature="0.3"
```

</TabItem>

<TabItem value="openai-sdk" label="OpenAI Python SDK">

```python showLineNumbers title="Using OpenAI SDK with LiteLLM proxy"
from openai import OpenAI

# Initialize client with your LiteLLM proxy URL
client = OpenAI(
    base_url="http://localhost:4000",
    api_key="your-litellm-api-key"
)

# Transcribe audio file
with open("audio.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        model="elevenlabs-transcription",
        file=audio_file,
        language="en",
        temperature=0.3,
        # ElevenLabs-specific parameters go through extra_body, since the
        # OpenAI SDK rejects unknown keyword arguments
        extra_body={
            "diarize": True,
            "speaker_boost": True,
            "custom_vocabulary": "technical,AI,machine learning"
        }
    )

print(response.text)
```

</TabItem>

<TabItem value="javascript" label="JavaScript/Node.js">

```javascript showLineNumbers title="Audio transcription with JavaScript"
import OpenAI from 'openai';
import fs from 'fs';

const openai = new OpenAI({
  baseURL: 'http://localhost:4000',
  apiKey: 'your-litellm-api-key'
});

async function transcribeAudio() {
  const response = await openai.audio.transcriptions.create({
    file: fs.createReadStream('audio.mp3'),
    model: 'elevenlabs-transcription',
    language: 'en',
    temperature: 0.3,
    diarize: true,
    speaker_boost: true
  });

  console.log(response.text);
}

transcribeAudio();
```

</TabItem>
</Tabs>

## Response Format

ElevenLabs returns transcription responses in OpenAI-compatible format:

```json showLineNumbers title="Example transcription response"
{
  "text": "Hello, this is a sample transcription with multiple speakers.",
  "task": "transcribe",
  "language": "en",
  "words": [
    {
      "word": "Hello",
      "start": 0.0,
      "end": 0.5
    },
    {
      "word": "this",
      "start": 0.5,
      "end": 0.8
    }
  ]
}
```

### Common Issues

1. **Invalid API Key**: Ensure `ELEVENLABS_API_KEY` is set correctly
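For the invalid-key case, a fail-fast self-check avoids a wasted round-trip to the API. The helper below is a hypothetical snippet (not part of LiteLLM) that raises a descriptive error when the variable is missing or blank:

```python
import os

def require_elevenlabs_key(env=os.environ) -> str:
    """Return the ElevenLabs API key, or raise a descriptive error."""
    key = env.get("ELEVENLABS_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "ELEVENLABS_API_KEY is missing or empty; "
            "export it before calling litellm.transcription"
        )
    return key

# Demonstrate with a stand-in environment dict:
print(require_elevenlabs_key({"ELEVENLABS_API_KEY": "sk-demo"}))  # → sk-demo
```

Call it once at startup so a misconfigured deployment fails immediately with a clear message rather than with a provider-side 401.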
docs/my-website/sidebars.js

Lines changed: 1 addition & 0 deletions

```diff
@@ -415,6 +415,7 @@ const sidebars = {
         "providers/groq",
         "providers/github",
         "providers/deepseek",
+        "providers/elevenlabs",
         "providers/fireworks_ai",
         "providers/clarifai",
         "providers/vllm",
```
litellm/__init__.py

Lines changed: 5 additions & 0 deletions

```diff
@@ -478,6 +478,7 @@ def identify(event_details):
 nebius_models: List = []
 nebius_embedding_models: List = []
 deepgram_models: List = []
+elevenlabs_models: List = []


 def is_bedrock_pricing_only_model(key: str) -> bool:
@@ -651,6 +652,8 @@ def add_known_models():
             featherless_ai_models.append(key)
         elif value.get("litellm_provider") == "deepgram":
             deepgram_models.append(key)
+        elif value.get("litellm_provider") == "elevenlabs":
+            elevenlabs_models.append(key)


 add_known_models()
@@ -733,6 +736,7 @@ def add_known_models():
     + featherless_ai_models
     + nscale_models
     + deepgram_models
+    + elevenlabs_models
 )

 model_list_set = set(model_list)
@@ -797,6 +801,7 @@ def add_known_models():
     "nscale": nscale_models,
     "featherless_ai": featherless_ai_models,
     "deepgram": deepgram_models,
+    "elevenlabs": elevenlabs_models,
 }

 # mapping for those models which have larger equivalents
```
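The registration flow above can be illustrated with a condensed, standalone sketch: `add_known_models` walks the model cost map and buckets each entry into a per-provider list keyed on `litellm_provider`. The cost-map entries below are made-up stand-ins (only `elevenlabs/scribe_v1` comes from this commit), not the real map:

```python
# Minimal sketch of the provider-bucketing logic, not LiteLLM's actual code.
model_cost = {
    "elevenlabs/scribe_v1": {"litellm_provider": "elevenlabs", "mode": "audio_transcription"},
    "deepgram/example-model": {"litellm_provider": "deepgram", "mode": "audio_transcription"},
}

deepgram_models: list = []
elevenlabs_models: list = []

# Each cost-map key whose provider matches is appended to that provider's list
for key, value in model_cost.items():
    if value.get("litellm_provider") == "deepgram":
        deepgram_models.append(key)
    elif value.get("litellm_provider") == "elevenlabs":
        elevenlabs_models.append(key)

print(elevenlabs_models)  # → ['elevenlabs/scribe_v1']
```

This is why adding a provider touches three places in `__init__.py`: the empty list, the `elif` branch that fills it, and the aggregate `model_list` / provider mapping that exposes it.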

litellm/litellm_core_utils/audio_utils/utils.py

Lines changed: 100 additions & 0 deletions

```diff
@@ -3,10 +3,110 @@
 """

 import os
+from dataclasses import dataclass

+from litellm.types.files import get_file_mime_type_from_extension
 from litellm.types.utils import FileTypes


+@dataclass
+class ProcessedAudioFile:
+    """
+    Processed audio file data.
+
+    Attributes:
+        file_content: The binary content of the audio file
+        filename: The filename (extracted or generated)
+        content_type: The MIME type of the audio file
+    """
+    file_content: bytes
+    filename: str
+    content_type: str
+
+
+def process_audio_file(audio_file: FileTypes) -> ProcessedAudioFile:
+    """
+    Common utility function to process audio files for audio transcription APIs.
+
+    Handles various input types:
+    - File paths (str, os.PathLike)
+    - Raw bytes/bytearray
+    - Tuples (filename, content, optional content_type)
+    - File-like objects with read() method
+
+    Args:
+        audio_file: The audio file input in various formats
+
+    Returns:
+        ProcessedAudioFile: Structured data with file content, filename, and content type
+
+    Raises:
+        ValueError: If audio_file type is unsupported or content cannot be extracted
+    """
+    file_content = None
+    filename = None
+
+    if isinstance(audio_file, (bytes, bytearray)):
+        # Raw bytes
+        filename = 'audio.wav'
+        file_content = bytes(audio_file)
+    elif isinstance(audio_file, (str, os.PathLike)):
+        # File path or PathLike
+        file_path = str(audio_file)
+        with open(file_path, 'rb') as f:
+            file_content = f.read()
+        filename = file_path.split('/')[-1]
+    elif isinstance(audio_file, tuple):
+        # Tuple format: (filename, content, content_type) or (filename, content)
+        if len(audio_file) >= 2:
+            filename = audio_file[0] or 'audio.wav'
+            content = audio_file[1]
+            if isinstance(content, (bytes, bytearray)):
+                file_content = bytes(content)
+            elif isinstance(content, (str, os.PathLike)):
+                # File path or PathLike
+                with open(str(content), 'rb') as f:
+                    file_content = f.read()
+            elif hasattr(content, 'read'):
+                # File-like object
+                file_content = content.read()
+                if hasattr(content, 'seek'):
+                    content.seek(0)
+            else:
+                raise ValueError(f"Unsupported content type in tuple: {type(content)}")
+        else:
+            raise ValueError("Tuple must have at least 2 elements: (filename, content)")
+    elif hasattr(audio_file, 'read') and not isinstance(audio_file, (str, bytes, bytearray, tuple, os.PathLike)):
+        # File-like object (IO) - check this after all other types
+        filename = getattr(audio_file, 'name', 'audio.wav')
+        file_content = audio_file.read()  # type: ignore
+        # Reset file pointer if possible
+        if hasattr(audio_file, 'seek'):
+            audio_file.seek(0)  # type: ignore
+    else:
+        raise ValueError(f"Unsupported audio_file type: {type(audio_file)}")
+
+    if file_content is None:
+        raise ValueError("Could not extract file content from audio_file")
+
+    # Determine content type using LiteLLM's file type utilities
+    content_type = 'audio/wav'  # Default fallback
+    if filename:
+        try:
+            # Extract extension from filename
+            extension = filename.split('.')[-1].lower() if '.' in filename else 'wav'
+            content_type = get_file_mime_type_from_extension(extension)
+        except ValueError:
+            # If extension is not recognized, fallback to audio/wav
+            content_type = 'audio/wav'
+
+    return ProcessedAudioFile(
+        file_content=file_content,
+        filename=filename,
+        content_type=content_type
+    )
+
+
 def get_audio_file_name(file_obj: FileTypes) -> str:
     """
     Safely get the name of a file-like object or return its string representation.
```