Commit 0ee3e23

feat: add fallbackModel option for automatic model failover on provider errors (#1136)
* feat: add fallbackModel option for automatic model failover on provider errors

  When a primary model returns a retryable HTTP status (429, 500, 502, 503), the SDK now automatically re-issues the request to a user-specified fallback model. This unblocks users hitting transient "high demand" errors on models like Gemini 2.5 Flash Lite by transparently offloading to a backup model (e.g., Gemini 2.5 Pro). The fallback model gets a single attempt to prevent infinite loops and can be from a completely different provider. Adds fallbackModel/fallback_model to GuardOptions, RedactOptions, and ScanOptions in both TypeScript and Python SDKs with full test coverage.

* docs: document fallbackModel option in TypeScript and Python SDK references

  Adds a "Model Fallback" section under Client Configuration explaining the feature with code examples, and adds the fallbackModel/fallback_model field to the guard, redact, and scan options tables in both SDK pages.

* chore: bump TypeScript SDK version to 0.1.7-rc1

* bump sdk versions
1 parent 7017b92 commit 0ee3e23

File tree

12 files changed (+630, -16 lines)


docs/content/docs/sdk/sdk/python.mdx

Lines changed: 27 additions & 0 deletions

@@ -70,6 +70,30 @@ client = create_client(
 
 The fallback URL can also be set via the `SUPERAGENT_FALLBACK_URL` environment variable.
 
+### Model Fallback
+
+When using third-party providers (e.g., Google Gemini), transient errors like 503 (high demand) or 429 (rate limited) can cause requests to fail. The SDK supports automatic model fallback: if the primary model returns a retryable error, the request is re-issued to a backup model you specify.
+
+```python
+result = await client.guard(
+    input="user message to analyze",
+    model="google/gemini-2.5-flash-lite",
+    fallback_model="google/gemini-2.5-pro"
+)
+```
+
+If the primary model succeeds, `fallback_model` is never called. If it returns a retryable status code (429, 500, 502, or 503), the SDK automatically retries with the fallback model. The fallback model can be from a different provider entirely:
+
+```python
+result = await client.guard(
+    input="user message to analyze",
+    model="google/gemini-2.5-flash-lite",
+    fallback_model="openai/gpt-4o-mini"
+)
+```
+
+The `fallback_model` option is available on `guard()`, `redact()`, and `scan()`. The fallback model gets a single attempt — there is no recursive fallback chain.
+
 ---
 
 ## Guard
@@ -92,6 +116,7 @@ if result.classification == "block":
 |--------|------|----------|---------|-------------|
 | `input` | `str \| bytes` | Yes | - | The input to analyze |
 | `model` | `str` | No | `superagent/guard-1.7b` | Model in `provider/model` format |
+| `fallback_model` | `str` | No | - | Backup model used when primary returns 429/500/502/503 |
 | `system_prompt` | `str` | No | - | Custom system prompt |
 | `chunk_size` | `int` | No | `8000` | Characters per chunk (0 to disable) |
@@ -148,6 +173,7 @@ print(result.redacted)
 |--------|------|----------|---------|-------------|
 | `input` | `str` | Yes | - | The text to redact |
 | `model` | `str` | Yes | - | Model in `provider/model` format |
+| `fallback_model` | `str` | No | - | Backup model used when primary returns 429/500/502/503 |
 | `entities` | `list[str]` | No | Default PII | Entity types to redact |
 | `rewrite` | `bool` | No | `False` | Rewrite contextually instead of placeholders |
@@ -216,6 +242,7 @@ print(f"Cost: ${response.usage.cost:.4f}")
 | `repo` | `str` | Yes | - | Git repository URL (https:// or git@) |
 | `branch` | `str` | No | Default branch | Branch, tag, or commit to checkout |
 | `model` | `str` | No | `anthropic/claude-sonnet-4-5` | Model for OpenCode analysis |
+| `fallback_model` | `str` | No | - | Backup model used when primary returns 429/500/502/503 |
 
 ### Response

docs/content/docs/sdk/sdk/typescript.mdx

Lines changed: 27 additions & 0 deletions

@@ -70,6 +70,30 @@ const client = createClient({
 
 The fallback URL can also be set via the `SUPERAGENT_FALLBACK_URL` environment variable.
 
+### Model Fallback
+
+When using third-party providers (e.g., Google Gemini), transient errors like 503 (high demand) or 429 (rate limited) can cause requests to fail. The SDK supports automatic model fallback: if the primary model returns a retryable error, the request is re-issued to a backup model you specify.
+
+```typescript
+const result = await client.guard({
+  input: "user message to analyze",
+  model: "google/gemini-2.5-flash-lite",
+  fallbackModel: "google/gemini-2.5-pro"
+});
+```
+
+If the primary model succeeds, `fallbackModel` is never called. If it returns a retryable status code (429, 500, 502, or 503), the SDK automatically retries with the fallback model. The fallback model can be from a different provider entirely:
+
+```typescript
+const result = await client.guard({
+  input: "user message to analyze",
+  model: "google/gemini-2.5-flash-lite",
+  fallbackModel: "openai/gpt-4o-mini"
+});
+```
+
+The `fallbackModel` option is available on `guard()`, `redact()`, and `scan()`. The fallback model gets a single attempt — there is no recursive fallback chain.
+
 ---
 
 ## Guard
@@ -95,6 +119,7 @@ if (result.classification === "block") {
 |--------|------|----------|---------|-------------|
 | `input` | `string \| Blob \| URL` | Yes | - | The input to analyze |
 | `model` | `string` | No | `superagent/guard-1.7b` | Model in `provider/model` format |
+| `fallbackModel` | `string` | No | - | Backup model used when primary returns 429/500/502/503 |
 | `systemPrompt` | `string` | No | - | Custom system prompt |
 | `chunkSize` | `number` | No | `8000` | Characters per chunk (0 to disable) |
@@ -155,6 +180,7 @@ console.log(result.redacted);
 |--------|------|----------|---------|-------------|
 | `input` | `string` | Yes | - | The text to redact |
 | `model` | `string` | Yes | - | Model in `provider/model` format |
+| `fallbackModel` | `string` | No | - | Backup model used when primary returns 429/500/502/503 |
 | `entities` | `string[]` | No | Default PII | Entity types to redact |
 | `rewrite` | `boolean` | No | `false` | Rewrite contextually instead of placeholders |
@@ -225,6 +251,7 @@ console.log(`Cost: $${response.usage.cost.toFixed(4)}`);
 | `repo` | `string` | Yes | - | Git repository URL (https:// or git@) |
 | `branch` | `string` | No | Default branch | Branch, tag, or commit to checkout |
 | `model` | `string` | No | `anthropic/claude-sonnet-4-5` | Model for OpenCode analysis |
+| `fallbackModel` | `string` | No | - | Backup model used when primary returns 429/500/502/503 |
 
 ### Response

sdk/python/pyproject.toml

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 [project]
 name = "safety-agent"
-version = "0.1.5"
+version = "0.1.7-rc1"
 description = "A lightweight Python guardrail SDK for content safety"
 readme = "README.md"
 license = "MIT"

sdk/python/src/safety_agent/client.py

Lines changed: 15 additions & 7 deletions

@@ -241,6 +241,7 @@ async def _guard_single_text(
         input_text: str,
         system_prompt: str | None,
         model: str,
+        fallback_model: str | None = None,
     ) -> GuardResponse:
         """Guard a single chunk of text input (internal method)."""
         is_superagent = model.startswith("superagent/")
@@ -265,7 +266,7 @@ async def _guard_single_text(
             GUARD_RESPONSE_FORMAT if _supports_structured_output(model) else None
         )
         response = await call_provider(
-            model, messages, response_format, self._fallback_options
+            model, messages, response_format, self._fallback_options, fallback_model
         )
         content = response.choices[0].message.content
@@ -292,6 +293,7 @@ async def _guard_image(
         processed: ProcessedInput,
         system_prompt: str | None,
         model: str,
+        fallback_model: str | None = None,
     ) -> GuardResponse:
         """Guard an image input using vision model (internal method)."""
         if not is_vision_model(model):
@@ -323,7 +325,7 @@ async def _guard_image(
             GUARD_RESPONSE_FORMAT if _supports_structured_output(model) else None
         )
         response = await call_provider(
-            model, messages, response_format, self._fallback_options
+            model, messages, response_format, self._fallback_options, fallback_model
         )
         content = response.choices[0].message.content
@@ -350,6 +352,7 @@ async def guard(
         input: GuardInput | None = None,
         *,
         model: str | None = None,
+        fallback_model: str | None = None,
         system_prompt: str | None = None,
         chunk_size: int = 8000,
         # Also accept GuardOptions-style kwargs
@@ -369,6 +372,7 @@ async def guard(
         Args:
             input: The input to analyze - text, URL, or bytes
             model: Model in "provider/model" format. Defaults to superagent/guard-1.7b
+            fallback_model: Fallback model when the primary returns a retryable error (429/500/502/503)
            system_prompt: Optional custom system prompt
             chunk_size: Characters per chunk. Default: 8000. Set to 0 to disable chunking.
@@ -380,6 +384,7 @@ async def guard(
             options = input
             input = options.input
             model = model or options.model
+            fallback_model = fallback_model or options.fallback_model
             system_prompt = system_prompt or options.system_prompt
             chunk_size = options.chunk_size
@@ -401,7 +406,7 @@ async def guard(
 
         # Handle image inputs with vision models
         if processed.type == "image":
-            result = await self._guard_image(processed, system_prompt, model)
+            result = await self._guard_image(processed, system_prompt, model, fallback_model)
             self._post_usage(result.usage)
             return result
@@ -424,7 +429,7 @@ async def guard(
         # Analyze each page in parallel
         results = await asyncio.gather(
             *[
-                self._guard_single_text(page_text, system_prompt, model)
+                self._guard_single_text(page_text, system_prompt, model, fallback_model)
                 for page_text in non_empty_pages
             ]
         )
@@ -439,15 +444,15 @@ async def guard(
 
         # Skip chunking if disabled (chunk_size=0) or input is small enough
         if chunk_size == 0 or len(text) <= chunk_size:
-            result = await self._guard_single_text(text, system_prompt, model)
+            result = await self._guard_single_text(text, system_prompt, model, fallback_model)
             self._post_usage(result.usage)
             return result
 
         # Chunk and process in parallel
         chunks = _chunk_text(text, chunk_size)
         results = await asyncio.gather(
             *[
-                self._guard_single_text(chunk, system_prompt, model)
+                self._guard_single_text(chunk, system_prompt, model, fallback_model)
                 for chunk in chunks
             ]
         )
@@ -462,6 +467,7 @@ async def redact(
         input: str | None = None,
         *,
         model: str | None = None,
+        fallback_model: str | None = None,
         entities: list[str] | None = None,
         rewrite: bool = False,
         # Also accept RedactOptions-style kwargs
@@ -473,6 +479,7 @@ async def redact(
         Args:
             input: The input text to redact
             model: Model in "provider/model" format, e.g. "openai/gpt-4o"
+            fallback_model: Fallback model when the primary returns a retryable error (429/500/502/503)
             entities: Optional list of entity types to redact (overrides default entities)
             rewrite: When true, rewrites text contextually instead of using placeholders
@@ -484,6 +491,7 @@ async def redact(
             options = input
             input = options.input
             model = model or options.model
+            fallback_model = fallback_model or options.fallback_model
             entities = entities or options.entities
             rewrite = options.rewrite
@@ -509,7 +517,7 @@ async def redact(
             REDACT_RESPONSE_FORMAT if _supports_structured_output(model) else None
         )
         response = await call_provider(
-            model, messages, response_format, self._fallback_options
+            model, messages, response_format, self._fallback_options, fallback_model
         )
         content = response.choices[0].message.content

sdk/python/src/safety_agent/providers/__init__.py

Lines changed: 39 additions & 1 deletion

@@ -99,13 +99,21 @@ def get_provider(provider_name: str) -> Any:
     return provider
 
 
+RETRYABLE_STATUS_CODES = {429, 500, 502, 503}
+
+
 async def call_provider(
     model_string: str,
     messages: list[ChatMessage],
     response_format: ResponseFormat | None = None,
     fallback_options: FallbackOptions | None = None,
+    fallback_model: str | None = None,
 ) -> AnalysisResponse:
-    """Call an LLM provider with the given messages."""
+    """Call an LLM provider with the given messages.
+
+    If ``fallback_model`` is set and the primary model returns a retryable
+    status code, the request is re-issued against the fallback model.
+    """
     parsed = parse_model(model_string)
     provider = get_provider(parsed.provider)
@@ -159,6 +167,21 @@ async def call_provider(
     )
 
     if response.status_code != 200:
+        if (
+            fallback_model
+            and response.status_code in RETRYABLE_STATUS_CODES
+        ):
+            print(
+                f"Primary model {model_string} failed "
+                f"({response.status_code}), falling back to "
+                f"{fallback_model}"
+            )
+            return await call_provider(
+                fallback_model,
+                messages,
+                response_format,
+                fallback_options,
+            )
         raise RuntimeError(
             f"Provider API error ({response.status_code}): {response.text}"
         )
@@ -200,6 +223,21 @@ async def call_provider(
     )
 
     if response.status_code != 200:
+        if (
+            fallback_model
+            and response.status_code in RETRYABLE_STATUS_CODES
+        ):
+            print(
+                f"Primary model {model_string} failed "
+                f"({response.status_code}), falling back to "
+                f"{fallback_model}"
+            )
+            return await call_provider(
+                fallback_model,
+                messages,
+                response_format,
+                fallback_options,
+            )
         raise RuntimeError(
             f"Provider API error ({response.status_code}): {response.text}"
         )

sdk/python/src/safety_agent/types.py

Lines changed: 9 additions & 0 deletions

@@ -126,6 +126,9 @@ class GuardOptions:
     model: SupportedModel | None = None
     """Model in 'provider/model' format. Defaults to superagent/guard-1.7b."""
 
+    fallback_model: SupportedModel | None = None
+    """Fallback model to use when the primary model returns a retryable error (429/500/502/503)."""
+
     system_prompt: str | None = None
     """Optional custom system prompt that replaces the default guard prompt."""
@@ -165,6 +168,9 @@ class RedactOptions:
     model: SupportedModel
     """Model in 'provider/model' format, e.g. 'openai/gpt-4o'."""
 
+    fallback_model: SupportedModel | None = None
+    """Fallback model to use when the primary model returns a retryable error (429/500/502/503)."""
+
     entities: list[str] | None = None
     """Optional list of entity types to redact (overrides default entities)."""
@@ -289,6 +295,9 @@ class ScanOptions:
     model: str = "anthropic/claude-sonnet-4-5"
     """Model for OpenCode to use (provider/model format)."""
 
+    fallback_model: str | None = None
+    """Fallback model to use when the primary model returns a retryable error (429/500/502/503)."""
+
 
 @dataclass
 class ScanUsage:
