Commit c1f57f6
perf(responses): batch guardrail checks during streaming (#5664)
# What does this PR do?
Fixes a performance cliff when output guardrails are enabled on
streaming responses. Previously, every streaming token triggered an O(n)
string join, a `list_shields()` lookup, and a Safety API
`run_moderation()` call. For a 1000-token response this meant ~1000
redundant API calls and quadratic string reconstruction.
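To make the "quadratic string reconstruction" concrete, here is a small back-of-the-envelope sketch. It is not code from this PR: the function name and the assumption of 1000 tokens of 4 characters each are illustrative only. It counts how many characters are copied when the full text is re-joined after every token.

```python
def chars_copied_per_token_join(token_lengths):
    """Total characters copied if the whole text is re-joined after each token."""
    total = 0
    accumulated = 0
    for length in token_lengths:
        accumulated += length
        # A join over all parts copies everything accumulated so far.
        total += accumulated
    return total


# Assumption for illustration: 1000 tokens, 4 characters each (4000 chars total).
token_lengths = [4] * 1000
print(chars_copied_per_token_join(token_lengths))  # 4 * (1000 * 1001 / 2) = 2002000
```

Under those assumed sizes, a 4000-character response causes about two million characters of copying, on top of one lookup and one moderation call per token.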
This PR:
- Extracts `resolve_guardrail_model_ids()` to cache shield lookups once
per request
- Batches guardrail checks every 200 characters (configurable via
`GUARDRAIL_BATCH_CHARS` env var) instead of every token
- Adds a final guardrail check at stream end for remaining buffered
content
- Flushes reasoning-only deltas per chunk so they stream in real time
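The batching and end-of-stream check described above can be sketched roughly as follows. This is a minimal illustration, not the implementation in this PR: `stream_with_batched_guardrails`, `check`, and `fake_stream` are hypothetical names, and the sketch assumes that text deltas are held back until their batch passes the check and that only the newly buffered text is sent to the check. Only the `GUARDRAIL_BATCH_CHARS` variable and its 200-character default come from the description.

```python
import asyncio
import os

# Env var name and default of 200 are taken from the PR description.
DEFAULT_BATCH_CHARS = int(os.environ.get("GUARDRAIL_BATCH_CHARS", "200"))


async def stream_with_batched_guardrails(deltas, check, batch_chars=DEFAULT_BATCH_CHARS):
    """Yield text deltas, calling `check` once per `batch_chars` of buffered text.

    `deltas` is an async iterable of strings; `check` is an async callable that
    raises if the text violates a guardrail (both are hypothetical stand-ins).
    """
    pending = []       # deltas received since the last check
    pending_chars = 0  # running length, so no join is needed just to measure
    async for delta in deltas:
        pending.append(delta)
        pending_chars += len(delta)
        if pending_chars >= batch_chars:
            await check("".join(pending))  # one join per batch, not per token
            for item in pending:
                yield item
            pending = []
            pending_chars = 0
    # Final check at stream end for whatever is still buffered.
    if pending:
        await check("".join(pending))
        for item in pending:
            yield item


async def fake_stream(tokens):
    """Hypothetical stand-in for a model's token stream."""
    for token in tokens:
        yield token


async def _demo():
    calls = 0

    async def check(text):
        nonlocal calls
        calls += 1  # a real check would call the Safety API here

    tokens = ["abcd"] * 1000
    out = [d async for d in stream_with_batched_guardrails(fake_stream(tokens), check, 200)]
    print(len(out), calls)  # 1000 deltas streamed, 20 checks instead of 1000


if __name__ == "__main__":
    asyncio.run(_demo())
```

In this sketch the shield lookup is assumed to have happened once before the stream starts, so `check` only needs to run moderation on the text it receives.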
## Test Plan
1. Unit tests pass (`236 passed`):
```bash
uv run pytest tests/unit/providers/responses/ -x --tb=short -q
```
2. New test verifies reasoning events stream without waiting for text
accumulation:
```bash
uv run pytest tests/unit/providers/inline/responses/builtin/responses/test_streaming.py::test_guardrailed_reasoning_streams_before_completion -v
```
3. Benchmark script for A/B testing against a running OGX server:
```bash
# Start server with per-token checking (before):
GUARDRAIL_BATCH_CHARS=1 SAFETY_MODEL=ollama/llama-guard3:1b uv run ogx stack run starter
# Run benchmark:
uv run python scripts/benchmark_guardrail_batching.py --model openai/gpt-4.1-nano
# Restart server with batched checking (after, default):
SAFETY_MODEL=ollama/llama-guard3:1b uv run ogx stack run starter
# Run benchmark again and compare
uv run python scripts/benchmark_guardrail_batching.py --model openai/gpt-4.1-nano
```
---------
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 107a2bc commit c1f57f6
4 files changed
Lines changed: 198 additions & 29 deletions
File tree
- src/ogx/providers/inline/responses/builtin/responses
- tests
- integration/agents/recordings
- unit/providers/inline/responses/builtin/responses
Lines changed: 56 additions & 16 deletions
Lines changed: 24 additions & 13 deletions
Lines changed: 62 additions & 0 deletions
Lines changed: 56 additions & 0 deletions