Commit 0ab5ad3
docs: add MLflow AI Gateway cookbook example
Adds a Jupyter notebook demonstrating how to use MLflow AI Gateway as an LLM backend for AutoGen agents via OpenAIChatCompletionClient with a custom base_url pointing to the gateway's OpenAI-compatible endpoint.
1 parent 8544314 commit 0ab5ad3

File tree

2 files changed

+216
-0
lines changed


python/docs/src/user-guide/core-user-guide/cookbook/index.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ openai-assistant-agent
 langgraph-agent
 llamaindex-agent
 local-llms-ollama-litellm
+mlflow-gateway
 instrumenting
 topic-subscription-scenarios
 structured-output-agent
Lines changed: 215 additions & 0 deletions
@@ -0,0 +1,215 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Using MLflow AI Gateway with AutoGen\n",
    "\n",
    "[MLflow AI Gateway](https://mlflow.org/docs/latest/llms/gateway/index.html) is a database-backed LLM proxy built into the MLflow tracking server (MLflow ≥ 3.0). It gives you a **single OpenAI-compatible endpoint** that can route to dozens of LLM providers — OpenAI, Anthropic, Gemini, Mistral, Bedrock, Ollama, and more.\n",
    "\n",
    "Key features:\n",
    "- **Multi-provider routing** — switch models without changing agent code\n",
    "- **Secrets management** — provider API keys stored encrypted on the server; your application sends no provider keys\n",
    "- **Fallback & retry** — automatic failover to backup models\n",
    "- **Budget tracking** — per-endpoint or per-user token budgets\n",
    "- **Usage tracing** — every call logged as an MLflow trace automatically\n",
    "\n",
    "Because MLflow Gateway speaks the OpenAI API, you can use `OpenAIChatCompletionClient` with a custom `base_url` to point any AutoGen agent at it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "\n",
    "1. **Start an MLflow server** with the gateway enabled:\n",
    "   ```bash\n",
    "   pip install mlflow\n",
    "   mlflow server --host 127.0.0.1 --port 5000\n",
    "   ```\n",
    "\n",
    "2. **Create a gateway endpoint** via the MLflow UI at [http://localhost:5000](http://localhost:5000):  \n",
    "   Navigate to **AI Gateway → Create Endpoint**, give it a name (e.g. `my-chat-endpoint`), select a provider and model, and save your API key (stored encrypted on the server).\n",
    "\n",
    "   Or create one via the REST API:\n",
    "   ```bash\n",
    "   # Step 1: Store the provider key as a secret\n",
    "   curl -s -X POST http://localhost:5000/api/2.0/mlflow/gateway/secrets \\\n",
    "     -H 'Content-Type: application/json' \\\n",
    "     -d '{\"secret_name\": \"openai-key\", \"secret_value\": {\"api_key\": \"sk-...\"}, \"provider\": \"openai\"}'\n",
    "\n",
    "   # Step 2: Create the endpoint (use the secret_id returned above)\n",
    "   curl -s -X POST http://localhost:5000/api/2.0/mlflow/gateway/endpoints/create \\\n",
    "     -H 'Content-Type: application/json' \\\n",
    "     -d '{\"name\": \"my-chat-endpoint\", \"model_configs\": [{\"provider\": \"openai\", \"model_name\": \"gpt-4o-mini\", \"secret_id\": \"<secret_id>\"}]}'\n",
    "   ```"
   ]
  },
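The two curl calls above can also be prepared from Python. A minimal standard-library sketch (the paths and payload fields simply mirror the curl commands above, so verify them against your MLflow version; the requests are built but not sent):

```python
import json
import urllib.request

MLFLOW_SERVER = "http://localhost:5000"

def build_post(path: str, body: dict) -> urllib.request.Request:
    """Build a JSON POST request against the MLflow server (not sent here)."""
    return urllib.request.Request(
        MLFLOW_SERVER + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Step 1: store the provider key as a secret (same payload as the curl example)
secret_req = build_post(
    "/api/2.0/mlflow/gateway/secrets",
    {"secret_name": "openai-key", "secret_value": {"api_key": "sk-..."}, "provider": "openai"},
)

# Step 2: create the endpoint, referencing the secret_id returned by step 1
endpoint_req = build_post(
    "/api/2.0/mlflow/gateway/endpoints/create",
    {
        "name": "my-chat-endpoint",
        "model_configs": [
            {"provider": "openai", "model_name": "gpt-4o-mini", "secret_id": "<secret_id>"}
        ],
    },
)
# urllib.request.urlopen(secret_req) would actually send a request; omitted here.
```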
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Installation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%pip install -U 'autogen-agentchat' 'autogen-ext[openai]'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Connect to MLflow Gateway\n",
    "\n",
    "Use `OpenAIChatCompletionClient` with:\n",
    "- `base_url` pointing to the MLflow Gateway OpenAI-compatible endpoint\n",
    "- `model` set to your **gateway endpoint name**\n",
    "- `api_key` set to any non-empty string (the gateway manages provider keys server-side)\n",
    "\n",
    "Because the endpoint name is not a model the client recognizes, pass `model_info` describing its capabilities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from autogen_ext.models.openai import OpenAIChatCompletionClient\n",
    "\n",
    "MLFLOW_GATEWAY_URL = \"http://localhost:5000\"\n",
    "ENDPOINT_NAME = \"my-chat-endpoint\"  # the endpoint name you created in MLflow\n",
    "\n",
    "model_client = OpenAIChatCompletionClient(\n",
    "    model=ENDPOINT_NAME,\n",
    "    base_url=f\"{MLFLOW_GATEWAY_URL}/gateway/openai/v1\",\n",
    "    api_key=\"unused\",  # provider keys are stored on the MLflow server\n",
    "    model_info={\n",
    "        \"json_output\": False,\n",
    "        \"vision\": False,\n",
    "        \"function_calling\": True,\n",
    "        \"family\": \"unknown\",\n",
    "        \"structured_output\": False,\n",
    "    },\n",
    ")"
   ]
  },
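The `base_url` above is derived from the server URL by appending `/gateway/openai/v1`. A tiny helper (hypothetical, not part of autogen) that mirrors the f-string and tolerates a trailing slash:

```python
def gateway_openai_base(server_url: str) -> str:
    """Return the OpenAI-compatible base URL exposed by the MLflow gateway.

    Mirrors the f-string used above; a trailing slash on the server URL is
    stripped so both spellings produce the same endpoint.
    """
    return server_url.rstrip("/") + "/gateway/openai/v1"

print(gateway_openai_base("http://localhost:5000"))
# -> http://localhost:5000/gateway/openai/v1
```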
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Single-turn Chat Example\n",
    "\n",
    "Use the model client directly to verify the connection:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from autogen_core.models import UserMessage\n",
    "\n",
    "result = await model_client.create(\n",
    "    messages=[UserMessage(content=\"What is MLflow AI Gateway?\", source=\"user\")]\n",
    ")\n",
    "print(result.content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Multi-Agent Chat Example\n",
    "\n",
    "Here we create an assistant agent, place it in a `RoundRobinGroupChat` with a message-count termination condition, and run a short conversation through MLflow Gateway."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from autogen_agentchat.agents import AssistantAgent\n",
    "from autogen_agentchat.conditions import MaxMessageTermination\n",
    "from autogen_agentchat.teams import RoundRobinGroupChat\n",
    "from autogen_agentchat.ui import Console\n",
    "\n",
    "# Create the assistant using the MLflow Gateway client\n",
    "assistant = AssistantAgent(\n",
    "    name=\"assistant\",\n",
    "    model_client=model_client,\n",
    "    system_message=\"You are a helpful AI assistant. Keep answers concise.\",\n",
    ")\n",
    "\n",
    "# Run a quick conversation, stopping after three messages\n",
    "termination = MaxMessageTermination(max_messages=3)\n",
    "team = RoundRobinGroupChat([assistant], termination_condition=termination)\n",
    "\n",
    "await Console(team.run_stream(task=\"Explain LLM gateways in two sentences.\"))\n",
    "await model_client.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Streaming\n",
    "\n",
    "MLflow Gateway supports streaming. Use `create_stream` to consume the reply as it is generated; it yields string chunks and finishes with a final `CreateResult`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from autogen_core.models import UserMessage\n",
    "\n",
    "async for chunk in model_client.create_stream(\n",
    "    messages=[UserMessage(content=\"Write a haiku about LLM gateways.\", source=\"user\")]\n",
    "):\n",
    "    if isinstance(chunk, str):  # incremental token chunks; the last item is a CreateResult\n",
    "        print(chunk, end=\"\", flush=True)"
   ]
  },
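If you want the streamed text as one string rather than printed tokens, the chunk loop can be folded into a small accumulator. A sketch, demonstrated with a fake stream standing in for `model_client.create_stream(...)`; inside a notebook you would `await` the coroutine directly instead of calling `asyncio.run`:

```python
import asyncio
from typing import AsyncIterator

async def collect_stream(stream: AsyncIterator) -> str:
    """Concatenate the string chunks of a token stream into the full reply."""
    parts = []
    async for chunk in stream:
        if isinstance(chunk, str):  # skip any final non-string result object
            parts.append(chunk)
    return "".join(parts)

# Fake stream used only for demonstration
async def _fake_stream():
    for piece in ("Gateways ", "route ", "requests."):
        yield piece

full_reply = asyncio.run(collect_stream(_fake_stream()))
print(full_reply)  # -> Gateways route requests.
```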
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Gateway Features\n",
    "\n",
    "All of these are configured in the MLflow UI — no code changes needed in your AutoGen application:\n",
    "\n",
    "| Feature | Description |\n",
    "|---------|-------------|\n",
    "| **Fallback** | If the primary model fails or is rate-limited, the gateway retries with a backup model automatically |\n",
    "| **Traffic splitting** | Route X% of requests to model A and Y% to model B for A/B testing |\n",
    "| **Budget tracking** | Set token/cost limits per endpoint or per user |\n",
    "| **Usage tracing** | Every call is logged as an MLflow trace — inputs, outputs, latency, token counts |\n",
    "\n",
    "Your `model=ENDPOINT_NAME` value stays the same regardless of which provider or model the gateway routes to behind the scenes."
   ]
  }
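Traffic splitting is handled entirely server-side by the gateway; a hypothetical client-side simulation of weighted routing, only to illustrate the idea (the model names are made up):

```python
import random

def pick_model(weights: dict, rng: random.Random) -> str:
    """Pick a model name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

rng = random.Random(0)  # seeded so the simulation is reproducible
split = {"model-a": 0.9, "model-b": 0.1}  # a 90/10 split
counts = {name: 0 for name in split}
for _ in range(1000):
    counts[pick_model(split, rng)] += 1
print(counts)  # roughly 900 vs. 100
```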
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python",
   "version": "3.11.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
