| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Using MLflow AI Gateway with AutoGen\n", |
| 8 | + "\n", |
| 9 | + "[MLflow AI Gateway](https://mlflow.org/docs/latest/llms/gateway/index.html) is a database-backed LLM proxy built into the MLflow tracking server (MLflow ≥ 3.0). It gives you a **single OpenAI-compatible endpoint** that can route to dozens of LLM providers — OpenAI, Anthropic, Gemini, Mistral, Bedrock, Ollama, and more.\n", |
| 10 | + "\n", |
| 11 | + "Key features:\n", |
| 12 | + "- **Multi-provider routing** — switch models without changing agent code\n", |
| 13 | + "- **Secrets management** — provider API keys stored encrypted on the server; your application sends no provider keys\n", |
| 14 | + "- **Fallback & retry** — automatic failover to backup models\n", |
| 15 | + "- **Budget tracking** — per-endpoint or per-user token budgets\n", |
| 16 | + "- **Usage tracing** — every call logged as an MLflow trace automatically\n", |
| 17 | + "\n", |
| 18 | + "Because MLflow Gateway speaks the OpenAI API, you can use `OpenAIChatCompletionClient` with a custom `base_url` to point any AutoGen agent at it." |
| 19 | + ] |
| 20 | + }, |
| 21 | + { |
| 22 | + "cell_type": "markdown", |
| 23 | + "metadata": {}, |
| 24 | + "source": [ |
| 25 | + "## Prerequisites\n", |
| 26 | + "\n", |
| 27 | + "1. **Start an MLflow server** with the gateway enabled:\n", |
| 28 | + " ```bash\n", |
| 29 | + " pip install mlflow\n", |
| 30 | + " mlflow server --host 127.0.0.1 --port 5000\n", |
| 31 | + " ```\n", |
| 32 | + "\n", |
| 33 | + "2. **Create a gateway endpoint** via the MLflow UI at [http://localhost:5000](http://localhost:5000): \n", |
| 34 | + " Navigate to **AI Gateway → Create Endpoint**, give it a name (e.g. `my-chat-endpoint`), select a provider and model, and save your API key (stored encrypted on the server).\n", |
| 35 | + "\n", |
| 36 | + " Or create one via the REST API:\n", |
| 37 | + " ```bash\n", |
| 38 | + " # Step 1: Store provider key as a secret\n", |
| 39 | + " curl -s -X POST http://localhost:5000/api/2.0/mlflow/gateway/secrets \\\n", |
| 40 | + " -H 'Content-Type: application/json' \\\n", |
| 41 | + " -d '{\"secret_name\": \"openai-key\", \"secret_value\": {\"api_key\": \"sk-...\"}, \"provider\": \"openai\"}'\n", |
| 42 | + "\n", |
| 43 | + " # Step 2: Create the endpoint (use the secret_id returned above)\n", |
| 44 | + " curl -s -X POST http://localhost:5000/api/2.0/mlflow/gateway/endpoints/create \\\n", |
| 45 | + " -H 'Content-Type: application/json' \\\n", |
| 46 | + " -d '{\"name\": \"my-chat-endpoint\", \"model_configs\": [{\"provider\": \"openai\", \"model_name\": \"gpt-4o-mini\", \"secret_id\": \"<secret_id>\"}]}'\n", |
| 47 | + " ```" |
| 48 | + ] |
| 49 | + }, |
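| + {
| + "cell_type": "markdown",
| + "metadata": {},
| + "source": [
| + "Before wiring up AutoGen, you can sanity-check the endpoint by calling the gateway's OpenAI-compatible route directly. A minimal sketch, assuming the `my-chat-endpoint` name from above and the `/gateway/openai/v1` base path used later in this notebook:\n",
| + "\n",
| + "```bash\n",
| + "curl -s -X POST http://localhost:5000/gateway/openai/v1/chat/completions \\\n",
| + "  -H 'Content-Type: application/json' \\\n",
| + "  -d '{\"model\": \"my-chat-endpoint\", \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}]}'\n",
| + "```"
| + ]
| + },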
| 50 | + { |
| 51 | + "cell_type": "markdown", |
| 52 | + "metadata": {}, |
| 53 | + "source": [ |
| 54 | + "## Installation" |
| 55 | + ] |
| 56 | + }, |
| 57 | + { |
| 58 | + "cell_type": "code", |
| 59 | + "execution_count": null, |
| 60 | + "metadata": {}, |
| 61 | + "outputs": [], |
| 62 | + "source": [ |
| 63 | + %pip install -U 'autogen-agentchat' 'autogen-ext[openai]'
| 64 | + ] |
| 65 | + }, |
| 66 | + { |
| 67 | + "cell_type": "markdown", |
| 68 | + "metadata": {}, |
| 69 | + "source": [ |
| 70 | + "## Connect to MLflow Gateway\n", |
| 71 | + "\n", |
| 72 | + "Use `OpenAIChatCompletionClient` with:\n", |
| 73 | + "- `base_url` pointing to the MLflow Gateway OpenAI-compatible endpoint\n", |
| 74 | + "- `model` set to your **gateway endpoint name**\n", |
| 75 | + "- `api_key` set to any non-empty string (the gateway manages provider keys server-side)" |
| 76 | + ] |
| 77 | + }, |
| 78 | + { |
| 79 | + "cell_type": "code", |
| 80 | + "execution_count": null, |
| 81 | + "metadata": {}, |
| 82 | + "outputs": [], |
| 83 | + "source": [ |
| 84 | + "from autogen_ext.models.openai import OpenAIChatCompletionClient\n", |
| 85 | + "\n", |
| 86 | + "MLFLOW_GATEWAY_URL = \"http://localhost:5000\"\n", |
| 87 | + "ENDPOINT_NAME = \"my-chat-endpoint\" # the endpoint name you created in MLflow\n", |
| 88 | + "\n", |
| 89 | + "model_client = OpenAIChatCompletionClient(\n", |
| 90 | + " model=ENDPOINT_NAME,\n", |
| 91 | + " base_url=f\"{MLFLOW_GATEWAY_URL}/gateway/openai/v1\",\n", |
| 92 | + " api_key=\"unused\", # provider keys are stored on the MLflow server\n", |
| 93 | + " model_capabilities={\n", |
| 94 | + " \"json_output\": False,\n", |
| 95 | + " \"vision\": False,\n", |
| 96 | + " \"function_calling\": True,\n", |
| 97 | + " },\n", |
| 98 | + ")" |
| 99 | + ] |
| 100 | + }, |
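| + {
| + "cell_type": "markdown",
| + "metadata": {},
| + "source": [
| + "Since every gateway endpoint is reached the same way, a small helper makes it easy to build clients for several endpoints at once. This is a convenience sketch for this notebook, not part of AutoGen; `gateway_client` and the example `my-backup-endpoint` name are our own:"
| + ]
| + },
| + {
| + "cell_type": "code",
| + "execution_count": null,
| + "metadata": {},
| + "outputs": [],
| + "source": [
| + "def gateway_client(endpoint_name: str) -> OpenAIChatCompletionClient:\n",
| + "    \"\"\"Build a client for an MLflow Gateway endpoint (hypothetical helper).\"\"\"\n",
| + "    return OpenAIChatCompletionClient(\n",
| + "        model=endpoint_name,\n",
| + "        base_url=f\"{MLFLOW_GATEWAY_URL}/gateway/openai/v1\",\n",
| + "        api_key=\"unused\",  # provider keys live on the MLflow server\n",
| + "        model_info={\n",
| + "            \"vision\": False,\n",
| + "            \"function_calling\": True,\n",
| + "            \"json_output\": False,\n",
| + "            \"structured_output\": False,\n",
| + "            \"family\": \"unknown\",\n",
| + "        },\n",
| + "    )\n",
| + "\n",
| + "# e.g. a second endpoint routed to a different provider:\n",
| + "# backup_client = gateway_client(\"my-backup-endpoint\")"
| + ]
| + },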
| 101 | + { |
| 102 | + "cell_type": "markdown", |
| 103 | + "metadata": {}, |
| 104 | + "source": [ |
| 105 | + "## Single-turn Chat Example\n", |
| 106 | + "\n", |
| 107 | + "Use the model client directly to verify the connection:" |
| 108 | + ] |
| 109 | + }, |
| 110 | + { |
| 111 | + "cell_type": "code", |
| 112 | + "execution_count": null, |
| 113 | + "metadata": {}, |
| 114 | + "outputs": [], |
| 115 | + "source": [ |
| 116 | + "from autogen_core.models import UserMessage\n", |
| 117 | + "\n", |
| 118 | + "result = await model_client.create(\n", |
| 119 | + " messages=[UserMessage(content=\"What is MLflow AI Gateway?\", source=\"user\")]\n", |
| 120 | + ")\n", |
| 121 | + "print(result.content)" |
| 122 | + ] |
| 123 | + }, |
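| + {
| + "cell_type": "markdown",
| + "metadata": {},
| + "source": [
| + "The `CreateResult` returned by `create` also reports token usage, which you can cross-check against the gateway's budget tracking (field names are from `autogen_core.models.RequestUsage`):"
| + ]
| + },
| + {
| + "cell_type": "code",
| + "execution_count": null,
| + "metadata": {},
| + "outputs": [],
| + "source": [
| + "# token counts for the call above, as returned through the gateway\n",
| + "print(\"prompt tokens:    \", result.usage.prompt_tokens)\n",
| + "print(\"completion tokens:\", result.usage.completion_tokens)"
| + ]
| + },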
| 124 | + { |
| 125 | + "cell_type": "markdown", |
| 126 | + "metadata": {}, |
| 127 | + "source": [ |
| 128 | + "## Multi-Agent Chat Example\n", |
| 129 | + "\n", |
| 130 | + "Here we create two agents — a user proxy and an assistant — and run a short conversation through MLflow Gateway." |
| 131 | + ] |
| 132 | + }, |
| 133 | + { |
| 134 | + "cell_type": "code", |
| 135 | + "execution_count": null, |
| 136 | + "metadata": {}, |
| 137 | + "outputs": [], |
| 138 | + "source": [ |
| 139 | + "from autogen_agentchat.agents import AssistantAgent\n", |
| 140 | + "from autogen_agentchat.ui import Console\n", |
| 141 | + "from autogen_agentchat.teams import RoundRobinGroupChat\n", |
| 142 | + "from autogen_agentchat.conditions import MaxMessageTermination\n", |
| 143 | + "\n", |
| 144 | + "# Create the assistant using the MLflow Gateway client\n", |
| 145 | + "assistant = AssistantAgent(\n", |
| 146 | + " name=\"assistant\",\n", |
| 147 | + " model_client=model_client,\n", |
| 148 | + " system_message=\"You are a helpful AI assistant. Keep answers concise.\",\n", |
| 149 | + ")\n", |
| 150 | + "\n", |
| 151 | + "# Run a quick conversation\n", |
| 152 | + "termination = MaxMessageTermination(max_messages=3)\n", |
| 153 | + "team = RoundRobinGroupChat([assistant], termination_condition=termination)\n", |
| 154 | + "\n", |
| 155 | + "await Console(team.run_stream(task=\"Explain LLM gateways in two sentences.\"))\n", |
| 156 | + "await model_client.close()" |
| 157 | + ] |
| 158 | + }, |
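| + {
| + "cell_type": "markdown",
| + "metadata": {},
| + "source": [
| + "The same client can back several agents. Below is a slightly larger sketch with a writer and a critic taking turns until the critic approves; the agent names, prompts, and the `APPROVE` convention are our own:"
| + ]
| + },
| + {
| + "cell_type": "code",
| + "execution_count": null,
| + "metadata": {},
| + "outputs": [],
| + "source": [
| + "from autogen_agentchat.conditions import TextMentionTermination\n",
| + "\n",
| + "writer = AssistantAgent(\n",
| + "    name=\"writer\",\n",
| + "    model_client=model_client,\n",
| + "    system_message=\"You draft short, clear answers.\",\n",
| + ")\n",
| + "critic = AssistantAgent(\n",
| + "    name=\"critic\",\n",
| + "    model_client=model_client,\n",
| + "    system_message=\"Review the draft. Reply with APPROVE when it is good enough.\",\n",
| + ")\n",
| + "\n",
| + "# Alternate writer -> critic until the critic says APPROVE\n",
| + "review_team = RoundRobinGroupChat(\n",
| + "    [writer, critic],\n",
| + "    termination_condition=TextMentionTermination(\"APPROVE\"),\n",
| + ")\n",
| + "await Console(review_team.run_stream(task=\"Draft a one-line definition of an LLM gateway.\"))"
| + ]
| + },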
| 159 | + { |
| 160 | + "cell_type": "markdown", |
| 161 | + "metadata": {}, |
| 162 | + "source": [ |
| 163 | + "## Streaming\n", |
| 164 | + "\n", |
| 165 | + "MLflow Gateway supports streaming. AutoGen uses streaming automatically when available." |
| 166 | + ] |
| 167 | + }, |
| 168 | + { |
| 169 | + "cell_type": "code", |
| 170 | + "execution_count": null, |
| 171 | + "metadata": {}, |
| 172 | + "outputs": [], |
| 173 | + "source": [ |
| 174 | + "from autogen_core.models import UserMessage\n", |
| 175 | + "\n", |
| 176 | + "async for chunk in model_client.create_stream(\n", |
| 177 | + " messages=[UserMessage(content=\"Write a haiku about LLM gateways.\", source=\"user\")]\n", |
| 178 | + "):\n", |
| 179 | + " if hasattr(chunk, 'content') and chunk.content:\n", |
| 180 | + " print(chunk.content, end=\"\", flush=True)" |
| 181 | + ] |
| 182 | + }, |
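| + {
| + "cell_type": "markdown",
| + "metadata": {},
| + "source": [
| + "Since the gateway logs each call as an MLflow trace (see the feature table below), you can query those traces from the same server with the MLflow client. A hedged sketch, assuming the traces are stored in this tracking server's default experiment:"
| + ]
| + },
| + {
| + "cell_type": "code",
| + "execution_count": null,
| + "metadata": {},
| + "outputs": [],
| + "source": [
| + "import mlflow\n",
| + "\n",
| + "mlflow.set_tracking_uri(MLFLOW_GATEWAY_URL)\n",
| + "\n",
| + "# search_traces returns a pandas DataFrame of logged traces\n",
| + "traces = mlflow.search_traces()\n",
| + "print(traces.head())"
| + ]
| + },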
| 183 | + { |
| 184 | + "cell_type": "markdown", |
| 185 | + "metadata": {}, |
| 186 | + "source": [ |
| 187 | + "## Gateway Features\n", |
| 188 | + "\n", |
| 189 | + "All of these are configured in the MLflow UI — no code changes needed in your AutoGen application:\n", |
| 190 | + "\n", |
| 191 | + "| Feature | Description |\n", |
| 192 | + "|---------|-------------|\n", |
| 193 | + "| **Fallback** | If the primary model fails or is rate-limited, the gateway retries with a backup model automatically |\n", |
| 194 | + "| **Traffic splitting** | Route X% of requests to model A and Y% to model B for A/B testing |\n", |
| 195 | + "| **Budget tracking** | Set token/cost limits per endpoint or per user |\n", |
| 196 | + "| **Usage tracing** | Every call is logged as an MLflow trace — inputs, outputs, latency, token counts |\n", |
| 197 | + "\n", |
| 198 | + "Your `model=ENDPOINT_NAME` value stays the same regardless of which provider or model the gateway routes to behind the scenes." |
| 199 | + ] |
| 200 | + } |
| 201 | + ], |
| 202 | + "metadata": { |
| 203 | + "kernelspec": { |
| 204 | + "display_name": "Python 3", |
| 205 | + "language": "python", |
| 206 | + "name": "python3" |
| 207 | + }, |
| 208 | + "language_info": { |
| 209 | + "name": "python", |
| 210 | + "version": "3.11.0" |
| 211 | + } |
| 212 | + }, |
| 213 | + "nbformat": 4, |
| 214 | + "nbformat_minor": 4 |
| 215 | +} |