OmniRoute MCP Server

Model Context Protocol server that exposes OmniRoute's gateway intelligence as 16 tools for AI agents.

The MCP server lets any MCP-compatible AI agent or client (Claude Desktop, Cursor, VS Code Copilot, custom agents) monitor, control, and optimize the OmniRoute AI gateway programmatically.


Architecture

┌──────────────────────────────────────────────────────────────────┐
│                         AI Agent / IDE                           │
│          (Claude Desktop, Cursor, VS Code, Custom)               │
└──────────────────────┬───────────────────────────────────────────┘
                       │  MCP Protocol (stdio or HTTP)
                       ▼
┌──────────────────────────────────────────────────────────────────┐
│                      OmniRoute MCP Server                        │
│  ┌──────────────┐  ┌─────────────────┐  ┌────────────────────┐  │
│  │ Scope        │  │  16 MCP Tools   │  │   Audit Logger     │  │
│  │ Enforcement  │──│  (Phase 1 + 2)  │──│   (SHA-256/SQLite) │  │
│  └──────────────┘  └────────┬────────┘  └────────────────────┘  │
└─────────────────────────────┼────────────────────────────────────┘
                              │  HTTP (internal)
                              ▼
┌──────────────────────────────────────────────────────────────────┐
│                    OmniRoute Gateway (port 20128)                 │
│        /v1/chat/completions  /api/combos  /api/usage  ...        │
└──────────────────────────────────────────────────────────────────┘

Quick Start

1. Environment Variables

# Required: OmniRoute base URL
export OMNIROUTE_BASE_URL="http://localhost:20128"

# Optional: API key for authenticated access
export OMNIROUTE_API_KEY="your-api-key"

# Optional: Scope enforcement (default: disabled)
export OMNIROUTE_MCP_ENFORCE_SCOPES="true"
export OMNIROUTE_MCP_SCOPES="read:health,read:combos,read:quota,read:usage,read:models,execute:completions,write:combos,write:budget,write:resilience"

2. stdio Transport (IDE Integration)

Add to your MCP client configuration:

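Claude Desktop (claude_desktop_config.json). Claude Desktop does not start the server from your checkout, so replace path/to/9router with an absolute path:

{
  "mcpServers": {
    "omniroute": {
      "command": "npx",
      "args": ["tsx", "path/to/9router/open-sse/mcp-server/server.ts"],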
      "env": {
        "OMNIROUTE_BASE_URL": "http://localhost:20128",
        "OMNIROUTE_API_KEY": "your-key"
      }
    }
  }
}

Cursor (.cursor/mcp.json):

{
  "mcpServers": {
    "omniroute": {
      "command": "npx",
      "args": ["tsx", "open-sse/mcp-server/server.ts"],
      "env": {
        "OMNIROUTE_BASE_URL": "http://localhost:20128"
      }
    }
  }
}

VS Code (.vscode/settings.json):

{
  "mcp": {
    "servers": {
      "omniroute": {
        "command": "npx",
        "args": ["tsx", "open-sse/mcp-server/server.ts"],
        "env": {
          "OMNIROUTE_BASE_URL": "http://localhost:20128"
        }
      }
    }
  }
}

3. Start via CLI

# Direct start (stdio)
npx tsx open-sse/mcp-server/server.ts

# Or via OmniRoute CLI
omniroute --mcp

Tool Reference

Phase 1: Essential Tools (8)

| # | Tool | Scopes | Description |
|---|------|--------|-------------|
| 1 | omniroute_get_health | read:health | Gateway health, uptime, memory, circuit breakers, rate limits, cache stats |
| 2 | omniroute_list_combos | read:combos | List all combos (model chains) with strategies and optional metrics |
| 3 | omniroute_get_combo_metrics | read:combos | Performance metrics for a specific combo |
| 4 | omniroute_switch_combo | write:combos | Activate or deactivate a combo for routing |
| 5 | omniroute_check_quota | read:quota | Remaining API quota per provider with token health status |
| 6 | omniroute_route_request | execute:completions | Send a chat completion through intelligent routing |
| 7 | omniroute_cost_report | read:usage | Cost report by period (session/day/week/month) with per-provider breakdown |
| 8 | omniroute_list_models_catalog | read:models | List all available models across providers with capabilities and pricing |

Phase 2: Advanced Tools (8)

| # | Tool | Scopes | Description |
|---|------|--------|-------------|
| 9 | omniroute_simulate_route | read:health, read:combos | Dry-run routing simulation showing the fallback tree and estimated costs |
| 10 | omniroute_set_budget_guard | write:budget | Set a session budget with an action on exceed: degrade, block, or alert |
| 11 | omniroute_set_resilience_profile | write:resilience | Apply a resilience profile: aggressive, balanced, or conservative |
| 12 | omniroute_test_combo | execute:completions, read:combos | Test each provider in a combo with a real prompt (real upstream calls) and report latency and cost |
| 13 | omniroute_get_provider_metrics | read:health | Per-provider metrics with latency percentiles (p50/p95/p99) and circuit breaker state |
| 14 | omniroute_best_combo_for_task | read:combos, read:health | AI-powered combo recommendation by task type with budget/latency constraints |
| 15 | omniroute_explain_route | read:health, read:usage | Explain why a request was routed to a provider (scoring factors, fallbacks) |
| 16 | omniroute_get_session_snapshot | read:usage | Full session snapshot: cost, tokens, top models, errors, budget status |
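The server validates tool inputs against the Zod schemas in schemas/tools.ts, but a client can also fail fast before making a round trip. This hypothetical helper checks only the enumerated values listed in the tables above, using the argument names (period, action, profile) that appear in this README's client examples. Any other argument is passed through unchecked.

```python
# Enumerated argument values taken from the tool tables above.
ALLOWED_VALUES: dict[str, dict[str, frozenset[str]]] = {
    "omniroute_cost_report": {"period": frozenset({"session", "day", "week", "month"})},
    "omniroute_set_budget_guard": {"action": frozenset({"degrade", "block", "alert"})},
    "omniroute_set_resilience_profile": {
        "profile": frozenset({"aggressive", "balanced", "conservative"}),
    },
}


def validate_args(tool: str, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the enum arguments look valid."""
    problems = []
    for param, allowed in ALLOWED_VALUES.get(tool, {}).items():
        if param in args and args[param] not in allowed:
            problems.append(f"{tool}.{param}={args[param]!r} not in {sorted(allowed)}")
    return problems
```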

Client Examples

Python — Full Agent Workflow

"""
OmniRoute MCP Client — Python example using the mcp SDK.
Install: pip install mcp
"""
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    server = StdioServerParameters(
        command="npx",
        args=["tsx", "open-sse/mcp-server/server.ts"],
        env={
            "OMNIROUTE_BASE_URL": "http://localhost:20128",
            "OMNIROUTE_API_KEY": "your-key",
        },
    )

    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # 1. Check gateway health
            health = await session.call_tool("omniroute_get_health", {})
            print("Health:", health.content[0].text)

            # 2. List available combos with metrics
            combos = await session.call_tool("omniroute_list_combos", {
                "includeMetrics": True
            })
            print("Combos:", combos.content[0].text)

            # 3. Find the best combo for a coding task
            best = await session.call_tool("omniroute_best_combo_for_task", {
                "taskType": "coding",
                "budgetConstraint": 0.50,
                "latencyConstraint": 5000,
            })
            print("Best combo:", best.content[0].text)

            # 4. Set a session budget guard
            budget = await session.call_tool("omniroute_set_budget_guard", {
                "maxCost": 1.00,
                "action": "degrade",
                "degradeToTier": "cheap",
            })
            print("Budget guard:", budget.content[0].text)

            # 5. Route a request through the intelligent routing pipeline
            response = await session.call_tool("omniroute_route_request", {
                "model": "claude-sonnet-4",
                "messages": [
                    {"role": "user", "content": "Write a Python hello world"}
                ],
                "role": "coding",
            })
            print("Response:", response.content[0].text)

            # 6. Get the session snapshot
            snapshot = await session.call_tool("omniroute_get_session_snapshot", {})
            print("Session:", snapshot.content[0].text)

asyncio.run(main())

TypeScript — Programmatic Agent

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function main() {
  const transport = new StdioClientTransport({
    command: "npx",
    args: ["tsx", "open-sse/mcp-server/server.ts"],
    env: {
      OMNIROUTE_BASE_URL: "http://localhost:20128",
      OMNIROUTE_API_KEY: "your-key",
    },
  });

  const client = new Client({ name: "my-agent", version: "1.0.0" });
  await client.connect(transport);

  // Check quota before deciding which model to use
  const quota = await client.callTool({
    name: "omniroute_check_quota",
    arguments: { provider: "claude" },
  });
  console.log("Claude quota:", quota.content);

  // Simulate the route before actually calling
  const simulation = await client.callTool({
    name: "omniroute_simulate_route",
    arguments: {
      model: "claude-sonnet-4",
      promptTokenEstimate: 2000,
    },
  });
  console.log("Route simulation:", simulation.content);

  // Send the actual request
  const result = await client.callTool({
    name: "omniroute_route_request",
    arguments: {
      model: "claude-sonnet-4",
      messages: [{ role: "user", content: "Explain async/await" }],
    },
  });
  console.log("Result:", result.content);

  // Cost report
  const costs = await client.callTool({
    name: "omniroute_cost_report",
    arguments: { period: "session" },
  });
  console.log("Costs:", costs.content);

  await client.close();
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});

Go — HTTP Client

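package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "io"
    "net/http"
    "os"
)

// Simplified direct-API approach: bypass MCP and call the OmniRoute HTTP APIs
// directly. Useful if you don't need MCP protocol framing.

// Read-only MCP tools and the OmniRoute endpoints they map to.
var endpoints = map[string]string{
    "health": "/api/monitoring/health",
    "combos": "/api/combos",
    "quota":  "/api/usage/quota",
    "models": "/v1/models",
}

// do sends req (adding the optional API key) and returns the response body,
// treating any non-2xx status as an error.
func do(req *http.Request) (string, error) {
    if key := os.Getenv("OMNIROUTE_API_KEY"); key != "" {
        // Assumption: Bearer auth, as used by OpenAI-compatible endpoints.
        req.Header.Set("Authorization", "Bearer "+key)
    }
    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        return "", err
    }
    defer resp.Body.Close()
    body, err := io.ReadAll(resp.Body)
    if err != nil {
        return "", err
    }
    if resp.StatusCode < 200 || resp.StatusCode > 299 {
        return "", fmt.Errorf("%s %s: %s: %s", req.Method, req.URL, resp.Status, body)
    }
    return string(body), nil
}

// callTool performs the GET behind a read-only tool such as "health".
func callTool(baseURL, tool string) (string, error) {
    path, ok := endpoints[tool]
    if !ok {
        return "", fmt.Errorf("unknown tool %q", tool)
    }
    req, err := http.NewRequest(http.MethodGet, baseURL+path, nil)
    if err != nil {
        return "", err
    }
    return do(req)
}

// routeRequest sends a non-streaming chat completion through the gateway.
func routeRequest(baseURL, model, prompt string) (string, error) {
    payload := map[string]any{
        "model": model,
        "messages": []map[string]string{
            {"role": "user", "content": prompt},
        },
        "stream": false,
    }
    data, err := json.Marshal(payload)
    if err != nil {
        return "", err
    }
    req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(data))
    if err != nil {
        return "", err
    }
    req.Header.Set("Content-Type", "application/json")
    return do(req)
}

func main() {
    base := "http://localhost:20128"

    health, err := callTool(base, "health")
    if err != nil {
        fmt.Println("health check failed:", err)
        return
    }
    fmt.Println("Health:", health)

    result, err := routeRequest(base, "auto", "Hello from Go!")
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    fmt.Println("Result:", result)
}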

Use Cases

🔄 Use Case 1: Auto-Healing Agent

An agent that monitors OmniRoute health and auto-switches combos when providers degrade.

import asyncio
import json

async def auto_healing_loop(session, interval_s: float = 30.0):
    """Monitor health and react to provider issues."""
    active_combo = None  # combo this loop last switched to
    while True:
        # Check health
        health = await session.call_tool("omniroute_get_health", {})
        data = json.loads(health.content[0].text)

        # Find providers with open circuit breakers
        broken = [
            cb for cb in data.get("circuitBreakers", [])
            if cb["state"] == "OPEN"
        ]

        if broken:
            # Switch to a more conservative resilience profile
            await session.call_tool("omniroute_set_resilience_profile", {
                "profile": "conservative"
            })

            # Find the best alternative combo
            best = await session.call_tool("omniroute_best_combo_for_task", {
                "taskType": "coding"
            })
            best_data = json.loads(best.content[0].text)
            combo_id = best_data["recommendedCombo"]["id"]

            # Activate it, unless an earlier pass already did
            if combo_id != active_combo:
                await session.call_tool("omniroute_switch_combo", {
                    "comboId": combo_id, "active": True
                })
                active_combo = combo_id
                print(f"⚠️ Auto-healed: switched to {combo_id}")

        await asyncio.sleep(interval_s)  # every 30 seconds by default

💰 Use Case 2: Budget-Aware Coding Agent

An agent that tracks costs in real time and uses the budget guard to degrade to cheaper models once the session budget is exceeded.

import json

async def budget_aware_coding(session, task: str, max_budget: float):
    """Complete a coding task within a budget."""
    # Set budget guard
    await session.call_tool("omniroute_set_budget_guard", {
        "maxCost": max_budget,
        "action": "degrade",
        "degradeToTier": "cheap",
    })

    # Simulate first to estimate cost
    sim = await session.call_tool("omniroute_simulate_route", {
        "model": "claude-sonnet-4",
        "promptTokenEstimate": len(task.split()) * 2,  # rough, generous estimate: ~2 tokens per word
    })
    sim_data = json.loads(sim.content[0].text)
    estimated_cost = sim_data["fallbackTree"]["bestCaseCost"]
    print(f"Estimated cost: ${estimated_cost:.4f}")

    # Send request
    result = await session.call_tool("omniroute_route_request", {
        "model": "claude-sonnet-4",
        "messages": [{"role": "user", "content": task}],
        "role": "coding",
    })

    # Check remaining budget
    snapshot = await session.call_tool("omniroute_get_session_snapshot", {})
    snap_data = json.loads(snapshot.content[0].text)
    print(f"Session cost: ${snap_data['costTotal']:.4f}")
    if snap_data.get("budgetGuard"):
        print(f"Budget remaining: ${snap_data['budgetGuard']['remaining']:.4f}")

    return json.loads(result.content[0].text)["response"]["content"]
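To reason about the three budget guard actions, they can be modeled as a small decision function. This is a hypothetical sketch, not OmniRoute's implementation: it assumes the action applies once session spend exceeds maxCost, and the returned field names (allow, tier, alert, remaining) are invented for illustration.

```python
from enum import Enum


class GuardAction(str, Enum):
    DEGRADE = "degrade"
    BLOCK = "block"
    ALERT = "alert"


def guard_decision(spent: float, max_cost: float, action: str,
                   degrade_to_tier: str = "cheap") -> dict:
    """Hypothetical model of a budget guard; not OmniRoute's actual logic."""
    act = GuardAction(action)  # raises ValueError for unknown actions
    remaining = max(max_cost - spent, 0.0)
    decision = {"allow": True, "tier": None, "alert": False, "remaining": remaining}
    if spent <= max_cost:
        return decision  # within budget: route normally
    if act is GuardAction.BLOCK:
        decision["allow"] = False  # reject further requests
    elif act is GuardAction.DEGRADE:
        decision["tier"] = degrade_to_tier  # keep serving from a cheaper tier
    else:
        decision["alert"] = True  # keep serving, but raise an alert
    return decision
```

With action="degrade", as in the example above, requests keep flowing after the budget is spent, but from the degradeToTier tier instead of the original model.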

🧪 Use Case 3: Combo Benchmarking Agent

An agent that periodically benchmarks all combos and reports the fastest/cheapest.

import json

async def benchmark_combos(session):
    """Benchmark all enabled combos and rank them."""
    combos = await session.call_tool("omniroute_list_combos", {
        "includeMetrics": True,
    })
    combo_list = json.loads(combos.content[0].text)["combos"]

    results = []
    for combo in combo_list:
        if not combo["enabled"]:
            continue

        # test_combo makes real upstream calls, so each run incurs provider cost
        test = await session.call_tool("omniroute_test_combo", {
            "comboId": combo["id"],
            "testPrompt": "Return the number 42.",
        })
        test_data = json.loads(test.content[0].text)
        results.append({
            "combo": combo["name"],
            "fastest": test_data["summary"]["fastestProvider"],
            "cheapest": test_data["summary"]["cheapestProvider"],
            "success_rate": f'{test_data["summary"]["successful"]}/{test_data["summary"]["totalProviders"]}',
        })

    print("📊 Combo Benchmark Results:")
    for r in results:
        print(f"  {r['combo']}: fastest={r['fastest']}, cheapest={r['cheapest']}, success={r['success_rate']}")
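The result rows above store success as an "ok/total" string. A small, hypothetical ranking helper can order combos from most to least reliable by parsing that string into an exact ratio (combos with zero providers count as 0).

```python
from fractions import Fraction


def rank_by_success(results: list[dict]) -> list[dict]:
    """Order benchmark rows (with a "success_rate" of "ok/total") by success ratio."""
    def ratio(row: dict) -> Fraction:
        ok, total = (int(x) for x in row["success_rate"].split("/"))
        return Fraction(ok, total) if total else Fraction(0)

    # sorted() stays stable with reverse=True, so ties keep their original order.
    return sorted(results, key=ratio, reverse=True)
```

Using Fraction rather than float keeps ties exact, so 1/2 and 2/4 rank as equal.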

🔍 Use Case 4: Post-Mortem Debugging Agent

An agent that explains why a request was routed to a specific provider.

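import type { Client } from "@modelcontextprotocol/sdk/client/index.js";

async function debugRouting(client: Client, requestId: string) {
  // Explain the routing decision
  const explanation = await client.callTool({
    name: "omniroute_explain_route",
    arguments: { requestId },
  });
  // callTool's result type is loose; read the first text block as JSON,
  // as the other examples in this README do.
  const [block] = explanation.content as Array<{ type: string; text: string }>;
  const data = JSON.parse(block.text);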

  console.log(`Request ${requestId}:`);
  console.log(`  Provider: ${data.decision.providerSelected}`);
  console.log(`  Model: ${data.decision.modelUsed}`);
  console.log(`  Score: ${data.decision.score}`);
  console.log(`  Factors:`);
  for (const factor of data.decision.factors) {
    console.log(`    ${factor.name}: ${factor.value} (weight: ${factor.weight})`);
  }
  if (data.decision.fallbacksTriggered.length > 0) {
    console.log(`  Fallbacks triggered:`);
    for (const fb of data.decision.fallbacksTriggered) {
      console.log(`    ${fb.provider}: ${fb.reason}`);
    }
  }
}

📋 Use Case 5: Model Discovery Agent

An agent that discovers the cheapest models for a given capability.

import json

async def find_cheapest_models(session, capability="chat"):
    """Find the cheapest available models for a capability."""
    catalog = await session.call_tool("omniroute_list_models_catalog", {
        "capability": capability,
    })
    models = json.loads(catalog.content[0].text)["models"]

    # Filter available models with pricing
    priced = [
        m for m in models
        if m["status"] == "available" and m.get("pricing")
    ]
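    # Sort by input price, with unknown prices last. A price of 0 (free) must not
    # be treated as unknown, which `price or float("inf")` would do.
    priced.sort(key=lambda m: (m["pricing"].get("inputPerMillion") is None,
                               m["pricing"].get("inputPerMillion") or 0))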

    print(f"💡 Cheapest {capability} models:")
    for m in priced[:5]:
        input_cost = m["pricing"]["inputPerMillion"] or 0
        output_cost = m["pricing"]["outputPerMillion"] or 0
        print(f"  {m['id']} ({m['provider']}): ${input_cost}/M in, ${output_cost}/M out")

Security & Scope Enforcement

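The MCP server supports fine-grained scope enforcement for multi-tenant environments. Enforcement is disabled by default; set OMNIROUTE_MCP_ENFORCE_SCOPES=true to enable it. Each scope covers these tools: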

| Scope | Tools |
|-------|-------|
| read:health | get_health, simulate_route, get_provider_metrics, best_combo_for_task, explain_route |
| read:combos | list_combos, get_combo_metrics, simulate_route, best_combo_for_task, test_combo |
| read:quota | check_quota |
| read:usage | cost_report, explain_route, get_session_snapshot |
| read:models | list_models_catalog |
| write:combos | switch_combo |
| write:budget | set_budget_guard |
| write:resilience | set_resilience_profile |
| execute:completions | route_request, test_combo |

Wildcard scopes: Use read:* to grant all read scopes, or * for full access.
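A client-side sketch of the scope check, transcribed from the table above, can help an agent predict which tools a given key may call. Two assumptions are labeled in the code: a tool listed under several scopes needs all of them, and the category wildcard (read:*) generalizes to other prefixes such as write:*. The server's real logic lives in scopeEnforcement.ts and may differ.

```python
# Required scopes per tool, transcribed from the scope table above
# (tool names without the "omniroute_" prefix).
TOOL_SCOPES: dict[str, frozenset[str]] = {
    "get_health": frozenset({"read:health"}),
    "list_combos": frozenset({"read:combos"}),
    "get_combo_metrics": frozenset({"read:combos"}),
    "switch_combo": frozenset({"write:combos"}),
    "check_quota": frozenset({"read:quota"}),
    "route_request": frozenset({"execute:completions"}),
    "cost_report": frozenset({"read:usage"}),
    "list_models_catalog": frozenset({"read:models"}),
    "simulate_route": frozenset({"read:health", "read:combos"}),
    "set_budget_guard": frozenset({"write:budget"}),
    "set_resilience_profile": frozenset({"write:resilience"}),
    "test_combo": frozenset({"execute:completions", "read:combos"}),
    "get_provider_metrics": frozenset({"read:health"}),
    "best_combo_for_task": frozenset({"read:combos", "read:health"}),
    "explain_route": frozenset({"read:health", "read:usage"}),
    "get_session_snapshot": frozenset({"read:usage"}),
}


def scope_granted(required: str, granted: set[str]) -> bool:
    """True if `required` is granted exactly, by "*", or by a category wildcard."""
    if "*" in granted or required in granted:
        return True
    category = required.split(":", 1)[0]
    # Assumption: "<category>:*" works for any category, not just "read".
    return f"{category}:*" in granted


def is_allowed(tool: str, granted: set[str]) -> bool:
    required = TOOL_SCOPES.get(tool.removeprefix("omniroute_"))
    if required is None:
        return False  # deny unknown tools by default
    # Assumption: a tool listed under several scopes needs every one of them.
    return all(scope_granted(scope, granted) for scope in required)
```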


Audit Logging

Every tool call is logged to the mcp_tool_audit SQLite table:

  • Input: SHA-256 hashed (never stores raw prompts)
  • Output: Truncated to 200 chars
  • Metadata: Tool name, duration, success/error, API key ID

Access audit data via:

import { getRecentAuditEntries, getAuditStats } from "./audit";

const entries = await getRecentAuditEntries(50);
const stats = await getAuditStats();
// stats: { totalCalls, successRate, avgDurationMs, topTools }
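The summary fields returned by getAuditStats can be reproduced from raw entries. A minimal sketch, assuming each entry carries tool, success, and duration_ms fields (hypothetical names), that successRate is a fraction between 0 and 1, and that topTools is a list of the most-called tool names:

```python
from collections import Counter


def audit_stats(entries: list[dict], top_n: int = 3) -> dict:
    """Aggregate audit entries into totalCalls, successRate, avgDurationMs, topTools."""
    total = len(entries)
    if total == 0:
        return {"totalCalls": 0, "successRate": 0.0, "avgDurationMs": 0.0, "topTools": []}
    successes = sum(1 for e in entries if e["success"])
    counts = Counter(e["tool"] for e in entries)
    return {
        "totalCalls": total,
        "successRate": successes / total,
        "avgDurationMs": sum(e["duration_ms"] for e in entries) / total,
        # most_common() orders by count; ties keep first-seen order
        "topTools": [tool for tool, _ in counts.most_common(top_n)],
    }
```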

File Structure

mcp-server/
├── server.ts              # MCP server setup, essential tool handlers, entry point
├── index.ts               # Barrel export
├── audit.ts               # SQLite audit logger (SHA-256 input hashing)
├── scopeEnforcement.ts    # Fine-grained scope enforcement
├── schemas/
│   ├── tools.ts           # Zod schemas for all 16 tools (input/output/scopes)
│   ├── a2a.ts             # A2A protocol types (Agent Card, Task, JSON-RPC)
│   ├── audit.ts           # Audit & routing decision types + hash helpers
│   └── index.ts           # Schema barrel export
├── tools/
│   └── advancedTools.ts   # Phase 2 tool handlers (8 advanced tools)
└── __tests__/
    ├── essentialTools.test.ts
    ├── advancedTools.test.ts
    └── a2aLifecycle.test.ts

License

Part of OmniRoute — MIT License.