Releases: envoyproxy/ai-gateway
v0.4.0
Envoy AI Gateway v0.4.0 - November 07, 2025
Release introducing Model Context Protocol (MCP) Gateway, OpenAI Image Generation, Anthropic support (direct and AWS Bedrock), guided output decoding for GCP Vertex AI/Gemini, cross-namespace references, enhanced authentication, and comprehensive observability improvements.
🔗 View release notes on the site
✨ New Features
Model Context Protocol (MCP) Gateway
New MCPRoute CRD
Introduces the MCPRoute custom resource for routing MCP requests to backend MCP servers, providing a unified AI API across multiple MCP backends.
Complete MCP spec implementation
Includes streamable HTTP transport, JSON-RPC 2.0 support, and MCP spec-compliant OAuth 2.0 authorization with JWKS validation and Protected Resource Metadata.
Server multiplexing and tool routing
Aggregates multiple MCP servers behind a single endpoint with intelligent tool routing, tool filtering (exact match and regex patterns), and collision detection.
Upstream authentication
Supports both OAuth-based authentication and API key authentication for secure backend MCP server communication with configurable headers.
Session management
Implements MCP session handling with encryption, rotatable seeds, and graceful session lifecycle management.
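As a sketch of how these pieces fit together, the following hypothetical MCPRoute manifest aggregates two MCP backends behind one endpoint with tool filtering. The API version, field names, and backend names below are illustrative assumptions, not taken from the published CRD schema:

```yaml
# Hypothetical MCPRoute sketch -- field names are illustrative, not authoritative.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: MCPRoute
metadata:
  name: mcp-tools
  namespace: default
spec:
  parentRefs:
    - name: envoy-ai-gateway
      kind: Gateway
      group: gateway.networking.k8s.io
  backendRefs:
    - name: github-mcp-server      # assumed backend name
      toolSelector:                # illustrative tool-filtering shape
        includeRegex:
          - "repo_.*"              # regex pattern match
    - name: weather-mcp-server
      toolSelector:
        include:
          - get_forecast           # exact match
```

With both servers multiplexed behind one endpoint, the gateway routes each tool call to the server that owns the tool and rejects colliding tool names per the collision detection described above.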
Anthropic Provider Support
Direct api.anthropic.com support
Native integration with Anthropic's API at api.anthropic.com, complementing the existing GCP Vertex AI Anthropic support.
AWS Bedrock native Anthropic Messages API
Support for Claude models on AWS Bedrock using the native Anthropic Messages API format instead of the generic Converse API, enabling full feature parity with direct Anthropic API including prompt caching and extended thinking.
Anthropic API key authentication
Native x-api-key header-based authentication matching Anthropic's API conventions and SDK patterns for direct Anthropic connections.
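Based on the API-updates note that AnthropicAPIKey was added to BackendSecurityPolicy, the attachment might look roughly like this; the exact field shape and names are illustrative assumptions:

```yaml
# Hypothetical BackendSecurityPolicy sketch -- the AnthropicAPIKey shape is illustrative.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: anthropic-api-key
  namespace: default
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: anthropic-backend      # assumed backend name
  type: AnthropicAPIKey
  anthropicAPIKey:
    secretRef:
      name: anthropic-secret       # Secret assumed to hold the key sent as x-api-key
```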
Passthrough translator with token usage tracking
Efficient passthrough translation layer that captures token usage and maintains API compatibility while minimizing overhead for both direct and AWS Bedrock Anthropic endpoints.
Standalone CLI auto-configuration
Auto-configuration from the ANTHROPIC_API_KEY environment variable in standalone mode for zero-config deployments.
Guided Output Support for GCP Vertex AI/Gemini
Guided regex support
Constrains model outputs to match specific regular expressions for GCP Vertex AI/Gemini models, enabling structured text generation.
Guided choice support
Restricts model outputs to predefined choices for GCP Vertex AI/Gemini models, ensuring responses conform to expected values.
Guided JSON support
Ensures model outputs are valid JSON conforming to specified schemas for GCP Vertex AI/Gemini models, with OpenAI-compatible API translation.
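To make the guided output options concrete, a request against the OpenAI-compatible endpoint might carry a constraint like the one below (body rendered as YAML for readability). The guided_choice parameter name follows the vLLM-style guided decoding extensions and, like the model name, is an assumption here:

```yaml
# Illustrative OpenAI-compatible request body (rendered as YAML).
# Parameter and model names are assumptions, not confirmed API surface.
model: gemini-2.0-flash            # hypothetical Vertex AI/Gemini model name
messages:
  - role: user
    content: "Is this review positive or negative?"
guided_choice:                     # restrict the output to predefined values
  - positive
  - negative
```

The gateway would translate such a constraint into the corresponding Gemini structured-output configuration, so the model can only answer with one of the listed choices.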
Provider-Specific Enhancements
OpenAI Image Generation /v1/images/generations endpoint
End-to-end support for OpenAI's image generation API including request/response translation, Brotli encoding/decoding, and full protocol compatibility.
OpenAI legacy /v1/completions endpoint
Full pass-through support for OpenAI's legacy completions endpoint with complete tracing and metrics, ensuring backward compatibility.
Azure OpenAI embeddings support
Native support for Azure OpenAI embeddings API with proper protocol translation and token usage tracking.
AWS Bedrock reasoning tokens
Full support for reasoning/thinking tokens in AWS Bedrock responses for both streaming and non-streaming modes, properly exposing extended thinking processes in Claude models.
GCP Vertex AI safety settings
Support for GCP-specific safety settings configuration, allowing fine-grained control over content filtering and safety thresholds for Gemini models.
GCP Gemini streaming token accounting
Accurate completion_tokens reporting in streaming usage chunks for Gemini models, ensuring proper token accounting during streaming responses.
Cross-Namespace Resource References
Cross-namespace AIServiceBackend references
AIGatewayRoute can now reference AIServiceBackend resources in different namespaces, enabling multi-tenant and organizational separation patterns.
ReferenceGrant validation
Comprehensive ReferenceGrant integration following Gateway API patterns, with automatic validation and clear error messages when grants are missing.
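Following Gateway API conventions, a cross-namespace reference could be granted roughly as follows. The ReferenceGrant kind and shape come from the Gateway API itself, while the AI Gateway group and kind names are assumptions based on the CRD names above:

```yaml
# A ReferenceGrant in the backend's namespace permitting AIGatewayRoute
# resources in "team-a" to reference AIServiceBackend resources here.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-team-a-routes
  namespace: ai-backends           # namespace that owns the AIServiceBackend
spec:
  from:
    - group: aigateway.envoyproxy.io
      kind: AIGatewayRoute
      namespace: team-a            # namespace of the referencing routes
  to:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
```

Without such a grant in place, the controller rejects the cross-namespace reference with the clear error messages mentioned above.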
Enhanced Upstream Authentication
AWS SDK default credential chain
Support for the AWS SDK's default credential chain, including IRSA (IAM Roles for Service Accounts), EKS Pod Identity, EC2 Instance Profiles, and environment variables, eliminating the need for static credentials or OIDC settings.
Azure API key authentication
Native Azure OpenAI API key authentication using the api-key header, matching Azure SDK conventions and console practices.
Traffic Management and Configuration
Header mutations at route and backend levels
New headerMutation fields in both AIServiceBackend and AIGatewayRouteRuleBackendRef enable header manipulation with smart merge logic for advanced routing scenarios.
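A minimal sketch of a backend-level mutation might look like the following; the exact field shape is an assumption (modeled on common HTTP header filter conventions), and other required AIServiceBackend fields are omitted:

```yaml
# Hypothetical headerMutation sketch on an AIServiceBackend.
# The set/remove shape is illustrative, not the published schema.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: AIServiceBackend
metadata:
  name: openai-backend
spec:
  headerMutation:
    set:
      - name: x-team               # add or overwrite a header on upstream requests
        value: platform
    remove:
      - x-internal-debug           # strip a header before it reaches the provider
  # ...schema and backendRef fields omitted for brevity
```

Per the smart merge logic described above, a route-level headerMutation on AIGatewayRouteRuleBackendRef would then layer on top of this backend-level configuration.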
InferencePool v1 support
Updated to Gateway API Inference Extension v1.0, providing stable intelligent endpoint selection with enhanced performance and reliability.
Cached token usage tracking for actual token usage reporting
Captures and reports cached token statistics from cloud providers (Anthropic, Bedrock, etc.), providing accurate cost attribution for prompt caching features.
Standalone Mode and CLI
Docker image support
Official Docker images for the aigw CLI published to GitHub Container Registry, enabling containerized standalone deployments with proper health checks and lifecycle management.
Multi-provider auto-configuration
Zero-config standalone mode with automatic configuration from the OPENAI_API_KEY, AZURE_OPENAI_API_KEY, or ANTHROPIC_API_KEY environment variables. Generates a complete Envoy configuration with OpenAI SDK compatibility.
MCP server configuration
Native MCP support in standalone mode via the --mcp-config and --mcp-json flags, enabling unified LLM and MCP server configuration in a single aigw run invocation without Kubernetes.
XDG Base Directory standards
Proper separation of configuration, data, state, and runtime files following XDG Base Directory specification, improving organization and enabling better cleanup and management of aigw state.
Enhanced readiness monitoring
Improved Envoy readiness detection and status reporting in standalone mode, providing clear insights into when the gateway is ready to accept traffic with better error messages.
Consolidated admin server
Unified admin server on a single port serving both the /metrics and /health endpoints, simplifying monitoring and health check configuration.
Improved error handling
The aigw CLI now fails fast and exits cleanly if the external processor fails to start, preventing silent failures and improving the debugging experience.
Type-safe Kubernetes client SDK
Generated client libraries for all AI Gateway CRDs following standard Kubernetes client-go patterns, enabling developers to build controllers, operators, and custom integrations with type safety.
Observability Enhancements
MCP operations observability
Comprehensive monitoring, logging, and tracing for MCP operations with configurable access logs and metrics enrichment for MCP server interactions and tool routing.
Image generation tracing and metrics
OpenInference-compliant distributed tracing and OpenTelemetry Gen AI metrics for image generation requests with detailed request parameters and timing information.
OpenTelemetry native metrics export
Support for OTEL-native metrics export (in addition to Prometheus), enabling integration with Elastic Stack, OTEL-TUI, and other OTEL-native observability systems. Includes console exporter for ad-hoc debugging.
Embeddings tracing implementation
Complete OpenInference-compliant tracing for embeddings operations, complementing existing chat completion tracing.
Enhanced /messages endpoint metrics
Distinct metrics for Anthropic's /messages endpoint, providing accurate attribution separate from /chat/completions endpoints.
Original model tracking
Metrics now track both the original requested model and any overridden model names, providing accurate attribution in multi-provider and model virtualization scenarios.
🔗 API Updates
- New MCPRoute CRD
- Introduces MCPRoute custom resource with comprehensive fields for MCP server configuration, tool filtering, authentication policies (OAuth and API key), and Protected Resource Metadata.
- Cross-namespace references in AIGatewayRoute
- Added namespace field to AIGatewayRouteRuleBackendRef, enabling cross-namespace backend references with ReferenceGrant validation.
- Header mutations at route and backend levels
- Added headerMutation fields to both AIServiceBackend and AIGatewayRouteRuleBackendRef for backend-level and per-route header manipulation with smart merge logic.
- New AWSAnthropic API schema
- Added AWSAnthropic schema for Claude models on AWS Bedrock using the native Anthropic Messages API format, providing full feature parity with direct Anthropic API.
- Anthropic API key authentication
- Added AnthropicAPIKey to BackendSecurityPolicy for x-api-key header authentication.
- Azure API key authentication
- Added AzureAPIKey to BackendSecurityPolicy for api-key header authentication.
- AWS credential chain support...
v0.4.0-rc2
Release candidate for v0.4.0!
helm install aieg oci://registry-1.docker.io/envoyproxy/ai-gateway-helm --version v0.4.0-rc2 --namespace envoy-ai-gateway-system --create-namespace
v0.4.0-rc1
Release candidate for v0.4.0!
helm install aieg oci://registry-1.docker.io/envoyproxy/ai-gateway-helm --version v0.4.0-rc1 --namespace envoy-ai-gateway-system --create-namespace
v0.3.0
Release Announcement
Check out the v0.3.0 release notes to learn more about the release.
Envoy AI Gateway v0.3.x
Release introducing intelligent inference routing with the Endpoint Picker Provider, Google Vertex AI support, enhanced observability features, and improved provider integrations.
v0.3.0
August 21, 2025
Envoy AI Gateway v0.3.0 introduces intelligent inference routing, expanded provider support (including Google Vertex AI and Anthropic), and enhanced observability with OpenInference tracing and configurable metrics. Key features include Endpoint Picker Provider with InferencePool for dynamic load balancing, model name virtualization, and seamless Gateway API Inference Extension integration.
✨ New Features
Endpoint Picker Provider (EPP) Integration
- Gateway API Inference Extension Support
- Complete integration with Gateway API Inference Extension v0.5.1, enabling intelligent endpoint selection based on real-time AI inference metrics like KV-cache usage, queue depth, and LoRA adapter information.
- Dual Integration Approaches
- Support for both HTTPRoute + InferencePool and AIGatewayRoute + InferencePool integration patterns, providing flexibility for different use cases, from simple to advanced AI routing scenarios.
- Dynamic Load Balancing
- Intelligent routing that automatically selects the optimal inference endpoint for each request, optimizing resource utilization across your entire inference infrastructure with real-time performance metrics.
- Extensible Architecture
- Support for custom endpoint picker providers, allowing implementation of domain-specific routing logic tailored to unique AI workload requirements.
Expanded Provider Ecosystem
- Google Vertex AI Production Support
- Google Vertex AI has moved from work-in-progress to full production support, including complete streaming support for Gemini models with OpenAI API compatibility. View all supported providers →
- Anthropic on Vertex AI Integration
- Complete Anthropic Claude integration via GCP Vertex AI, moving from experimental to production-ready status with multi-tool support and configurable API versions for enterprise deployments.
- Enhanced Gemini Capabilities
- Improved request/response translation for Gemini models with support for tools, response format specification, and advanced conversation handling, making Gemini integration more robust and feature-complete.
- Strengthened OpenAI-Compatible Ecosystem
- Enhanced support for the broader OpenAI-compatible provider ecosystem including Groq, Together AI, Mistral, Cohere, DeepSeek, SambaNova, and more, ensuring seamless integration across the AI provider landscape.
Observability Enhancements
- OpenInference Tracing Support
- Added comprehensive OpenInference distributed tracing with OpenTelemetry integration, providing detailed request tracing and performance monitoring for LLM operations. Includes full chat completion request/response data capture, timing information, and compatibility with evaluation systems like Arize Phoenix. View the documentation →
- Configurable Metrics Labels
- Added support for configuring additional metrics labels corresponding to HTTP request headers. This enables custom labeling of metrics based on specific request headers like user identifiers, API versions, or application contexts, providing more granular monitoring and filtering capabilities.
- Embeddings Metrics Support
- Extended GenAI metrics support to include embeddings operations, providing comprehensive token usage tracking and performance monitoring for both chat completion and embeddings API endpoints with consistent OpenTelemetry semantic conventions.
- Enhanced GenAI Metrics
- Improved AI-specific metrics implementation with better error handling, enhanced attribute mapping, and more accurate token latency measurements. Maintains full compatibility with OpenTelemetry Gen AI semantic conventions while providing more reliable performance analysis data. View the documentation →
Infrastructure and Configuration
- Model Name Virtualization
- Added a new modelNameOverride field in the backendRef of AIGatewayRoute, enabling flexible model name abstraction across different providers. This allows unified model naming for downstream applications while routing to provider-specific model names, supporting both multi-provider scenarios and fallback configurations. View the documentation →
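A minimal sketch of the override, assuming an existing OpenAI-style backend; the model and backend names, and the header-based model match, are illustrative:

```yaml
# Hypothetical AIGatewayRoute rule fragment: clients request "my-chat-model",
# while the provider receives "gpt-4o-mini". Names are illustrative.
rules:
  - matches:
      - headers:
          - name: x-ai-eg-model    # assumed model-routing header
            value: my-chat-model
    backendRefs:
      - name: openai-backend
        modelNameOverride: gpt-4o-mini
```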
- Unified Gateway Support
- Enhanced Gateway resource management by allowing both standard HTTPRoute and AIGatewayRoute to be attached to the same Gateway object. This provides a unified routing configuration that supports both AI and non-AI traffic within a single gateway infrastructure, simplifying deployment and management.
🔗 API Updates
- BackendSecurityPolicy TargetRefs: Added targetRefs field to BackendSecurityPolicy spec, enabling direct targeting of AIServiceBackend resources using Gateway API policy attachment patterns.
- Gateway API Inference Extension: Allows InferencePool resource of Gateway API Inference Extension v0.5.1 to be specified as a backend ref in AIGatewayRoute intelligent endpoint selection.
- modelNameOverride in the backend reference of AIGatewayRoute: Added modelNameOverride field in the backend reference of AIGatewayRoute, allowing for flexible model name rewrite for routing purposes.
Deprecations
- backendSecurityPolicyRef Pattern: The old pattern of AIServiceBackend referencing BackendSecurityPolicy is deprecated in favor of the new targetRefs approach. Existing configurations will continue to work but should be migrated before v0.4.
- AIGatewayRoute's targetRefs Pattern: The targetRefs pattern is no longer supported for AIGatewayRoute. Existing configurations will continue to work but should be migrated to parentRefs.
- AIGatewayRoute's schema Field: The schema field is no longer needed for AIGatewayRoute. Existing configurations will continue to work but should be removed before v0.4.
- controller.envoyGatewayNamespace helm value is no longer necessary: This value is redundant when configured.
- controller.podEnv helm value will be removed: Use controller.extraEnvVars instead. The controller.podEnv value will be removed in v0.4.
📖 Upgrade Guidance
For users upgrading from v0.2.x to v0.3.0:
1. Upgrade Envoy Gateway to v1.5.0 - Ensure you are using Envoy Gateway v1.5.0 or later, as this is required for compatibility with the new AI Gateway features.
2. Update Envoy Gateway config - Update your Envoy Gateway configuration to include the new settings as below. The full manifest is available in the manifests/envoy-gateway-config/config.yaml file as per the getting started guide.
```diff
--- a/manifests/envoy-gateway-config/config.yaml
+++ b/manifests/envoy-gateway-config/config.yaml
@@ -43,9 +43,19 @@ data:
       extensionManager:
         hooks:
           xdsTranslator:
+            translation:
+              listener:
+                includeAll: true
+              route:
+                includeAll: true
+              cluster:
+                includeAll: true
+              secret:
+                includeAll: true
             post:
-            - VirtualHost
             - Translation
+            - Cluster
+            - Route
```
3. Upgrade Envoy AI Gateway to v0.3.0
4. Migrate Gateway target references - Update from the deprecated AIGatewayRoute.targetRefs pattern to the new AIGatewayRoute.parentRefs approach after the upgrade to v0.3.0.
5. Migrate backendSecurityPolicy references - Update from the deprecated AIServiceBackend.backendSecurityPolicyRef pattern to the new BackendSecurityPolicy.targetRefs approach after the upgrade to v0.3.0.
6. Remove AIGatewayRoute.schema - remove the schema field from AIGatewayRoute resources after the upgrade to v0.3.0, as it is no longer used.
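For step 5, the migrated attachment might look roughly like this; the targetRefs field comes from the API updates above, while the surrounding names and policy type are illustrative:

```yaml
# After migration: the BackendSecurityPolicy targets the backend directly
# via targetRefs instead of being referenced from AIServiceBackend.
apiVersion: aigateway.envoyproxy.io/v1alpha1
kind: BackendSecurityPolicy
metadata:
  name: openai-key
spec:
  targetRefs:
    - group: aigateway.envoyproxy.io
      kind: AIServiceBackend
      name: openai-backend         # assumed backend name
  type: APIKey                     # illustrative policy type
  apiKey:
    secretRef:
      name: openai-secret
```

The corresponding backendSecurityPolicyRef on the AIServiceBackend can then be removed.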
📦 Dependencies Versions
- Go 1.24.6
- Updated to latest Go version for improved performance and security.
- Envoy Gateway v1.5
- Built on Envoy Gateway for proven data plane capabilities.
- Envoy v1.35
- Leveraging Envoy Proxy's battle-tested networking capabilities.
- Gateway API v1.3.1
- Support for latest Gateway API specifications.
- Gateway API Inference Extension v0.5.1
- Integration with Gateway API Inference Extension for intelligent endpoint selection.
🙏 Acknowledgements
This release represents the collaborative effort of our growing community. Special thanks to contributors from Tetrate, Bloomberg, Tencent, Google, Nutanix and our independent contributors who made this release possible through their code contributions, testing, feedback, and community participation.
The Endpoint Picker Provider integration represents a significant milestone in making AI inference routing more intelligent and efficient. We appreciate all the feedback and testing from the community that helped shape this feature.
New Contributors
- @sukumargaonkar made their first contribution in #635
- @isyangban made their first contribution in #729
- @whzghb made their first contribution in #743
- @yduwcui made their first contribution in https://github.com/envoyproxy/ai-gatewa...
v0.3.0-rc2
Release candidate
v0.3.0-rc1
Release candidate for v0.3.0!
helm install aieg oci://registry-1.docker.io/envoyproxy/ai-gateway-helm --version v0.3.0-rc1 --namespace envoy-ai-gateway-system --create-namespace
v0.2.1
v0.2.0
Envoy AI Gateway v0.2.x
June 5, 2025
Envoy AI Gateway v0.2.0 builds upon the solid foundation of v0.1.0 with focus on expanding provider ecosystem support, improving reliability and performance through architectural changes, and enterprise-grade authentication support for Azure OpenAI.
Highlights: Azure OpenAI Integration, Sidecar Architecture, Performance Improvements, CLI Tools, Model Failover and Retry, Certificate Manager Integration.
✨ New Features
Azure OpenAI Integration
- Full Azure OpenAI Support
- Complete integration with Azure OpenAI services, with request/response transformation for the unified OpenAI-compatible completions API.
- Upstream Authentication for Azure Enterprise Integration
- Support for accessing Azure via OIDC tokens and Entra ID, providing enterprise-grade, secure, and compliant upstream authentication.
- Enterprise Proxy URL Support for Azure Authentication
- Enhanced Azure authentication with proxy URL configuration options for enterprise proxy support.
- Flexible Token Providers
- Generalized token provider architecture supporting both client secret and federated token flows.
Architecture Improvements
- Sidecar and UDS External Processor
- Switched to a sidecar deployment model with Unix Domain Sockets for improved performance and resource efficiency.
- Enhanced ExtProc Buffer Limits
- Increased external processor buffer limits from 32 KiB to 50 MiB for larger AI requests. Users can now configure CPU and memory resource limits via filterConfig.externalProcessor.resources for better resource management.
- Multiple AIGatewayRoute Support
- Support for multiple AIGatewayRoute resources per gateway, removing the previous single-route limitation. This enables better organization, scalability, and management of complex routing configurations across teams.
- Certificate Manager Integration
- Integrated cert-manager for automated TLS certificate provisioning and rotation for the mutating webhook server that injects AI Gateway sidecar containers into Envoy Gateway pods. This enables enterprise-grade certificate management, eliminating manual certificate handling and improving security.
Cross-Backend Failover and Retry
- Provider Fallback Logic
- Priority-based failover system that automatically routes traffic to lower priority AI providers as higher priority endpoints become unhealthy, ensuring high availability and fault tolerance.
- Backend Retry Support
- Configurable retry policies for improved reliability and resilience against AI provider transient failures. Features include exponential backoff with jitter, configurable retry triggers (5xx errors, connection failures, rate limiting), customizable retry counts and timeouts, and integration with Envoy Gateway's BackendTrafficPolicy.
- Weight-Based Routing
- Enhanced backend routing with weighted traffic distribution, enabling gradual rollouts, cost optimization, and A/B testing across multiple AI providers.
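A weighted split might be expressed roughly as follows; the weight field follows Gateway API backendRef conventions, and the backend names are illustrative:

```yaml
# Hypothetical 90/10 canary split across two providers in an
# AIGatewayRoute rule fragment. Backend names are illustrative.
rules:
  - backendRefs:
      - name: openai-backend
        weight: 90                 # ~90% of matching traffic
      - name: aws-bedrock-backend
        weight: 10                 # ~10% of matching traffic
```

Shifting the weights over successive rollouts lets traffic migrate gradually between providers without changing client configuration.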
Enhanced CLI Tools
- aigw run Command
- New CLI command for local development and testing of Envoy AI Gateway resources.
- Configuration Translation
- aigw translate for translating Envoy AI Gateway resources to Envoy Gateway and Kubernetes CRDs.
🔗 API Updates
- AIGatewayRoute Metadata: Added ownedBy and createdAt fields for better resource tracking.
- Backend Configuration: Moved Backend configuration back to RouteRule for improved flexibility.
- OIDC Field Types: Specific typing for OIDC-related configuration fields.
- Weight Type Changes: Updated Weight field type to match Gateway API specifications.
Deprecations
- AIServiceBackend.Timeouts: Deprecated in favor of more granular timeout configuration.
🐛 Bug Fixes
- ExtProc Image Syncing: Fixed issue where external processor image wouldn't sync properly.
- Router Weight Validation: Fixed negative weight validation in routing logic.
- Content Body Handling: Fixed empty content body issues causing AWS validation errors.
- First Match Routing: Fixed router logic to ensure first match wins as expected.
⚠️ Breaking Changes
- Sidecar Architecture: The switch to sidecar and UDS model may require configuration updates for existing deployments.
- API Field Changes: Some API fields have been moved or renamed. Please review the migration guide for details.
- Timeout Configuration: Deprecated timeout fields require migration to new configuration format.
- Routing to Kubernetes Services: Routing to Kubernetes services is not supported in Envoy AI Gateway v0.2.0. This is a known limitation and will be addressed in a future release.
📖 Upgrade Guidance
For users upgrading from v0.1.x to v0.2.0:
- Review usage of any deprecated API fields (particularly AIServiceBackend.Timeouts).
- Update deployment configurations if using custom replica configurations - the replicas field in AIGatewayFilterConfigExternalProcessor is now deprecated due to the new sidecar architecture.
- Remove routing to Kubernetes services - currently, Envoy AI Gateway does not support routing to Kubernetes services. This is a known limitation and will be addressed in a future release.
📦 Dependencies Versions
- Go 1.24.2 - Updated to latest Go version for improved performance and security.
- Envoy Gateway v1.4 - Built on Envoy Gateway for proven data plane capabilities.
- Envoy v1.34 - Leveraging Envoy Proxy's battle-tested networking capabilities.
- Gateway API v1.3 - Support for latest Gateway API specifications.
🙏 Acknowledgements
This release represents the collaborative effort of our growing community. Special thanks to contributors from Tetrate, Bloomberg, Google, and our independent contributors who made this release possible through their code contributions, testing, feedback, and community participation.
There are those who engage in conversations, provide feedback, and contribute to the project in other ways than code, and we appreciate them greatly. Ideas, suggestions, and feedback are always welcome.
🔮 What's Next (beyond v0.2)
We're already working on exciting features:
- Google Gemini & Vertex Integration
- Anthropic Integration
- Support for the Gateway API Inference Extension
- Endpoint picker support for Pod routing
- What else do you want to see? Get involved and open an issue and let us know!
v0.2.0-rc3
Release candidate
v0.2.0-rc1
Release candidate