feat: add native Anthropic API support for GCP integration #1022
Conversation
var isGzip bool
switch c.responseEncoding {
case "gzip":
	br, err = gzip.NewReader(bytes.NewReader(body.Body))
This is applicable to the OpenAI Python client, as it uses an httpx object; plain curl doesn't care about those things. We need to verify with the Anthropic Python SDK, but I assume it would need the same removal.
Bottom line: needs to be verified with the SDKs, but curl works fine.
cmd/extproc/mainlib/main.go
Outdated
}
server.Register("/v1/chat/completions", extproc.ChatCompletionProcessorFactory(chatCompletionMetrics))
server.Register("/v1/embeddings", extproc.EmbeddingsProcessorFactory(embeddingsMetrics))
server.Register("/v1/messages", extproc.MessagesProcessorFactory(chatCompletionMetrics))
maybe anthropic/v1/message for clean separation of vendor-specific APIs?
Yeah, makes sense. Probably make it configurable from YAML? @yuzisun, your take?
That is exactly what we are working on in #1020, so leave that separation concern to us.
The PR landed. You can check the change there and add the Anthropic prefix config accordingly to do the separation.
thank you for more merge conflicts :)
rebased and fixed
@VarSuren can you sign off the commits?
if headerMutation == nil {
	headerMutation = &extprocv3.HeaderMutation{}
}
// TODO: this is a hotfix, we should update this to recompress since it's in the header
@VarSuren we should not need this logic here to remove the content-encoding header, as we already insert the header mutation filter in the extension server. Can you check the implementation in the chat completion processor? https://github.com/envoyproxy/ai-gateway/pull/818/files
sure lemme check
Force-pushed c78ef54 to 2bf5f3b
cmd/extproc/mainlib/main.go
Outdated
server.Register("/v1/chat/completions", extproc.ChatCompletionProcessorFactory(chatCompletionMetrics))
server.Register("/v1/embeddings", extproc.EmbeddingsProcessorFactory(embeddingsMetrics))
server.Register("/v1/messages", extproc.MessagesProcessorFactory(chatCompletionMetrics))
server.Register("/v1/messages?beta=true", extproc.MessagesProcessorFactory(chatCompletionMetrics))
Do we need to register this separately? The use case should still be able to pass query parameters.
yeah 100% this should be handled in one processor.
Usually yes, but I didn't find a way to do a dynamic query.
I think we might need to fix processorForPath to trim the query string during the processor search.
I initially did that, but it would have affected all endpoints, and I don't know whether other endpoints might use query parameters in the future.
But also think about it: you might want a different processor for beta if, say, it adds extra-field functionality.
Since for Anthropic we are mostly passing through and letting the cloud handle validation, we should be fine with a single processor.
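A minimal sketch of the query-trimming lookup the thread converges on (and which a later commit implements in server.go); the registry and names here are illustrative, not the actual ai-gateway code:

```go
package main

import (
	"fmt"
	"strings"
)

// processors maps a registered path to a processor name. In the real
// server this would hold processor factories.
var processors = map[string]string{
	"/v1/messages": "messages-processor",
}

// processorForPath resolves a request path to its processor, stripping
// any query string so /v1/messages?beta=true and /v1/messages hit the
// same registration.
func processorForPath(path string) (string, bool) {
	if i := strings.IndexByte(path, '?'); i >= 0 {
		path = path[:i]
	}
	p, ok := processors[path]
	return p, ok
}

func main() {
	for _, p := range []string{"/v1/messages", "/v1/messages?beta=true"} {
		name, ok := processorForPath(p)
		fmt.Println(p, "->", name, ok)
	}
}
```

This avoids registering a second `?beta=true` entry while leaving other endpoints free to interpret query parameters however they like, since only the lookup key is trimmed.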
// MessagesRequest represents a request to the Anthropic Messages API.
// Uses a dictionary approach to handle any JSON structure flexibly.
type MessagesRequest map[string]interface{}
What's the reason for changing to the dict type? Is it for efficiency?
The Go SDK has different signatures that are not compatible with Claude Code requests, i.e. content: "hello I'm Claude", while Go only expects content: [{"type": "text", "text": "Hey I'm Claude"}].
Force-pushed a399e08 to 4a25262
func (a *anthropicToGCPAnthropicTranslator) RequestBody(_ []byte, body *anthropicschema.MessagesRequest, _ bool) (
	headerMutation *extprocv3.HeaderMutation, bodyMutation *extprocv3.BodyMutation, err error,
) {
	log.Println("hit GCP translator")
remove the debug log line
Addressed. By the way, what's the approach for this project: do we not use logs at all?
We still want logging if needed but this one obviously is not :)
arguable but okay
if status, ok := headers[":status"]; ok {
	log.Printf("response status: %s", status)
}
remove the debug log line
func (a *anthropicToGCPAnthropicTranslator) ResponseBody(_ map[string]string, body io.Reader, endOfStream bool) (
	headerMutation *extprocv3.HeaderMutation, bodyMutation *extprocv3.BodyMutation, tokenUsage LLMTokenUsage, err error,
) {
	log.Printf("hit translator - processing response body (endOfStream: %v)", endOfStream)
same here
Mutation: &extprocv3.BodyMutation_Body{Body: bodyBytes},
}

log.Println("response translation completed")
same here
Force-pushed bcd7537 to a370b8b
Codecov Report

❌ Patch coverage check failed: the head coverage (78.16%) is below the target coverage (86.00%). You can increase the head coverage or adjust the target coverage.

@@ Coverage Diff @@
## main #1022 +/- ##
==========================================
- Coverage 78.33% 78.16% -0.18%
==========================================
Files 81 84 +3
Lines 9349 9678 +329
==========================================
+ Hits 7324 7565 +241
- Misses 1681 1755 +74
- Partials 344 358 +14
We need extproc e2e test cases here to cover the code:
Signed-off-by: Suren Vartanian <[email protected]>
- Add Anthropic schema to CRD validations
- Update API documentation
- Enable /v1/messages endpoint with GCP Anthropic backend

Signed-off-by: Suren Vartanian <[email protected]>
- Remove deleted aws_invokemodel translator references from chatcompletion_processor
- Delete AWS Bedrock examples (anthropic-aws-bedrock.yaml, AWS_BEDROCK_SETUP.md)
- Update examples/anthropic/README.md to remove all AWS Bedrock references
- Fix linting issues: convert if-else to switch statements, add periods to comments
- Add new files: messages_processor.go, gcpanthropic_to_native.go translator

Signed-off-by: Suren Vartanian <[email protected]>
Signed-off-by: Suren Vartanian <[email protected]>
- Create internal/apischema/anthropic package using official Anthropic SDK types
- Update messages_processor.go to follow the same pattern as chatcompletion_processor
- Replace manual JSON parsing with parseAnthropicMessagesBody() function
- Use structured approach: parse to SDK types, access fields directly
- Remove schema validation to align with chatcompletion processor
- Fix imports to use internal/metrics instead of filterapi/x
- Add proper originalRequestBody storage and stream detection

Signed-off-by: Suren Vartanian <[email protected]>
Signed-off-by: Suren Vartanian <[email protected]>
Signed-off-by: Suren Vartanian <[email protected]>
Signed-off-by: Suren Vartanian <[email protected]>
Signed-off-by: Suren Vartanian <[email protected]>
Signed-off-by: Suren Vartanian <[email protected]>
Signed-off-by: Suren Vartanian <[email protected]>
Signed-off-by: Suren Vartanian <[email protected]>
Signed-off-by: Suren Vartanian <[email protected]>
- Restore clean version from upstream to resolve syntax errors
- Remove duplicate function definition and missing brace issues

Signed-off-by: Suren Vartanian <[email protected]>
…oint
- Register dual paths in main.go for /v1/messages and /v1/messages?beta=true
- Simplify messages_processor.go by removing query parameter handling
- Convert MessagesRequest from struct to map[string]interface{} in anthropic schema
- Fix translator tests to use map syntax instead of struct fields
- Clean up debug logs in anthropic_gcpanthropic.go, keep only essential logging
- Update test assertions for stream field inclusion in maps
- Update CRD manifests
This enables Claude Code compatibility by accepting the beta parameter
while maintaining clean endpoint-specific routing architecture.
Signed-off-by: Suren Vartanian <[email protected]>
- Remove emoji icons from all log statements
- Remove response body preview logging for security/privacy
- Keep only essential logging: translator hits, status, size, errors

This ensures production-appropriate logging without exposing sensitive data.

Signed-off-by: Suren Vartanian <[email protected]>
…rshal
- Work directly with map[string]interface{} instead of marshal->unmarshal roundtrip
- Copy map directly since MessagesRequest is already map[string]interface{}
- Eliminates 1 json.Marshal() and 1 json.Unmarshal() call per request
- Reduces memory allocations and improves performance
- Maintains same functionality with all tests passing
Signed-off-by: Suren Vartanian <[email protected]>
- Add anthropicPrefix flag with default value '/v1'
- Use path.Join for consistent prefix handling like OpenAI endpoints
- Register both /messages and /messages?beta=true with configurable prefix
- Enables custom Anthropic endpoint prefixes (e.g., /api/anthropic/v1/messages)
- Maintains backward compatibility with default /v1 prefix
- Follows same pattern as existing openAIPrefix implementation

Example usage: --anthropicPrefix='/api/anthropic/v1' # Custom prefix (default uses '/v1' for backward compatibility)

Signed-off-by: Suren Vartanian <[email protected]>
- Change anthropicPrefix default from '/v1' to '/anthropic/v1' for clear separation
- Remove all debug logging statements from GCP Anthropic translator
- Remove unused log import and clean up formatting
- Fix unused parameter linting issue (headers -> _)

This addresses PR feedback by:
1. Providing clear endpoint separation (/v1/* for OpenAI, /anthropic/v1/* for Anthropic)
2. Removing debug noise for production-ready code
3. Maintaining same functionality with cleaner implementation

All tests passing and precommit checks clean.

Signed-off-by: Suren Vartanian <[email protected]>
- Strip query parameters in server.go for processor lookup instead of registering duplicate endpoints
- Remove duplicate /messages?beta=true registration from main.go
- Keep single /messages endpoint registration with query stripping logic
- Both /messages and /messages?beta=true now route to the same processor cleanly

Signed-off-by: Suren Vartanian <[email protected]>
Signed-off-by: Suren Vartanian <[email protected]>
- Add MessagesProcessor for /anthropic/v1/messages endpoint
- Add comprehensive unit tests for messages processor and anthropic schema
- Add e2e test support with proper GCP Vertex AI configuration
- Fixed gcpAnthropicAISchema to include vertex-2023-10-16 version
- Improved test coverage to meet project thresholds (70% file, 81% package)

Signed-off-by: Suren Vartanian <[email protected]>
Force-pushed 48cac5f to 4fe9ffe
- Add test case for /anthropic/v1/messages endpoint (non-streaming)
- Add test case for /anthropic/v1/messages endpoint (streaming)
- Fixed streaming response format to match SSE standard with proper whitespace
- Tests validate request transformation from Anthropic to GCP Vertex AI format

Signed-off-by: Suren Vartanian <[email protected]>
/lgtm. @mathetake, any more comments for this?
q: this doesn't support a real Anthropic backend yet, right? I think that sounds like a must-have if we want to promote this as a "feature" in v0.3. How long do you think it would take to get it working in a follow-up PR? (Also, this needs documentation as well, 100%.)
This is a real Anthropic backend; it's just hosted on GCP. I think for 0.3 we are targeting full support for GCP Gemini via chat completion, and Anthropic models via either chat completion or the native Anthropic Messages API. We can target 0.4 to support first-party Anthropic and AWS Anthropic via the Anthropic Messages API.
Sounds good. Could you create the umbrella issue that tracks them, something like "Anthropic Input/SDK Support", documenting what works now and what doesn't, as well as the plans for v0.4? Then this PR is good to go.
Oh, I think we can reuse #847. Can you post the updates assuming this PR lands? I haven't gotten around to reviewing it all, but I will land it, either with my pushed changes or as-is, within today on the west coast.
Thanks! Updated the issue.
…y#1022)

**Description**
Allows use of the native Anthropic API with the GCP Vertex backend.

---------

Signed-off-by: Suren Vartanian <[email protected]>
Signed-off-by: Erica Hughberg <[email protected]>
Description
Allows using the native Anthropic API with GCP Vertex.
Registered endpoint /v1/messages
Tested:
Streaming/non streaming
curl -X POST http://localhost:1062/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-ai-eg-selected-backend: gcp-anthropic-backend" \
  -d '{
    "model": "claude-3-7-sonnet@20250219",
    "max_tokens": 1024,
    "messages": [
      { "role": "user", "content": [{"type": "text", "text": "Hey Claude! Are you running on GCP?"}] }
    ],
    "temperature": 0.1,
    "stream": false
  }'

Special notes for reviewers (if applicable)
Anthropic vs GCPAnthropic:
The Anthropic API version is not equal to the GCP Anthropic version, i.e. the Anthropic version passed in the header of the native API cannot be passed through to GCP; configuration for that field is expected from YAML.
The model field that is passed in the native API cannot be passed in the body to GCP.
I had to re-define the root messages struct because Go's implementation has no stream field (in the SDK those are two different methods).
I'm using chatCompletionMetrics in the messages factory, as it does what we need, and I also use the generic LLMTokenUsage; the naming is misleading, though, so maybe rename it?

Open question
The official GCP SDK limits the way you can pass the request, i.e. content: "my content" won't work with the current setup; it requires content: [{"type": "text", "text": "my content"}]. As a workaround I suggest using a plain map: we lose validation and the translator may get a bit messier, but it allows a real pass-through. Hence the dictionary-as-struct approach, since Claude Code sends requests that are not compatible with the Go structs.
TODO:
Verify integration with Claude Code. Claude Code sends requests to the URL + ?beta=true, so I had to register both; I didn't find a way to register a dynamic URL.
The initial request from testing is fine; a regular one gets dropped with a connection issue, but that's some internal configuration I'm not aware of. The request itself is valid and works if you hit GCP directly.
Extproc config I used: