Closed
Description
What happened?
The cost calculation for GCP Vertex does not take prompt caching into account for either call type:
- Using Anthropic v1/messages endpoint
- Using GCP Vertex Passthrough
Both call types correctly record the caching usage in the metadata, e.g.:
"usage_object": {
"total_tokens": 25264,
"prompt_tokens": 25169,
"completion_tokens": 95,
"prompt_tokens_details": {
"audio_tokens": null,
"cached_tokens": 25162
},
"cache_read_input_tokens": 25162,
"completion_tokens_details": {
"audio_tokens": null,
"reasoning_tokens": 0,
"accepted_prediction_tokens": null,
"rejected_prediction_tokens": null
},
"cache_creation_input_tokens": 222
},
This clearly shows `cached_tokens`, `cache_read_input_tokens`, and `cache_creation_input_tokens`. However, the cost calculation does not take caching into account.
Below is a more detailed log obtained with the `detailed_debug` flag.
Existing vs Expected Behavior
Existing behavior:
- Prompt Cost: 15239 tokens × $0.000003/token = $0.045717
- Completion Cost: 494 tokens × $0.000015/token = $0.007410
- Total Cost: $0.045717 + $0.007410 = $0.053127
Expected behavior should be more like:
- Cost of Cached Prompt Tokens: 9270 tokens × $0.0000003/token = $0.002781
- Cost of Non-cached Prompt Tokens: 5969 tokens × $0.000003/token = $0.017907
- Cost of Cache Creation: 0 tokens × $0.00000375/token = $0.000000
- Cost of Completion: 494 tokens × $0.000015/token = $0.007410
- Total Cost: $0.002781 + $0.017907 + $0.000000 + $0.007410 = $0.028098
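The expected calculation can be sketched in Python. The per-token rates come from the `model_map_value` in the log below; the function name and structure are illustrative only, not LiteLLM's actual cost-calculation API:

```python
# Rates from model_map_value for vertex_ai/claude-sonnet-4@20250514 (see log).
INPUT_COST = 3e-06               # $/non-cached prompt token
CACHE_READ_COST = 3e-07          # $/cached (read) prompt token
CACHE_CREATION_COST = 3.75e-06   # $/cache-creation token
OUTPUT_COST = 1.5e-05            # $/completion token

def cache_aware_cost(prompt_tokens: int, completion_tokens: int,
                     cache_read_tokens: int, cache_creation_tokens: int) -> float:
    """Hypothetical sketch: price cached and non-cached prompt tokens separately."""
    non_cached = prompt_tokens - cache_read_tokens
    return (non_cached * INPUT_COST
            + cache_read_tokens * CACHE_READ_COST
            + cache_creation_tokens * CACHE_CREATION_COST
            + completion_tokens * OUTPUT_COST)

# Values from the request in the log below:
print(round(cache_aware_cost(15239, 494, 9270, 0), 6))  # 0.028098
```

With `cache_read_tokens=0` this reduces to the current behavior ($0.053127), so the bug is equivalent to always treating the cached-token count as zero.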
Relevant log output
    {
      "request_id": "chatcmpl-14f2a9fb-222a-43de-b750-ff13f891df77",
      "call_type": "acompletion",
      "api_key": "REDACTED",
      "cache_hit": "False",
      "startTime": "2025-06-29 17:01:26.393740+00:00",
      "endTime": "2025-06-29 17:01:34.360580+00:00",
      "completionStartTime": "2025-06-29 17:01:28.669701+00:00",
      "model": "claude-sonnet-4@20250514",
      "user": "REDACTED",
      "team_id": "REDACTED",
      "metadata": "{\"user_api_key\": \"REDACTED\", \"user_api_key_alias\": \"yigitcan-personal\", \"user_api_key_team_id\": \"REDACTED\", \"user_api_key_org_id\": null, \"user_api_key_user_id\": \"REDACTED\", \"user_api_key_team_alias\": \"staff-engineers\", \"requester_ip_address\": \"\", \"applied_guardrails\": [], \"batch_models\": null, \"mcp_tool_call_metadata\": null, \"vector_store_request_metadata\": null, \"guardrail_information\": null, \"usage_object\": {\"completion_tokens\": 494, \"prompt_tokens\": 15239, \"total_tokens\": 15733, \"completion_tokens_details\": {\"accepted_prediction_tokens\": null, \"audio_tokens\": null, \"reasoning_tokens\": 0, \"rejected_prediction_tokens\": null}, \"prompt_tokens_details\": {\"audio_tokens\": null, \"cached_tokens\": 9270}, \"cache_creation_input_tokens\": 0, \"cache_read_input_tokens\": 9270}, \"model_map_information\": {\"model_map_key\": \"claude-sonnet-4@20250514\", \"model_map_value\": {\"key\": \"vertex_ai/claude-sonnet-4@20250514\", \"max_tokens\": 64000, \"max_input_tokens\": 200000, \"max_output_tokens\": 64000, \"input_cost_per_token\": 3e-06, \"cache_creation_input_token_cost\": 3.75e-06, \"cache_read_input_token_cost\": 3e-07, \"input_cost_per_character\": null, \"input_cost_per_token_above_128k_tokens\": null, \"input_cost_per_token_above_200k_tokens\": null, \"input_cost_per_query\": null, \"input_cost_per_second\": null, \"input_cost_per_audio_token\": null, \"input_cost_per_token_batches\": null, \"output_cost_per_token_batches\": null, \"output_cost_per_token\": 1.5e-05, \"output_cost_per_audio_token\": null, \"output_cost_per_character\": null, \"output_cost_per_reasoning_token\": null, \"output_cost_per_token_above_128k_tokens\": null, \"output_cost_per_character_above_128k_tokens\": null, \"output_cost_per_token_above_200k_tokens\": null, \"output_cost_per_second\": null, \"output_cost_per_image\": null, \"output_vector_size\": null, \"citation_cost_per_token\": null, \"litellm_provider\": \"vertex_ai-anthropic_models\", \"mode\": \"chat\", \"supports_system_messages\": null, \"supports_response_schema\": true, \"supports_vision\": true, \"supports_function_calling\": true, \"supports_tool_choice\": true, \"supports_assistant_prefill\": true, \"supports_prompt_caching\": true, \"supports_audio_input\": null, \"supports_audio_output\": null, \"supports_pdf_input\": true, \"supports_embedding_image_input\": null, \"supports_native_streaming\": null, \"supports_web_search\": null, \"supports_url_context\": null, \"supports_reasoning\": true, \"supports_computer_use\": true, \"search_context_cost_per_query\": {\"search_context_size_low\": 0.01, \"search_context_size_medium\": 0.01, \"search_context_size_high\": 0.01}, \"tpm\": null, \"rpm\": null, \"supported_openai_params\": [\"stream\", \"stop\", \"temperature\", \"top_p\", \"max_tokens\", \"max_completion_tokens\", \"tools\", \"tool_choice\", \"extra_headers\", \"parallel_tool_calls\", \"response_format\", \"user\", \"reasoning_effort\", \"web_search_options\", \"thinking\"]}}, \"additional_usage_values\": {\"completion_tokens_details\": {\"accepted_prediction_tokens\": null, \"audio_tokens\": null, \"reasoning_tokens\": 0, \"rejected_prediction_tokens\": null, \"text_tokens\": null}, \"prompt_tokens_details\": {\"audio_tokens\": null, \"cached_tokens\": 0, \"text_tokens\": null, \"image_tokens\": null}, \"cache_creation_input_tokens\": 0, \"cache_read_input_tokens\": 9270}}",
      "cache_key": "Cache OFF",
      "spend": 0.053127,
      "total_tokens": 15733,
      "prompt_tokens": 15239,
      "completion_tokens": 494,
      "request_tags": "[\"User-Agent: Bn\", \"User-Agent: Bn/JS 5.5.1\"]",
      "end_user": "",
      "api_base": "https://us-east5-aiplatform.googleapis.com/v1/projects/REDACTED/locations/us-east5/publishers/anthropic/models/claude-sonnet-4@20250514:streamRawPredict",
      "model_group": "claude-sonnet-4-20250514",
      "model_id": "a606df091cc3e2eb6cf35317f872719e91c23af5dafc3c33602a7c372b5aa7ac",
      "requester_ip_address": "",
      "custom_llm_provider": "vertex_ai",
      "messages": "{}",
      "response": "{}",
      "proxy_server_request": "{}",
      "session_id": "REDACTED",
      "status": "success"
    }
Are you an ML Ops Team?
No
What LiteLLM version are you on?
main-v1.73.6-nightly
Twitter / LinkedIn details
No response