[Bug]: GCP Vertex Cost Calculation does not consider caching #12149

Closed
@yigitcan-dh

Description

What happened?

The cost calculation for GCP Vertex does not consider caching for either call type:

  • Using Anthropic v1/messages endpoint
  • Using GCP Vertex Passthrough

Both call types successfully register the caching usage in the metadata, e.g.:

  "usage_object": {
    "total_tokens": 25264,
    "prompt_tokens": 25169,
    "completion_tokens": 95,
    "prompt_tokens_details": {
      "audio_tokens": null,
      "cached_tokens": 25162
    },
    "cache_read_input_tokens": 25162,
    "completion_tokens_details": {
      "audio_tokens": null,
      "reasoning_tokens": 0,
      "accepted_prediction_tokens": null,
      "rejected_prediction_tokens": null
    },
    "cache_creation_input_tokens": 222
  },

The metadata clearly shows cached_tokens, cache_read_input_tokens, and cache_creation_input_tokens. However, the cost calculation does not take them into account.
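To illustrate, the usage object above already carries everything needed to split the prompt into cached and non-cached portions. This is only a sketch of the splitting logic, assuming (as the fields suggest) that prompt_tokens is inclusive of cache_read_input_tokens:

```python
# Sketch only: the field names match the logged usage_object above,
# but the splitting logic is an assumption, not LiteLLM's implementation.
usage = {
    "prompt_tokens": 25169,
    "completion_tokens": 95,
    "cache_read_input_tokens": 25162,
    "cache_creation_input_tokens": 222,
}

# Assuming prompt_tokens includes cached reads, the tokens billable at the
# full input rate are the remainder.
non_cached_prompt = usage["prompt_tokens"] - usage["cache_read_input_tokens"]
print(non_cached_prompt)  # 7
```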

A more detailed log, captured with the detailed_debug flag, is included below.

Existing vs Expected Behavior

Existing behavior:

  • Prompt Cost: 15239 tokens × $0.000003/token = $0.045717
  • Completion Cost: 494 tokens × $0.000015/token = $0.007410
  • Total Cost: $0.045717 + $0.007410 = $0.053127

Expected behavior should be more like:

  • Cost of Cached Prompt Tokens: 9270 tokens × $0.0000003/token = $0.002781
  • Cost of Non-cached Prompt Tokens: 5969 tokens × $0.000003/token = $0.017907
  • Cost of Cache Creation: 0 tokens × $0.00000375/token = $0.000000
  • Cost of Completion: 494 tokens × $0.000015/token = $0.007410
  • Total Cost: $0.002781 + $0.017907 + $0.000000 + $0.007410 = $0.028098
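The expected calculation above can be sketched as follows, using the per-token prices from the model_map_value in the log output below. The function name and structure are illustrative, not LiteLLM's actual code:

```python
# Per-token prices from model_map_value in the log below.
INPUT_COST = 3e-06            # input_cost_per_token
CACHE_READ_COST = 3e-07       # cache_read_input_token_cost
CACHE_CREATE_COST = 3.75e-06  # cache_creation_input_token_cost
OUTPUT_COST = 1.5e-05         # output_cost_per_token

def cache_aware_cost(prompt_tokens, completion_tokens,
                     cache_read_tokens, cache_creation_tokens):
    """Price cached reads, non-cached input, cache writes, and output separately."""
    non_cached = prompt_tokens - cache_read_tokens
    return (cache_read_tokens * CACHE_READ_COST
            + non_cached * INPUT_COST
            + cache_creation_tokens * CACHE_CREATE_COST
            + completion_tokens * OUTPUT_COST)

# Token counts from the log output below:
print(round(cache_aware_cost(15239, 494, 9270, 0), 6))  # 0.028098
```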

Relevant log output

{
  "request_id": "chatcmpl-14f2a9fb-222a-43de-b750-ff13f891df77",
  "call_type": "acompletion",
  "api_key": "REDACTED",
  "cache_hit": "False",
  "startTime": "2025-06-29 17:01:26.393740+00:00",
  "endTime": "2025-06-29 17:01:34.360580+00:00",
  "completionStartTime": "2025-06-29 17:01:28.669701+00:00",
  "model": "claude-sonnet-4@20250514",
  "user": "REDACTED",
  "team_id": "REDACTED",
  "metadata": "{\"user_api_key\": \"REDACTED\", \"user_api_key_alias\": \"yigitcan-personal\", \"user_api_key_team_id\": \"REDACTED\", \"user_api_key_org_id\": null, \"user_api_key_user_id\": \"REDACTED\", \"user_api_key_team_alias\": \"staff-engineers\", \"requester_ip_address\": \"\", \"applied_guardrails\": [], \"batch_models\": null, \"mcp_tool_call_metadata\": null, \"vector_store_request_metadata\": null, \"guardrail_information\": null, \"usage_object\": {\"completion_tokens\": 494, \"prompt_tokens\": 15239, \"total_tokens\": 15733, \"completion_tokens_details\": {\"accepted_prediction_tokens\": null, \"audio_tokens\": null, \"reasoning_tokens\": 0, \"rejected_prediction_tokens\": null}, \"prompt_tokens_details\": {\"audio_tokens\": null, \"cached_tokens\": 9270}, \"cache_creation_input_tokens\": 0, \"cache_read_input_tokens\": 9270}, \"model_map_information\": {\"model_map_key\": \"claude-sonnet-4@20250514\", \"model_map_value\": {\"key\": \"vertex_ai/claude-sonnet-4@20250514\", \"max_tokens\": 64000, \"max_input_tokens\": 200000, \"max_output_tokens\": 64000, \"input_cost_per_token\": 3e-06, \"cache_creation_input_token_cost\": 3.75e-06, \"cache_read_input_token_cost\": 3e-07, \"input_cost_per_character\": null, \"input_cost_per_token_above_128k_tokens\": null, \"input_cost_per_token_above_200k_tokens\": null, \"input_cost_per_query\": null, \"input_cost_per_second\": null, \"input_cost_per_audio_token\": null, \"input_cost_per_token_batches\": null, \"output_cost_per_token_batches\": null, \"output_cost_per_token\": 1.5e-05, \"output_cost_per_audio_token\": null, \"output_cost_per_character\": null, \"output_cost_per_reasoning_token\": null, \"output_cost_per_token_above_128k_tokens\": null, \"output_cost_per_character_above_128k_tokens\": null, \"output_cost_per_token_above_200k_tokens\": null, \"output_cost_per_second\": null, \"output_cost_per_image\": null, \"output_vector_size\": null, \"citation_cost_per_token\": null, \"litellm_provider\": 
\"vertex_ai-anthropic_models\", \"mode\": \"chat\", \"supports_system_messages\": null, \"supports_response_schema\": true, \"supports_vision\": true, \"supports_function_calling\": true, \"supports_tool_choice\": true, \"supports_assistant_prefill\": true, \"supports_prompt_caching\": true, \"supports_audio_input\": null, \"supports_audio_output\": null, \"supports_pdf_input\": true, \"supports_embedding_image_input\": null, \"supports_native_streaming\": null, \"supports_web_search\": null, \"supports_url_context\": null, \"supports_reasoning\": true, \"supports_computer_use\": true, \"search_context_cost_per_query\": {\"search_context_size_low\": 0.01, \"search_context_size_medium\": 0.01, \"search_context_size_high\": 0.01}, \"tpm\": null, \"rpm\": null, \"supported_openai_params\": [\"stream\", \"stop\", \"temperature\", \"top_p\", \"max_tokens\", \"max_completion_tokens\", \"tools\", \"tool_choice\", \"extra_headers\", \"parallel_tool_calls\", \"response_format\", \"user\", \"reasoning_effort\", \"web_search_options\", \"thinking\"]}}, \"additional_usage_values\": {\"completion_tokens_details\": {\"accepted_prediction_tokens\": null, \"audio_tokens\": null, \"reasoning_tokens\": 0, \"rejected_prediction_tokens\": null, \"text_tokens\": null}, \"prompt_tokens_details\": {\"audio_tokens\": null, \"cached_tokens\": 0, \"text_tokens\": null, \"image_tokens\": null}, \"cache_creation_input_tokens\": 0, \"cache_read_input_tokens\": 9270}}",
  "cache_key": "Cache OFF",
  "spend": 0.053127,
  "total_tokens": 15733,
  "prompt_tokens": 15239,
  "completion_tokens": 494,
  "request_tags": "[\"User-Agent: Bn\", \"User-Agent: Bn/JS 5.5.1\"]",
  "end_user": "",
  "api_base": "https://us-east5-aiplatform.googleapis.com/v1/projects/REDACTED/locations/us-east5/publishers/anthropic/models/claude-sonnet-4@20250514:streamRawPredict",
  "model_group": "claude-sonnet-4-20250514",
  "model_id": "a606df091cc3e2eb6cf35317f872719e91c23af5dafc3c33602a7c372b5aa7ac",
  "requester_ip_address": "",
  "custom_llm_provider": "vertex_ai",
  "messages": "{}",
  "response": "{}",
  "proxy_server_request": "{}",
  "session_id": "REDACTED",
  "status": "success"
}

Are you a ML Ops Team?

No

What LiteLLM version are you on ?

main-v1.73.6-nightly

Twitter / LinkedIn details

No response
