Closed
Description
What happened?
The cost calculation for GCP Vertex does not take prompt caching into account for either call type:
- Using Anthropic v1/messages endpoint
- Using GCP Vertex Passthrough
Both call types correctly record the caching usage in the metadata, e.g.:
"usage_object": {
"total_tokens": 25264,
"prompt_tokens": 25169,
"completion_tokens": 95,
"prompt_tokens_details": {
"audio_tokens": null,
"cached_tokens": 25162
},
"cache_read_input_tokens": 25162,
"completion_tokens_details": {
"audio_tokens": null,
"reasoning_tokens": 0,
"accepted_prediction_tokens": null,
"rejected_prediction_tokens": null
},
"cache_creation_input_tokens": 222
},
This clearly shows `cached_tokens`, `cache_read_input_tokens`, and `cache_creation_input_tokens`. However, the cost calculation does not take caching into account.
Below is a more detailed log obtained with the `detailed_debug` flag.
Existing vs Expected Behavior
Existing behavior:
- Prompt Cost: 15239 tokens × $0.000003/token = $0.045717
- Completion Cost: 494 tokens × $0.000015/token = $0.007410
- Total Cost: $0.045717 + $0.007410 = $0.053127
Expected behavior should be more like:
- Cost of Cached Prompt Tokens: 9270 tokens × $0.0000003/token = $0.002781
- Cost of Non-cached Prompt Tokens: 5969 tokens × $0.000003/token = $0.017907
- Cost of Cache Creation: 0 tokens × $0.00000375/token = $0.000000
- Cost of Completion: 494 tokens × $0.000015/token = $0.007410
- Total Cost: $0.002781 + $0.017907 + $0.000000 + $0.007410 = $0.028098
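The expected calculation can be sketched in Python. The per-token rates come from the `model_map_value` in the log below; the function name and structure are illustrative only, not LiteLLM's actual cost-calculation API:

```python
# Rates from model_map_value for vertex_ai/claude-sonnet-4@20250514 (see log).
INPUT_COST = 3e-06               # $/non-cached prompt token
CACHE_READ_COST = 3e-07          # $/cached (read) prompt token
CACHE_CREATION_COST = 3.75e-06   # $/cache-creation token
OUTPUT_COST = 1.5e-05            # $/completion token

def cache_aware_cost(prompt_tokens: int, completion_tokens: int,
                     cache_read_tokens: int, cache_creation_tokens: int) -> float:
    """Hypothetical sketch: price cached and non-cached prompt tokens separately."""
    non_cached = prompt_tokens - cache_read_tokens
    return (non_cached * INPUT_COST
            + cache_read_tokens * CACHE_READ_COST
            + cache_creation_tokens * CACHE_CREATION_COST
            + completion_tokens * OUTPUT_COST)

# Values from the request in the log below:
print(round(cache_aware_cost(15239, 494, 9270, 0), 6))  # 0.028098
```

With `cache_read_tokens=0` this reduces to the current behavior ($0.053127), so the bug is equivalent to always treating the cached-token count as zero.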
Relevant log output
    {
      "request_id": "chatcmpl-14f2a9fb-222a-43de-b750-ff13f891df77",
      "call_type": "acompletion",
      "api_key": "REDACTED",
      "cache_hit": "False",
      "startTime": "2025-06-29 17:01:26.393740+00:00",
      "endTime": "2025-06-29 17:01:34.360580+00:00",
      "completionStartTime": "2025-06-29 17:01:28.669701+00:00",
      "model": "claude-sonnet-4@20250514",
      "user": "REDACTED",
      "team_id": "REDACTED",
      "metadata": "{\"user_api_key\": \"REDACTED\", \"user_api_key_alias\": \"yigitcan-personal\", \"user_api_key_team_id\": \"REDACTED\", \"user_api_key_org_id\": null, \"user_api_key_user_id\": \"REDACTED\", \"user_api_key_team_alias\": \"staff-engineers\", \"requester_ip_address\": \"\", \"applied_guardrails\": [], \"batch_models\": null, \"mcp_tool_call_metadata\": null, \"vector_store_request_metadata\": null, \"guardrail_information\": null, \"usage_object\": {\"completion_tokens\": 494, \"prompt_tokens\": 15239, \"total_tokens\": 15733, \"completion_tokens_details\": {\"accepted_prediction_tokens\": null, \"audio_tokens\": null, \"reasoning_tokens\": 0, \"rejected_prediction_tokens\": null}, \"prompt_tokens_details\": {\"audio_tokens\": null, \"cached_tokens\": 9270}, \"cache_creation_input_tokens\": 0, \"cache_read_input_tokens\": 9270}, \"model_map_information\": {\"model_map_key\": \"claude-sonnet-4@20250514\", \"model_map_value\": {\"key\": \"vertex_ai/claude-sonnet-4@20250514\", \"max_tokens\": 64000, \"max_input_tokens\": 200000, \"max_output_tokens\": 64000, \"input_cost_per_token\": 3e-06, \"cache_creation_input_token_cost\": 3.75e-06, \"cache_read_input_token_cost\": 3e-07, \"input_cost_per_character\": null, \"input_cost_per_token_above_128k_tokens\": null, \"input_cost_per_token_above_200k_tokens\": null, \"input_cost_per_query\": null, \"input_cost_per_second\": null, \"input_cost_per_audio_token\": null, \"input_cost_per_token_batches\": null, \"output_cost_per_token_batches\": null, \"output_cost_per_token\": 1.5e-05, \"output_cost_per_audio_token\": null, \"output_cost_per_character\": null, \"output_cost_per_reasoning_token\": null, \"output_cost_per_token_above_128k_tokens\": null, \"output_cost_per_character_above_128k_tokens\": null, \"output_cost_per_token_above_200k_tokens\": null, \"output_cost_per_second\": null, \"output_cost_per_image\": null, \"output_vector_size\": null, \"citation_cost_per_token\": null, \"litellm_provider\": \"vertex_ai-anthropic_models\", \"mode\": \"chat\", \"supports_system_messages\": null, \"supports_response_schema\": true, \"supports_vision\": true, \"supports_function_calling\": true, \"supports_tool_choice\": true, \"supports_assistant_prefill\": true, \"supports_prompt_caching\": true, \"supports_audio_input\": null, \"supports_audio_output\": null, \"supports_pdf_input\": true, \"supports_embedding_image_input\": null, \"supports_native_streaming\": null, \"supports_web_search\": null, \"supports_url_context\": null, \"supports_reasoning\": true, \"supports_computer_use\": true, \"search_context_cost_per_query\": {\"search_context_size_low\": 0.01, \"search_context_size_medium\": 0.01, \"search_context_size_high\": 0.01}, \"tpm\": null, \"rpm\": null, \"supported_openai_params\": [\"stream\", \"stop\", \"temperature\", \"top_p\", \"max_tokens\", \"max_completion_tokens\", \"tools\", \"tool_choice\", \"extra_headers\", \"parallel_tool_calls\", \"response_format\", \"user\", \"reasoning_effort\", \"web_search_options\", \"thinking\"]}}, \"additional_usage_values\": {\"completion_tokens_details\": {\"accepted_prediction_tokens\": null, \"audio_tokens\": null, \"reasoning_tokens\": 0, \"rejected_prediction_tokens\": null, \"text_tokens\": null}, \"prompt_tokens_details\": {\"audio_tokens\": null, \"cached_tokens\": 0, \"text_tokens\": null, \"image_tokens\": null}, \"cache_creation_input_tokens\": 0, \"cache_read_input_tokens\": 9270}}",
      "cache_key": "Cache OFF",
      "spend": 0.053127,
      "total_tokens": 15733,
      "prompt_tokens": 15239,
      "completion_tokens": 494,
      "request_tags": "[\"User-Agent: Bn\", \"User-Agent: Bn/JS 5.5.1\"]",
      "end_user": "",
      "api_base": "https://us-east5-aiplatform.googleapis.com/v1/projects/REDACTED/locations/us-east5/publishers/anthropic/models/claude-sonnet-4@20250514:streamRawPredict",
      "model_group": "claude-sonnet-4-20250514",
      "model_id": "a606df091cc3e2eb6cf35317f872719e91c23af5dafc3c33602a7c372b5aa7ac",
      "requester_ip_address": "",
      "custom_llm_provider": "vertex_ai",
      "messages": "{}",
      "response": "{}",
      "proxy_server_request": "{}",
      "session_id": "REDACTED",
      "status": "success"
    }
Are you an ML Ops Team?
No
What LiteLLM version are you on?
main-v1.73.6-nightly
Twitter / LinkedIn details
No response