[Feature]: guided decoding on TPU

### 🚀 The feature, motivation and pitch

I’m not sure if this is possible, but right now the `execute_model` function on the `TPUModelRunner` only outputs the predicted token_ids, not the distribution over tokens that we could sample from with some guidance (e.g., using outlines). Without access to that distribution, guided decoding cannot constrain which token gets picked at each step. I believe structured output is becoming more common, and most projects that build on LLMs need this feature.
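To illustrate why the distribution is needed, here is a minimal sketch of logit masking, the general idea behind guided decoding. It is not vLLM's or outlines' actual API: the function names `apply_guidance_mask` and `greedy_pick`, the toy logits, and the allowed-token set are all hypothetical and chosen only for illustration.

```python
import math

def apply_guidance_mask(logits, allowed_token_ids):
    """Return a copy of logits with every disallowed token set to -inf."""
    allowed = set(allowed_token_ids)
    return [x if i in allowed else -math.inf for i, x in enumerate(logits)]

def greedy_pick(logits):
    """Return the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Toy vocabulary of 5 tokens; token 3 has the highest unconstrained score.
logits = [0.1, 2.0, -1.0, 3.5, 0.7]
# Hypothetical constraint: the grammar/FSM only permits tokens 0, 1, and 4 next.
allowed = [0, 1, 4]

unconstrained = greedy_pick(logits)                              # picks token 3
constrained = greedy_pick(apply_guidance_mask(logits, allowed))  # picks token 1
```

If the runner only returns the already-chosen token id (token 3 here), there is nothing left to mask, so the constraint would have to be applied before sampling, on the logits themselves.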

### Alternatives

_No response_

### Additional context

_No response_

### Before submitting a new issue...

- [X] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Feature]: guided decoding on TPU #11104
