
Fix structure output #1270



Merged: 3 commits merged into HabanaAI:aice/v1.20.1 on May 19, 2025


@inkcherry commented on May 19, 2025

1. If computing the logits depends on the sampling result, we fetch the sampling result in advance.
2. If not, we keep the previous behavior to preserve performance.
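A toy simulation makes the data dependency concrete. Everything in it is hypothetical and only for illustration: a four-token vocabulary, a made-up grammar in which token `x` may only be followed by `(x + 1) % 4`, and a stand-in model that always prefers token 0. In vllm-fork the mask would come from XGrammar and the fetch would be a device-to-host copy; here both are modeled with plain Python lists. With delayed sampling, the host-side logits processor sees the sampled tokens one step late and builds its mask from a stale token, while fetching the previous sample in advance keeps the mask correct.

```python
from typing import List, Optional

VOCAB_SIZE = 4  # hypothetical toy vocabulary


def allowed_next(prev: Optional[int]) -> List[bool]:
    """Toy grammar (hypothetical): token x may only be followed by (x + 1) % VOCAB_SIZE."""
    if prev is None:
        return [True] * VOCAB_SIZE  # no history yet: any first token is legal
    return [tok == (prev + 1) % VOCAB_SIZE for tok in range(VOCAB_SIZE)]


def model_logits() -> List[float]:
    """Stand-in for the model forward pass: it always prefers token 0."""
    return [1.0, 0.0, 0.0, 0.0]


def sample_greedy(logits: List[float], mask: List[bool]) -> int:
    """Greedy sampling restricted to the tokens the grammar mask allows."""
    allowed = [tok for tok, ok in enumerate(mask) if ok]
    return max(allowed, key=lambda tok: logits[tok])


def generate(num_steps: int, fetch_in_advance: bool) -> List[int]:
    sampled: List[int] = []  # tokens produced by the sampler (device side)
    for _ in range(num_steps):
        if fetch_in_advance:
            # Fetch the previous sample before building this step's mask.
            host_visible = sampled
        else:
            # Delayed sampling: the most recent sample is still in flight,
            # so the host-side logits processor only sees the older tokens.
            host_visible = sampled[:-1]
        prev = host_visible[-1] if host_visible else None
        sampled.append(sample_greedy(model_logits(), allowed_next(prev)))
    return sampled


def is_valid(seq: List[int]) -> bool:
    """True if every adjacent pair of tokens is allowed by the toy grammar."""
    return all(allowed_next(a)[b] for a, b in zip(seq, seq[1:]))


print(generate(4, fetch_in_advance=True))   # [0, 1, 2, 3]
print(generate(4, fetch_in_advance=False))  # [0, 0, 1, 1]
```

With `fetch_in_advance=False`, token 0 is followed by 0, which the toy grammar forbids. When no logits processor reads the token history, that one-step lag is harmless, which is the case where the PR keeps the delayed path and avoids an extra blocking fetch on every step.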

Example:

```
prompt = ("Generate a JSON with the brand, model and car_type of the most iconic car from the 90's")
```

Using delayed sampling with structured output, the output is:

```
{
"brand": "Tuner's favorite Mitsubishi",
"model": "Mitsubishi Lancer Evolution VI",
"car_type": "Coupe"
}
```

cc @czhu15


@czhu15 left a comment


Thanks a lot for the quick fix.
Verified it works with both delayed sampling enabled and disabled.
Please fix the code format issue, though (run `bash format.sh` under the vllm-fork folder).

@inkcherry (Author)


fixed


@czhu15 left a comment


LGTM

@czhu15 merged commit 28c32df into HabanaAI:aice/v1.20.1 on May 19, 2025
czhu15 added a commit that referenced this pull request May 20, 2025
czhu15 added a commit that referenced this pull request May 20, 2025
czhu15 pushed a commit that referenced this pull request May 25, 2025
czhu15 added a commit that referenced this pull request May 27, 2025
HabanaAIUser pushed a commit that referenced this pull request Jul 2, 2025
jikunshang pushed a commit that referenced this pull request Jul 4, 2025
Cherry-pick of #1270.

With structured output, XGrammar needs to modify the logits, so there is a data dependency on the sampling results: the grammar mask for the next step depends on the token that was just sampled.
1. If computing the logits depends on the sampling result, we fetch the sampling result in advance.
2. If not, we keep the previous behavior to preserve performance.
@czhu15

---------

Co-authored-by: root <[email protected]>