Fix structure output #1270
Thanks a lot for the quick fix.
Verified it works with both delayed sampling enabled and disabled.
Please fix the code formatting issue, though (run `bash format.sh` under the vllm-fork folder).
Fixed.
LGTM
Cherry-picked the fix from: #1270
1. If computing the logits depends on the sampling result, we fetch the sampling result in advance.
2. If not, we keep the previous behavior to preserve performance.

Example:
```
prompt = ("Generate a JSON with the brand, model and car_type of the most iconic car from the 90's")
```
Output when using delayed sampling with structured output:
```
{ "brand": "Tuner's favorite Mitsubishi", "model": "Mitsubishi Lancer Evolution VI", "car_type": "Coupe" }
```
cc @czhu15

---------

Co-authored-by: root <[email protected]>
pick #1270

Under structured_output, XGrammar needs to modify the logits, which creates a data dependency on the sampling results.
1. If computing the logits depends on the sampling result, we fetch the sampling result in advance.
2. If not, we keep the previous behavior to preserve performance.

@czhu15

---------

Co-authored-by: root <[email protected]>
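To make the data dependency concrete, here is a minimal sketch in plain Python. It is not the XGrammar API and not the code from this PR: `ToyMatcher`, `NEXT_ALLOWED`, and the three-token vocabulary are hypothetical stand-ins. It only shows why the logits mask for the next step cannot be built until the previous step's sampled token is known.

```python
import math

# Hypothetical toy grammar: the only valid output is "{", "k", "}".
# Token ids: 0 = "{", 1 = "k", 2 = "}".
NEXT_ALLOWED = {None: {0}, 0: {1}, 1: {2}, 2: set()}


class ToyMatcher:
    """Stand-in for a grammar matcher (assumption, not a real library class)."""

    def __init__(self):
        self.last_token = None

    def accept_token(self, token_id):
        # Advance the grammar state with the token that was actually sampled.
        if token_id not in NEXT_ALLOWED[self.last_token]:
            raise ValueError(f"token {token_id} violates the grammar")
        self.last_token = token_id

    def mask_logits(self, logits):
        # Disallowed tokens get -inf so they can never be sampled.
        allowed = NEXT_ALLOWED[self.last_token]
        return [x if i in allowed else -math.inf for i, x in enumerate(logits)]


def greedy(logits):
    """Pick the index of the largest logit."""
    return max(range(len(logits)), key=logits.__getitem__)


matcher = ToyMatcher()
raw_logits = [0.1, 0.2, 3.0]  # the unconstrained model always prefers token 2
generated = []
for _ in range(3):
    token = greedy(matcher.mask_logits(raw_logits))
    # The mask for the next step depends on this token, so it must be
    # available on the host before the next logits can be finalized.
    matcher.accept_token(token)
    generated.append(token)

print(generated)
```

Without the mask, greedy sampling would pick token 2 at every step; with it, each step is constrained by the token accepted in the step before.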
1. If computing the logits depends on the sampling result, we fetch the sampling result in advance.
2. If not, we keep the previous behavior to preserve performance.
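Example:
prompt = ("Generate a JSON with the brand, model and car_type of the most iconic car from the 90's")
Output when using delayed sampling with structured output:
{ "brand": "Tuner's favorite Mitsubishi", "model": "Mitsubishi Lancer Evolution VI", "car_type": "Coupe" }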
cc @czhu15
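The two cases in the description can be illustrated with a small, self-contained toy loop. This is only a sketch under stated assumptions: `ToyRunner`, its fields, and the `"in_advance"` / `"deferred"` labels are hypothetical and do not correspond to names in vllm-fork. The sketch records when each sampled token becomes available on the host, depending on whether the next logits depend on it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRunner:
    """Hypothetical model of a decode loop with delayed sampling."""

    logits_depend_on_sample: bool
    pending: list = field(default_factory=list)  # sampled, not yet fetched
    fetched: list = field(default_factory=list)  # available on the host
    trace: list = field(default_factory=list)    # when each fetch happened

    def _fetch_pending(self, when):
        # Stands in for copying sampled tokens from device to host.
        while self.pending:
            self.fetched.append(self.pending.pop(0))
            self.trace.append(when)

    def step(self, token):
        # Delayed sampling: the previous result is picked up lazily here.
        self._fetch_pending("deferred")
        self.pending.append(token)
        if self.logits_depend_on_sample:
            # Case 1: the next logits need this token, so fetch it now.
            self._fetch_pending("in_advance")
        # Case 2: otherwise leave it pending and keep the previous behavior.

    def finish(self):
        self._fetch_pending("deferred")
        return self.fetched


def run(tokens, logits_depend_on_sample):
    runner = ToyRunner(logits_depend_on_sample)
    for t in tokens:
        runner.step(t)
    return runner.finish(), runner.trace


print(run([5, 6, 7], logits_depend_on_sample=True))
print(run([5, 6, 7], logits_depend_on_sample=False))
```

Both paths return the same tokens; only the point at which they are fetched differs, which is why the non-structured path keeps its original performance characteristics in this model.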