forked from opendatahub-io/vllm
nm vllm ent 0.8.5 sync #139
Merged
Conversation
…m-project#16801) Signed-off-by: Lucas Wilkinson <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: rongfu.leng <[email protected]>
Signed-off-by: Luka Govedič <[email protected]>
Signed-off-by: Lu Fang <[email protected]>
…16796) Signed-off-by: Nathan Weinberg <[email protected]>
…ect#16809) Signed-off-by: windsonsea <[email protected]>
Signed-off-by: Jonghyun Choe <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
…llm-project#16829) Signed-off-by: reidliu41 <[email protected]> Co-authored-by: reidliu41 <[email protected]>
…nfig info (vllm-project#16857) Signed-off-by: jmho <[email protected]>
Signed-off-by: omrishiv <[email protected]>
…ect#15130) Signed-off-by: fyabc <[email protected]> Signed-off-by: Roger Wang <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Roger Wang <[email protected]> Co-authored-by: Xiong Wang <[email protected]>
Signed-off-by: Divakar Verma <[email protected]>
…llm-project#16591) Signed-off-by: Jannis Schönleber <[email protected]> Signed-off-by: NickLucche <[email protected]> Co-authored-by: Jannis Schönleber <[email protected]>
Signed-off-by: NickLucche <[email protected]>
…vllm-project#16460) Signed-off-by: vie-serendipity <[email protected]>
… V1 (vllm-project#15477) Signed-off-by: Isotr0py <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Co-authored-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: reidliu41 <[email protected]> Co-authored-by: reidliu41 <[email protected]>
Signed-off-by: rzou <[email protected]>
Signed-off-by: Staszek Pasko <[email protected]> Co-authored-by: Nick Hill <[email protected]>
Signed-off-by: Harry Mellor <[email protected]>
Signed-off-by: rzou <[email protected]>
Signed-off-by: qizixi <[email protected]>
Signed-off-by: Woosuk Kwon <[email protected]>
- remove build steps/dependencies
- allow for installing pre-built flash-attention/vllm wheels
- default ROCM_VERSION to 6.3.4, allowing override with env vars
- cleanup rocm docker bake, defaults
- amdsmi: use setup.py to build
- add amdsmi bind mount
- remove flashinfer from rocm target
- bump vllm-tgis-adapter to 0.7.0
- Dockerfile*.ubi: bump ubi base
Signed-off-by: Russell Bryant <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>
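The ROCm image changes above replace a hard-coded ROCm version with an overridable default. A minimal sketch of what that looks like at build time, assuming a `docker-bake.hcl` that exposes a `ROCM_VERSION` variable and a `rocm` target (names assumed for illustration, not taken from this PR):

```
# Build with the default ROCm version (6.3.4 per the commit message above).
docker buildx bake -f docker-bake.hcl rocm

# Override at build time: bake variables can be set from the environment,
# so no Dockerfile edit is needed. Target/variable names are assumptions.
ROCM_VERSION=6.4 docker buildx bake -f docker-bake.hcl rocm
```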
…-project#17303) Signed-off-by: Harry Mellor <[email protected]>
…vllm-project#17255) Signed-off-by: Harry Mellor <[email protected]>
…rides are ordered (vllm-project#17256) Signed-off-by: Harry Mellor <[email protected]>
…17197) Signed-off-by: Russell Bryant <[email protected]>
Signed-off-by: Aaron Pham <[email protected]> Co-authored-by: Russell Bryant <[email protected]>
…t have shape (metadata_size) (vllm-project#17283) Signed-off-by: Lucas Wilkinson <[email protected]>
…_after_loading`. (vllm-project#16854) Signed-off-by: charlifu <[email protected]>
Signed-off-by: simon-mo <[email protected]>
…#17328) Signed-off-by: mgoin <[email protected]>
Co-authored-by: andy-neuma <[email protected]>
…ct results (vllm-project#17574) Signed-off-by: Lucas Wilkinson <[email protected]>
…client' (vllm-project#17434) Signed-off-by: chaunceyjiang <[email protected]>
Signed-off-by: Rahul Tuli <[email protected]> Co-authored-by: mgoin <[email protected]>
…7315) Signed-off-by: Lucia Fang <[email protected]>
Syncing midstream NM fork to Upstream tag of [v0.8.5.post1](https://github.com/vllm-project/vllm/tree/v0.8.5.post1) + cherry pick of vllm-project@be633fb needed for benchmarks + [CP](neuralmagic/nm-vllm-ent@1fe447d) for compressed tensor bump + [CP](vllm-project#17677) for lora on AMD + [CP](vllm-project#17315) for llama4 w/ pure dense layers

```
commit 31c73ba (HEAD -> upstream-v0.8.5, nm-fork/upstream-v0.8.5)
Author: Chauncey <[email protected]>
Date:   Wed Apr 30 15:11:04 2025 +0800

    [Bugfix] Fix AttributeError: 'State' object has no attribute 'engine_client' (vllm-project#17434)

    Signed-off-by: chaunceyjiang <[email protected]>

commit f8db0bd
Author: Lucas Wilkinson <[email protected]>
Date:   Fri May 2 14:01:38 2025 -0400

    [BugFix][Attention] Fix sliding window attention in V1 giving incorrect results (vllm-project#17574)

    Signed-off-by: Lucas Wilkinson <[email protected]>

commit e335c34
Author: Robert Shaw <[email protected]>
Date:   Fri May 2 04:07:03 2025 -0400

    [BugFix] Fix Memory Leak (vllm-project#17567)

    Signed-off-by: [email protected] <[email protected]>

commit cc463fe
Merge: 1e358ff ba41cc9
Author: Selbi Nuryyeva <[email protected]>
Date:   Tue Apr 29 12:34:57 2025 -0400

    Merge branch 'tag-upstream-v0.8.5' into upstream-v0.8.5

commit ba41cc9 (tag: v0.8.5, tag-upstream-v0.8.5)
Author: Michael Goin <[email protected]>
Date:   Mon Apr 28 16:20:24 2025 -0600

    [Model] Add tuned triton fused_moe configs for Qwen3Moe (vllm-project#17328)

    Signed-off-by: mgoin <[email protected]>

commit dcbac4c
Author: Simon Mo <[email protected]>
Date:   Mon Apr 28 14:12:01 2025 -0700

    [Model] Qwen3 Dense FP8 Compat Fixes (vllm-project#17318)

    Signed-off-by: simon-mo <[email protected]>

[...]
```

Commands
```
git fetch upstream
git checkout -b upstream-v0.8.5
git merge upstream/v0.8.5
git cherry-pick be633fb
```

TEST PLAN
accept sync: https://github.com/neuralmagic/nm-cicd/actions/runs/14841223552
related PR in cicd: neuralmagic/nm-cicd#99
release workflow: https://github.com/neuralmagic/nm-cicd/actions/runs/14845693864
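For completeness, the remaining cherry-picks called out in the description above would be applied the same way as `be633fb`; `1fe447d` comes from the nm-vllm-ent repo (so that remote must be fetched first), and the other SHAs below are placeholders for the commits of the linked PRs:

```
git cherry-pick 1fe447d            # compressed-tensors bump (nm-vllm-ent CP)
git cherry-pick <sha-of-17677>     # LoRA on AMD (vllm-project#17677)
git cherry-pick <sha-of-17315>     # llama4 w/ pure dense layers (vllm-project#17315)
```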
This bumps the CUDA version in the base layer to 12.8 instead of 12.4. This could break something if we have to build a dependency from source during dependency install, since the wheels we bring in later in the prepare stage are now built against 12.8.
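A quick sanity check for the concern above is to confirm that the CUDA runtime in the built image matches what the bundled wheels were compiled against; the image name below is a placeholder, not the actual build output tag:

```
# Placeholder image name; substitute the image produced by this Dockerfile.
docker run --rm <vllm-ubi-image> python3 -c "import torch; print(torch.version.cuda)"
# Expect 12.8 after this change; a dependency built from source against a
# different toolkit is the breakage scenario described above.
```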
Notable conflicts were in Dockerfile.rocm.ubi and Dockerfile.ubi. Up to date with the upstream v0.8.5.post1 tag and includes cherry-picks for LoRA, Llama4, and the compressed-tensors bump.
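For reference, a hedged sketch of how the conflicting files can be re-inspected during a sync like this (standard git workflow, not specific to this PR):

```
git merge tag-upstream-v0.8.5                # or: git merge upstream/v0.8.5
git diff --name-only --diff-filter=U         # lists unmerged paths, e.g. Dockerfile.ubi
git checkout --conflict=diff3 Dockerfile.ubi Dockerfile.rocm.ubi   # re-create diff3 markers
```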
- Improve configs - `SchedulerConfig` (vllm-project/vllm#16533)
- [TPU][V1] Fix exponential padding when `max-num-batched-tokens` is not a power of 2 (vllm-project/vllm#16596)
- [BugFix]: Update minimum `pyzmq` version (vllm-project/vllm#16549)
- Add `vllm bench [latency, throughput]` CLI commands (vllm-project/vllm#16508)
- [Misc] Update `compressed-tensors` WNA16 to support zero-points (vllm-project/vllm#14211)
- [V1][Structured Output] Move xgrammar related utils to `backend_xgrammar.py` (vllm-project/vllm#16578)
- [CI] Cleanup `additional_dependencies: [toml]` for pre-commit yapf hook (vllm-project/vllm#16405)
- Improve configs - `TokenizerPoolConfig` + `DeviceConfig` (vllm-project/vllm#16603)
- [TPU][V1] Fix padding recompilation when `max-num-batched-tokens` is not even (vllm-project/vllm#16726)
- [Doc] Improve help examples for `--compilation-config` (vllm-project/vllm#16729)
- [V1][Structured Output] Minor modification to `_validate_structured_output()` (vllm-project/vllm#16748)
- Improve configs - `MultiModalConfig` + `PoolerConfig` + `DecodingConfig` (vllm-project/vllm#16789)
- Fix `nullable_kvs` fallback (vllm-project/vllm#16837)
- [Frontend] Add sampling params to `v1/audio/transcriptions` endpoint (vllm-project/vllm#16591)
- Improve configs - `CacheConfig` (vllm-project/vllm#16835)
- [Perf] Optimize `_update_states` for GPU model runner (vllm-project/vllm#16910)
- Improve configs - `SpeculativeConfig` (vllm-project/vllm#16971)
- [BugFix] Remove default multiproc executor `collective_rpc` timeout (vllm-project/vllm#17000)
- Categorize `tests/kernels/` based on kernel type (vllm-project/vllm#16799)
- Ensure that `pid` passed to `kill_process_tree` is `int` for `mypy` (vllm-project/vllm#17051)
- `CacheConfig.block_size` should always be `int` when used (vllm-project/vllm#17052)
- Use `@property` and private field for `data_parallel_rank_local` (vllm-project/vllm#17053)
- Simplify `TokenizerGroup` (vllm-project/vllm#16790)
- Improve static type checking in `LoRAModelRunnerMixin` (vllm-project/vllm#17104)
- [CI] Add automation for the `tool-calling` github label (vllm-project/vllm#17118)
- Add `:markdownhelp:` to `EngineArgs` docs so markdown docstrings render properly (vllm-project/vllm#17124)
- Improve configs - `LoRAConfig` + `PromptAdapterConfig` (vllm-project/vllm#16980)
- Move missed `SchedulerConfig` args into scheduler config group in `EngineArgs` (vllm-project/vllm#17131)
- Use Transformers helper `get_text_config()` instead of checking for `text_config` (vllm-project/vllm#17105)
- [BugFix][Frontend] Fix `LLM.chat()` tokenization (vllm-project/vllm#16081)
- [Bugfix] Fix missing int type for `-n` in multi-image example (vllm-project/vllm#17223)
- [V1] Add `structural_tag` support using xgrammar (vllm-project/vllm#17085)
- [Chore] added stubs for `vllm_flash_attn` during development mode (vllm-project/vllm#17228)
- [Bugfix] fix error due to an uninitialized tokenizer when using `skip_tokenizer_init` with `num_scheduler_steps` (vllm-project/vllm#9276)
- [Misc] Validate `stop_token_ids` contents (vllm-project/vllm#17268)
- Add missing class docstring for `PromptAdapterConfig` (vllm-project/vllm#17302)
- [Bugfix] Add missing `get_language_model` to new MLLMs (vllm-project/vllm#17300)
- [Misc] Minor typo/grammar in `platforms/interface.py` (vllm-project/vllm#17307)
- Make name of `compressed-tensors` quant method consistent across vLLM (vllm-project/vllm#17255)
- [Bugfix] Fix moe weight losing all extra attrs after `process_weights_after_loading` (vllm-project/vllm#16854)