
nm vllm ent 0.8.5 sync #139


Merged

merged 328 commits into main from nm-vllm-ent-0.8.5-sync on May 15, 2025

Conversation

ckhordiasma

LucasWilkinson and others added 30 commits April 17, 2025 22:13
dtrifiro and others added 28 commits April 28, 2025 15:50
- remove build steps/dependencies
- allow for installing pre-built flash-attention/vllm wheels
- default ROCM_VERSION to 6.3.4, allowing override with env vars (see the sketch below)
- cleanup rocm docker bake, defaults
- amdsmi: use setup.py to build
- add amdsmi bind mount
- remove flashinfer from rocm target
- bump vllm-tgis-adapter to 0.7.0
- Dockerfile*.ubi: bump ubi base
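A minimal sketch of the resulting workflow, assuming ROCM_VERSION is declared as a build arg in Dockerfile.rocm.ubi and as a variable in the bake file (the `rocm` target name and image tag below are illustrative, not the repo's actual names):

```
# plain docker build, overriding the default ROCm version
docker build -f Dockerfile.rocm.ubi --build-arg ROCM_VERSION=6.4.0 -t nm-vllm-ent:rocm .

# or via the bake file; buildx bake picks up same-named variables from the environment
ROCM_VERSION=6.4.0 docker buildx bake rocm
```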
Syncing the midstream NM fork to the upstream tag
[v0.8.5.post1](https://github.com/vllm-project/vllm/tree/v0.8.5.post1), plus a
cherry-pick of
vllm-project@be633fb
needed for benchmarks, a
[CP](neuralmagic/nm-vllm-ent@1fe447d)
for the compressed-tensors bump, a [CP](vllm-project#17677) for LoRA on AMD, and a
[CP](vllm-project#17315) for Llama4 with pure
dense layers.

```
commit 31c73ba (HEAD -> upstream-v0.8.5, nm-fork/upstream-v0.8.5)
Author: Chauncey <[email protected]>
Date:   Wed Apr 30 15:11:04 2025 +0800

    [Bugfix] Fix AttributeError: 'State' object has no attribute 'engine_client' (vllm-project#17434)
    
    Signed-off-by: chaunceyjiang <[email protected]>

commit f8db0bd
Author: Lucas Wilkinson <[email protected]>
Date:   Fri May 2 14:01:38 2025 -0400

    [BugFix][Attention] Fix sliding window attention in V1 giving incorrect results (vllm-project#17574)
    
    Signed-off-by: Lucas Wilkinson <[email protected]>

commit e335c34
Author: Robert Shaw <[email protected]>
Date:   Fri May 2 04:07:03 2025 -0400

    [BugFix] Fix Memory Leak (vllm-project#17567)
    
    Signed-off-by: [email protected] <[email protected]>

commit cc463fe
Merge: 1e358ff ba41cc9
Author: Selbi Nuryyeva <[email protected]>
Date:   Tue Apr 29 12:34:57 2025 -0400

    Merge branch 'tag-upstream-v0.8.5' into upstream-v0.8.5

commit ba41cc9 (tag: v0.8.5, tag-upstream-v0.8.5)
Author: Michael Goin <[email protected]>
Date:   Mon Apr 28 16:20:24 2025 -0600

    [Model] Add tuned triton fused_moe configs for Qwen3Moe (vllm-project#17328)
    
    Signed-off-by: mgoin <[email protected]>

commit dcbac4c
Author: Simon Mo <[email protected]>
Date:   Mon Apr 28 14:12:01 2025 -0700

    [Model] Qwen3 Dense FP8 Compat Fixes (vllm-project#17318)
    
    Signed-off-by: simon-mo <[email protected]>
[...]
```

Commands
```
# fetch upstream vllm-project history
git fetch upstream
# create the sync branch and merge the upstream v0.8.5 tag into it
git checkout -b upstream-v0.8.5
git merge upstream/v0.8.5
# cherry-pick the benchmark fix called out in the description
git cherry-pick be633fb
```
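The remaining cherry-picks from the description would be applied the same way. A sketch, assuming each fix is a single commit and using GitHub's `pull/<N>/head` ref convention for the two upstream PRs (the `1fe447d` commit comes from the nm-vllm-ent fork, so it must already be reachable locally):

```
# compressed-tensors bump (commit from the nm-vllm-ent fork)
git cherry-pick 1fe447d
# LoRA-on-AMD fix (vllm-project#17677) and Llama4 pure-dense-layers fix (vllm-project#17315)
git fetch upstream pull/17677/head && git cherry-pick FETCH_HEAD
git fetch upstream pull/17315/head && git cherry-pick FETCH_HEAD
```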

TEST PLAN
accept sync:
https://github.com/neuralmagic/nm-cicd/actions/runs/14841223552
related PR in cicd: neuralmagic/nm-cicd#99
release workflow:
https://github.com/neuralmagic/nm-cicd/actions/runs/14845693864
This bumps the CUDA version in the base layer to 12-8 instead of 12-4.
This could break something if, during dependency install, we have to build a
dependency from source, since the wheels we bring in later in the prepare
stage are now built against 12.8.
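
A quick sanity check for that failure mode (a sketch; `<image>` is a placeholder for the built tag, and `nvcc` is only present when a devel base image is used):

```
# toolkit version in the base layer
docker run --rm <image> nvcc --version
# CUDA version the bundled torch wheel was built against
docker run --rm <image> python -c "import torch; print(torch.version.cuda)"
```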

Notable conflicts were in Dockerfile.rocm.ubi and Dockerfile.ubi.

Up to date with the upstream v0.8.5.post1 tag; includes CPs for LoRA, Llama4, and the compressed-tensors bump.
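
One way to double-check that from the synced branch (a sketch; assumes the upstream tags are fetched locally):

```
# everything carried on top of the upstream tag should be exactly the CPs listed above
git log --oneline v0.8.5.post1..HEAD
```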
ckhordiasma merged commit 60c92f8 into main on May 15, 2025
3 of 4 checks passed
ckhordiasma deleted the nm-vllm-ent-0.8.5-sync branch on May 15, 2025 14:46