Reuse gpu model runner #11
base: main
Conversation
Commits (each signed off by Kunshang Ji <[email protected]>):
- (initial commit)
- some v1 fixes
- remove useless file
- remove
- add V1 test and set spawn in docker env
- add missing dependency
- fix test
- update api name
- update api
- update default block size for v1
- update memory usage
- fix rebase issues
- fix rebase, spec decode meta set to none
- add xpu v1 config check
- add mem log
- fix init cache
- add xpu profiler for V1
- update rebase issue
- update prepare_inputs for perf
- update
- refine xpu_model_runner
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add 🚀
    num_scheduled_tokens)
self.seq_start_loc_np[0] = 0
np.cumsum(seq_lens, out=self.seq_start_loc_np[1:num_reqs + 1])
# ======== XPU end =========
Is this the only hard-coded change needed for XPU, i.e., adding this self.seq_start_loc_np property?
The IPEX attention backend (chunked prefill) uses FlashAttention v2, which requires this parameter.
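For context, FlashAttention-v2-style varlen kernels take cumulative start offsets rather than raw per-sequence lengths. Below is a minimal sketch of how the seq_start_loc buffer from the diff above gets filled; the buffer size and sequence lengths are illustrative assumptions, not values from this PR.

import numpy as np

# Assumed upper bound on the number of batched requests (illustrative).
max_num_reqs = 8
seq_start_loc_np = np.zeros(max_num_reqs + 1, dtype=np.int32)

# Example batch: three requests with these total sequence lengths.
seq_lens = np.array([5, 3, 7], dtype=np.int32)
num_reqs = len(seq_lens)

# FA2-style varlen kernels locate request i's tokens in the packed buffer
# via the slice seq_start_loc[i]:seq_start_loc[i + 1].
seq_start_loc_np[0] = 0
np.cumsum(seq_lens, out=seq_start_loc_np[1:num_reqs + 1])

print(seq_start_loc_np[:num_reqs + 1])  # -> [ 0  5  8 15]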
vllm/v1/worker/gpu_worker.py (Outdated)
torch.xpu.empty_cache()
self.init_gpu_memory = torch.xpu.get_device_properties(
    self.local_rank).total_memory
backend = "ccl"
Not sure if we can come up with a similar way to hide this part as well; maybe we can abstract it into xpu.py?
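One way to read this suggestion: move the XPU-specific calls behind a small platform helper so the worker stays device-agnostic. A hypothetical sketch follows, assuming the helper lives in xpu.py; the class and method names are illustrative, not vLLM's actual Platform API.

import torch


class XPUPlatformOps:
    # Device-specific operations the worker could call generically,
    # mirroring the three XPU touchpoints in the diff above.

    dist_backend: str = "ccl"  # collective backend used on XPU

    @staticmethod
    def empty_cache() -> None:
        torch.xpu.empty_cache()

    @staticmethod
    def total_memory(local_rank: int) -> int:
        return torch.xpu.get_device_properties(local_rank).total_memory


# The worker would then stay device-neutral, e.g.:
#   ops.empty_cache()
#   self.init_gpu_memory = ops.total_memory(self.local_rank)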
@@ -146,6 +146,7 @@ def xpu_platform_plugin() -> Optional[str]:
     if hasattr(torch, 'xpu') and torch.xpu.is_available():
         is_xpu = True
         logger.debug("Confirmed XPU platform is available.")
+        torch.cuda = torch.xpu
This replaces torch.cuda with torch.xpu, so code written against the CUDA namespace runs on XPU.
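A sketch of what the alias buys, assuming a PyTorch build with XPU support: code written against the torch.cuda namespace dispatches to the XPU equivalents without modification.

import torch

if hasattr(torch, "xpu") and torch.xpu.is_available():
    # Module-level alias, as in the diff above.
    torch.cuda = torch.xpu

    # Existing CUDA-path code now runs on XPU unchanged, e.g.:
    torch.cuda.empty_cache()                     # torch.xpu.empty_cache()
    props = torch.cuda.get_device_properties(0)  # XPU device properties

The trade-off is that any torch.cuda API without an XPU counterpart now raises AttributeError at call time, so the alias only holds for the API surface the reused code actually touches.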
if parallel_config.worker_cls == "auto":
    if envs.VLLM_USE_V1:
        parallel_config.worker_cls = \
            "vllm.v1.worker.gpu_worker.Worker"
Note: this uses the GPU worker instead of a dedicated XPU worker.
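Why a dotted-path string is enough to reuse the GPU worker: the class is resolved lazily from its qualified name. A minimal sketch of the general pattern; resolve_worker_cls is a hypothetical helper, not vLLM's exact API.

import importlib


def resolve_worker_cls(qualname: str) -> type:
    # Split "pkg.module.Class" and import the class dynamically.
    module_name, _, cls_name = qualname.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, cls_name)


# With worker_cls = "vllm.v1.worker.gpu_worker.Worker", the XPU platform
# ends up instantiating the same V1 GPU worker class as CUDA does:
# worker_cls = resolve_worker_cls(parallel_config.worker_cls)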
vllm/v1/worker/gpu_worker.py (Outdated)
@@ -10,7 +10,7 @@
 import vllm.envs as envs
 from vllm.config import VllmConfig
-from vllm.device_allocator.cumem import CuMemAllocator
+# from vllm.device_allocator.cumem import CuMemAllocator
This is used for sleep mode; to be fixed in a follow-up.
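A hypothetical interim guard, keeping the import CUDA-only until an XPU-capable allocator exists; this is one possible fix, not the PR's code, and it assumes vllm.platforms.current_platform is importable here.

from vllm.platforms import current_platform

# CuMemAllocator backs sleep mode via CUDA virtual-memory APIs, so only
# import it where it can actually work.
CuMemAllocator = None
if current_platform.is_cuda():
    from vllm.device_allocator.cumem import CuMemAllocator


def supports_sleep_mode() -> bool:
    # Workers can consult this instead of importing unconditionally.
    return CuMemAllocator is not None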
No description provided.