Reuse gpu model runner #11

Open

jikunshang wants to merge 6 commits into main

Conversation

jikunshang (Owner) commented May 6, 2025

No description provided.

jikunshang added 4 commits May 6, 2025 07:50 (each signed off by Kunshang Ji <[email protected]>):

- some v1 fixes
- remove useless file
- remove
- add V1 test and set spawn in docker env
- add missing dependency
- fix test
- update api name
- update api
- update default block size for v1
- update memory usage
- fix rebase issues
- fix rebase, spec decode meta set to none
- add xpu v1 config check
- add mem log
- fix init cache
- add xpu profiler for V1
- update rebase issue
- update prepare_inputs for perf
- update
- refine xpu_model_runner

github-actions bot commented May 6, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small, essential subset of CI tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

num_scheduled_tokens)
self.seq_start_loc_np[0] = 0
np.cumsum(seq_lens, out=self.seq_start_loc_np[1:num_reqs + 1])
# ======== XPU end =========
xuechendi commented May 6, 2025

Is this the only hard-coded change needed for XPU, i.e. adding this self.seq_start_loc_np property?

jikunshang (Owner, Author) replied:

IPEX attention (chunked prefill) uses the flash-attention v2 kernel, which needs this parameter.
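For context, here is a minimal runnable sketch of what that parameter encodes, assuming the flash-attention v2 style varlen convention of cumulative sequence-start offsets; the variable names are illustrative, not the exact runner fields:

```python
import numpy as np

# Illustrative values: per-request sequence lengths in the packed batch.
seq_lens = np.array([5, 3, 7], dtype=np.int32)
num_reqs = len(seq_lens)

# seq_start_loc[i] is the offset of request i's first token in the packed
# batch; varlen flash-attention kernels typically take this as cu_seqlens.
seq_start_loc = np.zeros(num_reqs + 1, dtype=np.int32)
seq_start_loc[0] = 0
np.cumsum(seq_lens, out=seq_start_loc[1:num_reqs + 1])

print(seq_start_loc)  # [ 0  5  8 15]
```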

torch.xpu.empty_cache()
self.init_gpu_memory = torch.xpu.get_device_properties(
self.local_rank).total_memory
backend = "ccl"

Reviewer comment:

Not sure if we can come up with a similar way to hide these parts as well; maybe we can abstract this into xpu.py?
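As a rough illustration of that suggestion, here is a sketch of how the device-specific calls above could be gathered behind an XPU platform helper; the class and method names are assumptions for illustration, not vLLM's actual interface:

```python
import torch

class XPUPlatformHelpers:
    """Hypothetical grouping of the XPU-specific pieces shown above."""

    dist_backend: str = "ccl"  # oneCCL backend for XPU collectives

    @staticmethod
    def empty_cache() -> None:
        torch.xpu.empty_cache()

    @staticmethod
    def total_memory(device_index: int) -> int:
        return torch.xpu.get_device_properties(device_index).total_memory

# The worker would then call the helper instead of torch.xpu directly, e.g.
#   self.init_gpu_memory = XPUPlatformHelpers.total_memory(self.local_rank)
```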

@@ -146,6 +146,7 @@ def xpu_platform_plugin() -> Optional[str]:
if hasattr(torch, 'xpu') and torch.xpu.is_available():
is_xpu = True
logger.debug("Confirmed XPU platform is available.")
torch.cuda = torch.xpu
jikunshang (Owner, Author) commented:

Replace torch.cuda with torch.xpu so that CUDA-named calls in shared code are routed to the XPU backend.
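A minimal illustration of what this alias does, assuming an XPU-enabled PyTorch build; note that not every torch.cuda attribute has an XPU counterpart, so this is a coarse shim rather than a full replacement:

```python
import torch

if hasattr(torch, "xpu") and torch.xpu.is_available():
    # Module-level alias, as in the hunk above: CUDA-named calls in shared
    # code now resolve to the XPU backend.
    torch.cuda = torch.xpu

    torch.cuda.empty_cache()          # actually runs torch.xpu.empty_cache()
    print(torch.cuda.device_count())  # number of visible XPU devices
```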

if parallel_config.worker_cls == "auto":
if envs.VLLM_USE_V1:
parallel_config.worker_cls =\
"vllm.v1.worker.gpu_worker.Worker"
jikunshang (Owner, Author) commented:

Note: with V1 enabled, XPU reuses the GPU worker (vllm.v1.worker.gpu_worker.Worker) instead of a dedicated XPU worker.
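A hedged sketch of that selection logic as it might sit in the XPU platform's config hook; the function name and the V0 worker path are assumptions drawn from the hunk above, not the exact source:

```python
import vllm.envs as envs

def check_and_update_config(vllm_config) -> None:
    # Sketch: choose the worker class for XPU depending on the V1 flag.
    parallel_config = vllm_config.parallel_config
    if parallel_config.worker_cls == "auto":
        if envs.VLLM_USE_V1:
            # V1 reuses the generic GPU worker (and GPU model runner) on XPU.
            parallel_config.worker_cls = "vllm.v1.worker.gpu_worker.Worker"
        else:
            # V0 keeps the dedicated XPU worker (path assumed here).
            parallel_config.worker_cls = "vllm.worker.xpu_worker.XPUWorker"
```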

@@ -10,7 +10,7 @@

import vllm.envs as envs
from vllm.config import VllmConfig
from vllm.device_allocator.cumem import CuMemAllocator
# from vllm.device_allocator.cumem import CuMemAllocator
jikunshang (Owner, Author) commented:

CuMemAllocator is used for sleep mode; it is commented out for now and still needs to be fixed.

jikunshang added 2 commits May 8, 2025 08:42
Signed-off-by: Kunshang Ji <[email protected]>
Signed-off-by: Kunshang Ji <[email protected]>
Labels: None yet
Projects: None yet
2 participants