Reuse gpu model runner #11
base: main
Conversation
Commits (each signed off by Kunshang Ji <[email protected]>):
- (initial commit)
- some v1 fixes
- remove useless file
- remove
- add V1 test and set spawn in docker env
- add missing dependency
- fix test
- update api name
- update api
- update default block size for v1
- update memory usage
- fix rebase issues
- fix rebase, spec decode meta set to none
- add xpu v1 config check
- add mem log
- fix init cache
- add xpu profiler for V1
- update rebase issue
- update prepare_inputs for perf
- update
- refine xpu_model_runner
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can add 🚀
    num_scheduled_tokens)
self.seq_start_loc_np[0] = 0
np.cumsum(seq_lens, out=self.seq_start_loc_np[1:num_reqs + 1])
# ======== XPU end =========
Is this the only hard-coded change needed for XPU, i.e., adding this self.seq_start_loc_np property?
The IPEX attention backend (chunked prefill) uses FlashAttention v2, which requires this parameter.
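For context, FlashAttention-v2-style varlen kernels take cumulative start offsets rather than raw per-sequence lengths. Below is a minimal sketch of how the seq_start_loc buffer from the diff above gets filled; the buffer size and sequence lengths are illustrative assumptions, not values from this PR.

import numpy as np

# Assumed upper bound on the number of batched requests (illustrative).
max_num_reqs = 8
seq_start_loc_np = np.zeros(max_num_reqs + 1, dtype=np.int32)

# Example batch: three requests with these total sequence lengths.
seq_lens = np.array([5, 3, 7], dtype=np.int32)
num_reqs = len(seq_lens)

# FA2-style varlen kernels locate request i's tokens in the packed buffer
# via the slice seq_start_loc[i]:seq_start_loc[i + 1].
seq_start_loc_np[0] = 0
np.cumsum(seq_lens, out=seq_start_loc_np[1:num_reqs + 1])

print(seq_start_loc_np[:num_reqs + 1])  # -> [ 0  5  8 15]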
vllm/v1/worker/gpu_worker.py (Outdated)
torch.xpu.empty_cache()
self.init_gpu_memory = torch.xpu.get_device_properties(
    self.local_rank).total_memory
backend = "ccl"
Not sure if we can come up with a similar way to hide this part as well; maybe we can abstract it into xpu.py?
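One way to read this suggestion: move the XPU-specific calls behind a small platform helper so the worker stays device-agnostic. A hypothetical sketch follows, assuming the helper lives in xpu.py; the class and method names are illustrative, not vLLM's actual Platform API.

import torch


class XPUPlatformOps:
    # Device-specific operations the worker could call generically,
    # mirroring the three XPU touchpoints in the diff above.

    dist_backend: str = "ccl"  # collective backend used on XPU

    @staticmethod
    def empty_cache() -> None:
        torch.xpu.empty_cache()

    @staticmethod
    def total_memory(local_rank: int) -> int:
        return torch.xpu.get_device_properties(local_rank).total_memory


# The worker would then stay device-neutral, e.g.:
#   ops.empty_cache()
#   self.init_gpu_memory = ops.total_memory(self.local_rank)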
@@ -146,6 +146,7 @@ def xpu_platform_plugin() -> Optional[str]:
     if hasattr(torch, 'xpu') and torch.xpu.is_available():
         is_xpu = True
         logger.debug("Confirmed XPU platform is available.")
+        torch.cuda = torch.xpu
This replaces torch.cuda with torch.xpu, so code written against the CUDA namespace runs on XPU.
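A sketch of what the alias buys, assuming a PyTorch build with XPU support: code written against the torch.cuda namespace dispatches to the XPU equivalents without modification.

import torch

if hasattr(torch, "xpu") and torch.xpu.is_available():
    # Module-level alias, as in the diff above.
    torch.cuda = torch.xpu

    # Existing CUDA-path code now runs on XPU unchanged, e.g.:
    torch.cuda.empty_cache()                     # torch.xpu.empty_cache()
    props = torch.cuda.get_device_properties(0)  # XPU device properties

The trade-off is that any torch.cuda API without an XPU counterpart now raises AttributeError at call time, so the alias only holds for the API surface the reused code actually touches.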
if parallel_config.worker_cls == "auto":
    if envs.VLLM_USE_V1:
        parallel_config.worker_cls = \
            "vllm.v1.worker.gpu_worker.Worker"
Note: this uses the GPU worker instead of a dedicated XPU worker.
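Why a dotted-path string is enough to reuse the GPU worker: the class is resolved lazily from its qualified name. A minimal sketch of the general pattern; resolve_worker_cls is a hypothetical helper, not vLLM's exact API.

import importlib


def resolve_worker_cls(qualname: str) -> type:
    # Split "pkg.module.Class" and import the class dynamically.
    module_name, _, cls_name = qualname.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, cls_name)


# With worker_cls = "vllm.v1.worker.gpu_worker.Worker", the XPU platform
# ends up instantiating the same V1 GPU worker class as CUDA does:
# worker_cls = resolve_worker_cls(parallel_config.worker_cls)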
vllm/v1/worker/gpu_worker.py (Outdated)
@@ -10,7 +10,7 @@
 import vllm.envs as envs
 from vllm.config import VllmConfig
-from vllm.device_allocator.cumem import CuMemAllocator
+# from vllm.device_allocator.cumem import CuMemAllocator
This is used for sleep mode; to be fixed in a follow-up.
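A hypothetical interim guard, keeping the import CUDA-only until an XPU-capable allocator exists; this is one possible fix, not the PR's code, and it assumes vllm.platforms.current_platform is importable here.

from vllm.platforms import current_platform

# CuMemAllocator backs sleep mode via CUDA virtual-memory APIs, so only
# import it where it can actually work.
CuMemAllocator = None
if current_platform.is_cuda():
    from vllm.device_allocator.cumem import CuMemAllocator


def supports_sleep_mode() -> bool:
    # Workers can consult this instead of importing unconditionally.
    return CuMemAllocator is not None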
No description provided.