-
Notifications
You must be signed in to change notification settings - Fork 1.7k
[rollout] fix: sglang megatron backend missing generate function #2367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rollout] fix: sglang megatron backend missing generate function #2367
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds a generate
function to the sglang megatron backend, addressing issue #2366. It also updates test utilities to support both fsdp
and megatron
backends.
config.actor_rollout_ref.actor.megatron.pipeline_model_parallel_size = 2 | ||
|
||
# FIXME: sglang with megatron param_offload got error: | ||
# "CUDA error: an illegal memory access was encountered" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chenhaiq sglang seems don't work with megatron param_offload=True
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you test with this change?
https://github.com/volcengine/verl/pull/2365/files#diff-3ea40b07302076f020ceb946048c8129b9bd9141feebef80312da55c431de3feR155
Looks like we have some duplicated work in:
https://github.com/volcengine/verl/pull/2365/files#diff-336120079b31a71a3ec16d4e8a9429f73d5f367d8fa4f2f9a3d89d62a8f5a503R695
04f2428
to
e62ab08
Compare
@@ -67,6 +68,7 @@ def are_lists_similar(a, b): | |||
return percentage_difference <= 15 | |||
|
|||
|
|||
@pytest.mark.skip("https://github.com/vllm-project/vllm/issues/16993") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Skip this test temporary since it's only works with vllm==0.7.3, our ci image is vllm==0.8.5.
Close this PR since #2365 already fixed it. |
What does this PR do?
Close #2366
Checklist Before Starting
[{modules}] {type}: {description}
(This will be checked by the CI){modules}
includefsdp
,megatron
,sglang
,vllm
,rollout
,trainer
,ci
,training_utils
,recipe
,hardware
,deployment
,ray
,worker
,single_controller
,misc
,perf
,model
,algo
,env
,tool
,ckpt
,doc
,data
,
like[megatron, fsdp, doc]
{type}
is infeat
,fix
,refactor
,chore
,test
[BREAKING]
to the beginning of the title.[BREAKING][fsdp, megatron] feat: dynamic batching
Test
API and Usage Example
# Add code snippet or script demonstrating how to use this
High-Level Design
Specific Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
ci-request
channel in theverl
Slack workspace.