[Bug] Bump SGLang version to 0.4.6.post4; Fix AsyncSGLangRollout #1482

Closed
wants to merge 2 commits into from

Conversation

HollowMan6
Contributor

Checklist Before Starting

  • Search for similar PR(s).

What does this PR do?

Similar to sgl-project/sglang#5997

High-Level Design

The PP PR sgl-project/sglang#5724 changed the sender check in the broadcast_pyobj function from rank==0 (is the caller local rank 0 of the ProcessGroup passed in) to rank==src (is the caller the global rank src). This breaks VerlEngine's broadcast logic when dp>1 and tp>1.
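
To make the mismatch concrete, here is a minimal pure-Python sketch of the two checks. It does not call SGLang or torch.distributed; `pick_senders`, `tp_groups`, and the assumption that the caller passes its rank within the group while `src` is the global rank of the group's first member are all illustrative, not taken from the actual code.

```python
def pick_senders(group, use_src_check):
    """Return the global ranks in `group` that would take the sender branch.

    `group` lists the global ranks of one process group (hypothetical layout).
    """
    src = group[0]  # assumption: src is the global rank of the group's first member
    senders = []
    for local_rank, global_rank in enumerate(group):
        # Assumption: the caller passes its rank *within the group*.
        rank = local_rank
        is_sender = (rank == src) if use_src_check else (rank == 0)
        if is_sender:
            senders.append(global_rank)
    return senders


# dp=2, tp=2: two tensor-parallel groups over four global ranks
tp_groups = [[0, 1], [2, 3]]
old = [pick_senders(g, use_src_check=False) for g in tp_groups]
new = [pick_senders(g, use_src_check=True) for g in tp_groups]
print(old)  # [[0], [2]]
print(new)  # [[0], []]
```

Under these assumptions the second group ends up with no sender once the check becomes rank==src, which would explain why the problem only shows up when dp>1 and tp>1: with a single group, local rank 0 and global rank src coincide.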

Additional Info.

  • Issue Number: none
  • Training: both
  • Inference: SGLang

Checklist Before Submitting

  • Read the Contribute Guide.
  • Apply pre-commit checks.
  • Add [BREAKING] to the PR title if it breaks any API.
  • Update the documentation about your changes in the docs.
  • Add CI test(s) if necessary.

@HollowMan6 HollowMan6 force-pushed the sglang_async branch 2 times, most recently from 767e144 to 97e49d4 Compare May 13, 2025 11:52
@HollowMan6 HollowMan6 changed the title [Bug] Fix AsyncSGLangRollout for SGLang >= 0.4.6.post2 [Bug] Bump SGLang version to 0.4.6.post4; Fix AsyncSGLangRollout May 13, 2025
vermouth1992
vermouth1992 previously approved these changes May 15, 2025
@HollowMan6
Contributor Author

CI failed because the container is still using an old SGLang version. These versions are bumped in 3e21199, but I would still need some help pushing ocss884/verl-sglang/ngc-th2.6.0-cu126-sglang0.4.6.post4 to Docker Hub. cc: @ocss884

@vermouth1992
Collaborator

We will need to fix the CI and address the conflict

Similar to sgl-project/sglang#5997

In the PP PR sgl-project/sglang#5724
broadcast_pyobj function changed its
condition from judging rank==0 (if rank is local rank 0 of
the passing ProcessGroup) to rank==src (if rank is global
rank src), which breaks VerlEngine's broadcast logic
when dp>1 and tp>1.

Signed-off-by: Hollow Man <[email protected]>
Signed-off-by: Hollow Man <[email protected]>
@ocss884
Collaborator

ocss884 commented May 17, 2025

Ah just noticed there is another bump pr here. Could you help take a look at whether your changes in AsyncSGLangRollout are being addressed in #1558? Thanks! cc @SwordFaith

@HollowMan6
Contributor Author

Looks like everything is included in #1558, so I'm closing this one.

@HollowMan6 HollowMan6 closed this May 18, 2025
@HollowMan6 HollowMan6 deleted the sglang_async branch May 18, 2025 09:56
5 participants