Improve dp attention port assignment scheme #5889
Is there any progress on this PR? @merrymercy @zhyncs @ByronHsu @jokerwyt
Tested okay. Ready for review and merge.
The general idea LGTM, but I have no time to review the details now :( If you can find someone to review it, it can usually be merged.
@jokerwyt This part was implemented by @merrymercy and @ispobock. I have added them to the review list.
…into dp-port-dispatch
@merrymercy @ispobock @zhyncs
Motivation
When we enable DP attention on many GPUs (for example, 64), the number of ports needed on node 0 equals the DP size. In many cases we share the port space with other processes (for example, in a container with host networking, or on bare metal), so the probability of a port conflict is quite high.
Modifications
We get some free ports on node 0 and broadcast them to the other nodes using `dist_init_addr` before assigning the zmq port from the DP controller to the scheduler with `attn_tp_rank=0`. We also move the port binding next to `get_free_port` to reduce the possibility of port conflict.
Test
More tests on different settings are welcome, especially settings without PD disaggregation.
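For reviewers, the idea described under Modifications (pick free ports on node 0 and bind them right away, so no other process can take them in between) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code in this PR: `reserve_free_ports` and `assign_ports_to_dp_ranks` are hypothetical helper names, plain TCP sockets stand in for the zmq sockets, and the broadcast to other nodes via `dist_init_addr` is not shown.

```python
import socket
from typing import Dict, List, Tuple


def reserve_free_ports(
    n: int, host: str = "127.0.0.1"
) -> Tuple[List[int], List[socket.socket]]:
    """Hypothetical helper: bind n sockets to OS-chosen ports and keep them open.

    Binding immediately (instead of probing for a free port and binding later)
    closes the window in which another process could grab the same port.
    """
    socks: List[socket.socket] = []
    ports: List[int] = []
    for _ in range(n):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.bind((host, 0))  # port 0 asks the OS for any currently free port
        socks.append(s)
        ports.append(s.getsockname()[1])  # read back the port that was assigned
    return ports, socks


def assign_ports_to_dp_ranks(ports: List[int]) -> Dict[int, int]:
    """Hypothetical helper: map each DP rank to one reserved port."""
    return {dp_rank: port for dp_rank, port in enumerate(ports)}


dp_size = 4  # illustrative value; the PR description mentions sizes such as 64
ports, socks = reserve_free_ports(dp_size)
port_table = assign_ports_to_dp_ranks(ports)  # node 0 would broadcast this table
print(port_table)

# Release the sockets once the example is done.
for s in socks:
    s.close()
```

Because all sockets stay bound until they are closed, the reserved ports are guaranteed to be distinct from each other, and no other process on the host can bind them while they are held.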