pass in kernel tbe id into rocksdb wrapper #2930
This pull request was exported from Phabricator. Differential Revision: D60635718
e843481 to dda9d7e
Summary:
X-link: facebookresearch/FBGEMM#32
Pull Request resolved: pytorch#2930
We need this because we constantly see port conflict errors during RocksDB initialization. Before this diff, we called getFreePort to get an available port. For each SSD TBE we create 32 RocksDB shards, so 256 ports are needed per host in total. This worked fine with 4 hosts, but with a 16-host training job we need to make sure that none of the 16 hosts hits the corner case where multiple DB shards are assigned the same free port.
Reviewed By: sryap
Differential Revision: D60635718
dda9d7e to c2f948d
c2f948d to c7c1971
c7c1971 to 639a2f7
This pull request has been merged in 6607072.
Summary:
Pull Request resolved: facebookresearch/FBGEMM#32
X-link: pytorch#2930
We need this because we constantly see port conflict errors during RocksDB initialization. Before this diff, we called getFreePort to get an available port. For each SSD TBE we create 32 RocksDB shards, so 256 ports are needed per host in total. This worked fine with 4 hosts, but with a 16-host training job we need to make sure that none of the 16 hosts hits the corner case where multiple DB shards are assigned the same free port.
Reviewed By: sryap
Differential Revision: D60635718
fbshipit-source-id: 606216a4a2d5a43f82f7bd681477537413bd372a
Summary:
We need this because we constantly see port conflict errors during RocksDB initialization. Before this diff, we called getFreePort to get an available port. For each SSD TBE we create 32 RocksDB shards, so 256 ports are needed per host in total.
This worked fine with 4 hosts, but with a 16-host training job we need to make sure that none of the 16 hosts hits the corner case where multiple DB shards are assigned the same free port.
Differential Revision: D60635718
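
For illustration only, here is a minimal sketch of how a TBE id could be turned into collision-free ports. The summary does not spell out the actual mapping used in this diff, so `shard_port`, `BASE_PORT`, and the formula below are hypothetical. Only the 32 shards per TBE and 256 ports per host come from the summary; the 8 TBEs per host is simply 256 / 32.

```python
# Numbers taken from the PR summary.
SHARDS_PER_TBE = 32
PORTS_PER_HOST = 256

# Implied by the two numbers above (256 / 32); not stated explicitly in the PR.
TBES_PER_HOST = PORTS_PER_HOST // SHARDS_PER_TBE

# Hypothetical start of a port range reserved for the shards (an assumption).
BASE_PORT = 30000


def shard_port(tbe_id: int, shard_idx: int, base_port: int = BASE_PORT) -> int:
    """Hypothetical mapping from (tbe_id, shard_idx) to a fixed port.

    Each TBE owns a contiguous block of SHARDS_PER_TBE ports, so two
    different (tbe_id, shard_idx) pairs never map to the same port.
    """
    if tbe_id < 0:
        raise ValueError("tbe_id must be non-negative")
    if not 0 <= shard_idx < SHARDS_PER_TBE:
        raise ValueError("shard_idx out of range")
    port = base_port + tbe_id * SHARDS_PER_TBE + shard_idx
    if port > 65535:
        raise ValueError("port exceeds the valid TCP port range")
    return port


# All ports a single host would use under this scheme.
ports = [
    shard_port(tbe_id, shard_idx)
    for tbe_id in range(TBES_PER_HOST)
    for shard_idx in range(SHARDS_PER_TBE)
]

# Every shard gets its own port: no duplicates among the 256.
assert len(set(ports)) == PORTS_PER_HOST
```

With a mapping like this, each shard's port is known up front from its TBE id and shard index, so no shard has to probe for a free port at initialization time. The trade-off is that the chosen range has to be kept free for this purpose on every host.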