Adding nvtx memory regions to pool MR #1952
Signed-off-by: niranda perera <[email protected]>
Signed-off-by: niranda perera <[email protected]>
@@ -217,7 +217,7 @@ class stream_ordered_memory_resource : public crtp<PoolResource>, public device_
 std::string("Maximum allocation size exceeded (failed to allocate ") +
 rmm::detail::format_bytes(size) + ")",
 rmm::out_of_memory);
-auto const block = this->underlying().get_block(size, stream_event);
+auto const block = get_block(size, stream_event);
❓ question: Why drop the CRTP indirection here? This doesn't seem related to this PR.
@harrism that's right. But when I was reading the code, what I gathered was that `get_block` is not implemented by the derived class. It's not mentioned here either: https://github.com/nirandaperera/rmm/blob/adding_nvtx_pool/cpp/include/rmm/mr/device/detail/stream_ordered_memory_resource.hpp#L70-L76
So, IINM, we can simply call the method without the indirection.
OK, I see. Good catch.
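For readers following this thread, here is a minimal sketch of the distinction being discussed. All class and method names below (`crtp_base`, `toy_pool`, `maximum_size`, `block_size`) are hypothetical stand-ins, not RMM's actual code: the CRTP cast is only needed to reach a method the derived class provides, while a method implemented in the base itself can be called directly.

```cpp
#include <cstddef>

// Hypothetical stand-ins for illustration; these are NOT RMM's classes.
template <typename Derived>
class crtp_base {
 public:
  // Reaches a method that only the derived class implements,
  // so the CRTP cast (underlying()) is required.
  std::size_t maximum_size() const { return underlying().get_maximum_allocation_size(); }

  // Reaches a method implemented in this base class,
  // so a plain call works and no cast is needed.
  std::size_t block_size(std::size_t requested) const { return get_block(requested); }

 private:
  Derived const& underlying() const { return static_cast<Derived const&>(*this); }

  // Implemented by the base, not by Derived (toy logic: round up to a multiple of 8).
  std::size_t get_block(std::size_t requested) const { return (requested + 7) / 8 * 8; }
};

class toy_pool : public crtp_base<toy_pool> {
 public:
  std::size_t get_maximum_allocation_size() const { return 1024; }
};
```

With this toy setup, `toy_pool{}.maximum_size()` goes through the cast and `toy_pool{}.block_size(13)` does not, which mirrors why dropping `this->underlying()` in front of `get_block` is safe when the derived class never overrides it.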
@@ -383,6 +390,24 @@ class pool_memory_resource final
 allocated_blocks_.insert(alloc);
 #endif
+#ifdef RMM_NVTX
+void* heap_key;
So this adds some overhead on every suballocation. And the insertion into the nvtx_heaps map is a small overhead on upstream allocations.
Can you please benchmark this cost with the random allocations benchmark with NVTX on and off and report it in the PR? Is NVTX enabled by default? Depending on these costs, we may want it off by default.
@harrism Yes, there is an overhead here. In particular:
1. Inserting into and querying the `nvtx_heaps_` unordered map.
2. Calling `lower_bound` on the `upstream_blocks_` set (which is logarithmic).
I think we can alleviate 2 if we add a `void* upstream_` member to the `block` class, rather than the `bool head`. Then, IINM, `is_head()` will be `upstream_ == ptr_`. But then we are adding an additional 3 bytes to the `block` class.
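A rough sketch of what that proposal could look like, to make it easier to evaluate. The class name `block_with_upstream` and its accessors are hypothetical and this is not RMM's actual `block` class; only the `upstream_` member and the `upstream_ == ptr_` test come from the comment above.

```cpp
#include <cstddef>

// Hypothetical sketch of the proposal; NOT RMM's actual block class.
class block_with_upstream {
 public:
  block_with_upstream(char* ptr, std::size_t size, void* upstream)
    : ptr_{ptr}, size_{size}, upstream_{upstream}
  {
  }

  char* pointer() const { return ptr_; }
  std::size_t size() const { return size_; }

  // Start address of the upstream allocation this block was carved from.
  void* upstream() const { return upstream_; }

  // A block is the head of its upstream allocation when both addresses match,
  // which replaces the separate `bool head` flag.
  bool is_head() const { return upstream_ == ptr_; }

 private:
  char* ptr_;
  std::size_t size_;
  void* upstream_;  // replaces `bool head` in this sketch
};
```

With the upstream address stored in each block, the heap key is available without searching the set of upstream blocks. How much the class actually grows depends on alignment and padding, so `sizeof` is worth checking on the target platform.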
Do you think it's a worthwhile change?
I would like to see benchmarks, if you don't mind. :)
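To make the two costs discussed in this thread concrete, here is a minimal sketch of a per-suballocation lookup: an ordered search over the upstream blocks followed by a hash-map query. Everything here is an assumption for illustration: `upstream_block`, `by_pointer`, `heap_registry`, and `find_heap` are hypothetical names, an `int` stands in for an NVTX heap handle, and the search uses `upper_bound` followed by a step back rather than whatever exact `lower_bound` call the PR makes.

```cpp
#include <cstddef>
#include <functional>
#include <set>
#include <unordered_map>

// Hypothetical record of one upstream allocation; not RMM's block type.
struct upstream_block {
  char* ptr;
  std::size_t size;
};

// Orders upstream blocks by start address.
struct by_pointer {
  bool operator()(upstream_block const& a, upstream_block const& b) const
  {
    return std::less<char*>{}(a.ptr, b.ptr);
  }
};

struct heap_registry {
  std::set<upstream_block, by_pointer> upstream_blocks;
  std::unordered_map<void*, int> heaps;  // int is a stand-in for an NVTX heap handle

  // Cost paid once per upstream allocation: one set insert and one map insert.
  void add_upstream(char* ptr, std::size_t size, int handle)
  {
    upstream_blocks.insert(upstream_block{ptr, size});
    heaps.emplace(ptr, handle);
  }

  // Cost paid on every suballocation: an O(log n) set search plus a hash lookup.
  // Returns nullptr when p does not fall inside any registered upstream block.
  int const* find_heap(char* p) const
  {
    // First block starting strictly after p; the candidate is the one before it.
    auto it = upstream_blocks.upper_bound(upstream_block{p, 0});
    if (it == upstream_blocks.begin()) { return nullptr; }
    --it;
    // Reject pointers at or past the end of the candidate block.
    if (!std::less<char*>{}(p, it->ptr + it->size)) { return nullptr; }
    auto found = heaps.find(it->ptr);
    return found == heaps.end() ? nullptr : &found->second;
  }
};
```

Timing `find_heap` in a loop against a direct member read is one way to approximate what the requested NVTX on/off benchmark would measure, though the random allocations benchmark remains the number the review asks for.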
Signed-off-by: niranda perera <[email protected]>
Signed-off-by: niranda perera <[email protected]>
Signed-off-by: niranda perera <[email protected]>