Skip to content

Adding nvtx memory regions to pool MR #1952

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

nirandaperera
Copy link

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Signed-off-by: niranda perera <[email protected]>
Signed-off-by: niranda perera <[email protected]>
@@ -217,7 +217,7 @@ class stream_ordered_memory_resource : public crtp<PoolResource>, public device_
std::string("Maximum allocation size exceeded (failed to allocate ") +
rmm::detail::format_bytes(size) + ")",
rmm::out_of_memory);
auto const block = this->underlying().get_block(size, stream_event);
auto const block = get_block(size, stream_event);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ question: ‏ Why drop the CRTP indirection here? This doesn't seem related to this PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harrism that's right. But when I was reading the code, what I gathered was, get_block is not implemented by the derived class. It's not mentioned here as well. https://github.com/nirandaperera/rmm/blob/adding_nvtx_pool/cpp/include/rmm/mr/device/detail/stream_ordered_memory_resource.hpp#L70-L76
So, IINM, we can simply call the method, without the indirection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see. Good catch.

@@ -383,6 +390,24 @@ class pool_memory_resource final
allocated_blocks_.insert(alloc);
#endif

#ifdef RMM_NVTX
void* heap_key;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this adds some overhead on every suballocation. And the insertion into the nvtx_heaps map is a small overhead on upstream allocations.

Can you please benchmark this cost with the random allocations benchmark with NVTX on and off and report it in the PR? Is NVTX enabled by default? Depending on these costs, we may want it off by default.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harrism Yes, there is an overhead here. In particular

  1. Inserting and querying from the nvtx_heaps_ unordered map.
  2. calling lower_bound on unstream_blocks_ set (which is logarithmic)

I think we can alleviate 2, if we add a void* upstream_ member to the block class, rather than the bool head. Then IINM, is_head() will be upstream_ == ptr_. But then, we are adding additional 3-bytes to the block class.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think its a worthwhile change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see benchmarks, if you don't mind. :)

@harrism harrism added non-breaking Non-breaking change feature request New feature or request labels Jun 11, 2025
@nirandaperera nirandaperera requested a review from a team as a code owner June 12, 2025 00:02
@github-actions github-actions bot added the CMake label Jun 12, 2025
@nirandaperera nirandaperera marked this pull request as draft June 12, 2025 00:03
Copy link

copy-pr-bot bot commented Jun 12, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CMake feature request New feature or request non-breaking Non-breaking change
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

2 participants