Optimize lxu_cache_lookup. #952

Closed
wants to merge 3 commits into from

jasonjk-park
Contributor

Summary:
Optimize the lxu_cache_lookup kernel with two changes:
(1) write coalescing
(2) early return
This gives a 35% improvement when the UVM cache is fully used, and an almost 10x improvement when the UVM cache is partially used.
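
For readers unfamiliar with the two techniques named above, here is a minimal CUDA sketch of how they can look in a set-associative cache lookup. It is an illustration under assumptions, not the code in this PR: the kernel name `cache_lookup_sketch`, the host helper `run_lookup`, the `kMissing` marker, the `idx % num_sets` hash, and the 32-slot set layout are all hypothetical. Each warp resolves 32 consecutive indices, skips invalid indices before touching the cache state (one plausible reading of "early return"), keeps each result in the register of the lane that owns it, and then writes all results with a single contiguous store per warp (write coalescing).

```cuda
#include <cstdint>
#include <vector>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;     // assumed: one cache set has 32 slots, one per lane
constexpr int32_t kMissing = -1;  // hypothetical "not in cache" marker

// Hypothetical kernel. cache_state is row-major [num_sets][kWarpSize];
// entry (s, w) holds the index currently cached in set s, slot w.
// Must be launched with a block size that is a multiple of kWarpSize.
__global__ void cache_lookup_sketch(const int64_t* indices, int n,
                                    const int64_t* cache_state, int num_sets,
                                    int64_t invalid_index, int32_t* locations) {
  const int lane = threadIdx.x % kWarpSize;
  const int warp_base = blockIdx.x * blockDim.x + threadIdx.x - lane;
  if (warp_base >= n) return;  // early return: the whole warp is out of range

  int32_t my_location = kMissing;
  for (int i = 0; i < kWarpSize; ++i) {
    const int pos = warp_base + i;
    if (pos >= n) break;                 // uniform across the warp
    const int64_t idx = indices[pos];    // same address in every lane
    if (idx == invalid_index) continue;  // skip before reading the cache state
    const int set = static_cast<int>(idx % num_sets);  // assumed hash
    const bool found = cache_state[set * kWarpSize + lane] == idx;
    const unsigned hits = __ballot_sync(0xFFFFFFFFu, found);
    // Lane i owns output element i, so it records the hit in a register.
    if (hits != 0 && lane == i) {
      my_location = set * kWarpSize + (__ffs(hits) - 1);
    }
  }
  // Write coalescing: consecutive lanes store to consecutive addresses.
  const int out = warp_base + lane;
  if (out < n) locations[out] = my_location;
}

// Hypothetical host wrapper; CUDA error checking is omitted for brevity.
std::vector<int32_t> run_lookup(const std::vector<int64_t>& indices,
                                const std::vector<int64_t>& cache_state,
                                int num_sets, int64_t invalid_index) {
  const int n = static_cast<int>(indices.size());
  std::vector<int32_t> out(n, kMissing);
  if (n == 0) return out;
  int64_t* d_idx = nullptr;
  int64_t* d_state = nullptr;
  int32_t* d_loc = nullptr;
  cudaMalloc(&d_idx, n * sizeof(int64_t));
  cudaMalloc(&d_state, cache_state.size() * sizeof(int64_t));
  cudaMalloc(&d_loc, n * sizeof(int32_t));
  cudaMemcpy(d_idx, indices.data(), n * sizeof(int64_t), cudaMemcpyHostToDevice);
  cudaMemcpy(d_state, cache_state.data(), cache_state.size() * sizeof(int64_t),
             cudaMemcpyHostToDevice);
  const int threads = 4 * kWarpSize;
  const int blocks = (n + threads - 1) / threads;
  cache_lookup_sketch<<<blocks, threads>>>(d_idx, n, d_state, num_sets,
                                           invalid_index, d_loc);
  cudaMemcpy(out.data(), d_loc, n * sizeof(int32_t), cudaMemcpyDeviceToHost);
  cudaFree(d_idx);
  cudaFree(d_state);
  cudaFree(d_loc);
  return out;
}
```

The alternative to the final store would be for whichever lane finds a hit to write `locations[pos]` directly inside the loop, which produces up to 32 scattered single-element writes per warp instead of one contiguous one. The `continue` on `invalid_index` matters most when many lookups do not target the cache at all, which may be why the partially used case benefits more, though the PR does not spell this out.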

Differential Revision: D34491095

jasonjk-park and others added 3 commits February 25, 2022 18:03
Differential Revision: D34353556

fbshipit-source-id: e61a36d0affba2d0dc976a28ed3fe03859cf2885
Differential Revision: D34491026

fbshipit-source-id: ea68c8f1cbb28681d743545ceab0e3aada39a852
Differential Revision: D34491095

fbshipit-source-id: 87faaf347883823c728b27f3dd54c19cc7570cc0
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D34491095

@facebook-github-bot
Contributor

This pull request has been reverted by abc7b74.

q10 pushed a commit to q10/FBGEMM that referenced this pull request Apr 10, 2025
Summary:
X-link: pytorch#3862

Pull Request resolved: facebookresearch/FBGEMM#952

Applies the same optimization used in D71510967 to cutlass fp8 grouped gemm. This should help performance for cases where G > M.

Reviewed By: jiawenliu64

Differential Revision: D71582782

fbshipit-source-id: 05a86398164b1a4bd6af46e9af2ec7f5faabdeb0
2 participants