Add support more NVIDIA devices #160
Conversation
Did you test which implementation each arch/device should be routed to? Maybe we should eventually implement something like autotune to do this.
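The "autotune" idea mentioned above could look something like the sketch below: instead of hard-coding an arch-to-kernel table, benchmark each candidate implementation once and cache the winner. This is a hypothetical illustration, not SageAttention's actual API; the candidate callables here are toy stand-ins for real attention kernels.

```python
import time

def autotune(candidates, *args, warmup=2, iters=5):
    """Pick the fastest implementation from a dict of name -> callable.

    Hypothetical sketch of benchmark-based routing: run each candidate
    a few times, average the wall-clock time, and return the fastest.
    """
    timings = {}
    for name, fn in candidates.items():
        for _ in range(warmup):  # warm up caches / JIT before timing
            fn(*args)
        start = time.perf_counter()
        for _ in range(iters):
            fn(*args)
        timings[name] = (time.perf_counter() - start) / iters
    best = min(timings, key=timings.get)
    return best, candidates[best]

# Toy stand-ins for real kernels (both compute the same result):
fast = lambda x: x * 2
slow = lambda x: sum(x * 2 for _ in range(1000)) // 1000

name, fn = autotune({"triton": fast, "cuda": slow}, 21)
```

In a real integration the timing loop would need CUDA event timing and stream synchronization rather than `time.perf_counter`, and the result should be cached per device so the benchmark runs only once.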
I'm testing with Ada, Hopper (GH200), Jetson Orin, and RTX 5090/GB200.
For Blackwell it's the same as the RTX 50 series with Triton 3.3.x.
Isn't there more needed to handle the B200? The commit only seems to get past the setup process. For example, sm100 for the B200 is not a case handled in core.py, line 135 (it skips to sm120).
Otherwise it seems to throw this error:
Copying one of the other cases doesn't seem to be enough:
Still results in:
@pftq OK, let's do it. Since you have a B200, you can test which of the implementations is the fastest (maybe using the scripts at https://github.com/thu-ml/SageAttention/tree/main/bench) and route it in the function.
See the bottom of my earlier reply: right now I'm just getting a CUDA error, so I'd need to get past that first.
Supported compute capabilities:
- Jetson Orin: 8.7
- Blackwell B100/B200/GB200: 10.0
- Jetson Thor: 10.1
- Spark: 11.0
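A routing switch over these capabilities might be sketched as below. The kernel names (`"sm87"`, `"sm100"`, ...) are placeholders, not SageAttention's real identifiers; in practice the `(major, minor)` pair would come from `torch.cuda.get_device_capability()`.

```python
def pick_kernel(major, minor):
    """Hypothetical dispatch by CUDA compute capability.

    Mirrors the capability list from this PR; returns a placeholder
    kernel name rather than a real implementation.
    """
    cc = major * 10 + minor  # e.g. (10, 0) -> 100
    if cc == 87:             # Jetson Orin
        return "sm87"
    if cc in (100, 101):     # Blackwell B100/B200/GB200, Jetson Thor
        return "sm100"
    if cc == 110:            # Spark
        return "sm110"
    if cc == 120:            # RTX 50 series
        return "sm120"
    raise ValueError(f"unsupported compute capability {major}.{minor}")
```

Grouping 10.0 and 10.1 together is an assumption here; as the discussion above notes, whether each arch actually maps to the fastest implementation would need benchmarking on real hardware.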