
Add support for more NVIDIA devices #160


Open
wants to merge 1 commit into main

Conversation

johnnynunez

@johnnynunez johnnynunez commented Apr 22, 2025

Support (CUDA compute capabilities):

- Jetson Orin: 8.7
- Jetson Thor: 10.1
- Blackwell B100/B200/GB200: 10.0
- Spark: 11.0
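
For reference, a minimal sketch (not part of this commit) of how these compute capabilities would map to the "smXY" arch strings used for dispatch in core.py, assuming the standard torch.cuda.get_device_capability() API:

    import torch

    def detect_arch(device: int = 0) -> str:
        # Map the compute capability reported by PyTorch to an "smXY" string,
        # e.g. (8, 7) -> "sm87" for Jetson Orin and (10, 0) -> "sm100" for B100/B200/GB200.
        major, minor = torch.cuda.get_device_capability(device)
        return f"sm{major}{minor}"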

@woct0rdho

Did you test which implementation each arch/device should be routed to in sageattn in core.py?

Maybe we should eventually implement something like autotune to do this.
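
As a rough sketch of what such an autotune could look like (illustrative only, not existing SageAttention API: the function name, cache key, and timing loop are all assumptions), each candidate kernel could be timed once per device/shape and the winner cached:

    import time

    import torch

    _AUTOTUNE_CACHE = {}  # (compute capability, head_dim) -> fastest callable

    def autotune_sageattn(q, k, v, candidates, warmup=3, iters=10, **kwargs):
        """Time each candidate kernel once for this device/shape, cache the
        winner, and dispatch to it. `candidates` maps a name to a callable
        with the same signature as sageattn (hypothetical helper)."""
        key = (torch.cuda.get_device_capability(q.device), q.shape[-1])
        if key not in _AUTOTUNE_CACHE:
            best_fn, best_ms = None, float("inf")
            for name, fn in candidates.items():
                try:
                    for _ in range(warmup):
                        fn(q, k, v, **kwargs)
                    torch.cuda.synchronize()
                    t0 = time.perf_counter()
                    for _ in range(iters):
                        fn(q, k, v, **kwargs)
                    torch.cuda.synchronize()
                    ms = (time.perf_counter() - t0) * 1000 / iters
                except Exception:
                    continue  # this kernel is not usable on this arch/shape
                if ms < best_ms:
                    best_fn, best_ms = fn, ms
            if best_fn is None:
                raise RuntimeError("no candidate kernel runs on this device")
            _AUTOTUNE_CACHE[key] = best_fn
        return _AUTOTUNE_CACHE[key](q, k, v, **kwargs)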

@johnnynunez
Author

johnnynunez commented Apr 22, 2025

> Did you test which implementation each arch/device should be routed to in sageattn in core.py?
>
> Maybe we should eventually implement something like autotune to do this.

I'm testing with Ada, Hopper (GH200), Jetson Orin, RTX 5090, and GB200.

@johnnynunez
Author

For Blackwell it's the same as the RTX 50 series with Triton 3.3.x.

@pftq

pftq commented May 17, 2025

Isn't there more needed to handle the B200? The commit seems to only get past the setup process. For example, the sm100 case for the B200 is not handled in core.py (it skips straight to sm120).

Line 135 in core.py

    elif arch == "sm90":
        return sageattn_qk_int8_pv_fp8_cuda_sm90(q, k, v, tensor_layout=tensor_layout, is_causal=is_causal, sm_scale=sm_scale, return_lse=return_lse, pv_accum_dtype="fp32+fp32")
    elif arch == "sm120":
        return sageattn_qk_int8_pv_fp8_cuda(q, k, v, tensor_layout=tensor_layout, is_causal=is_causal, qk_quant_gran="per_warp", sm_scale=sm_scale, return_lse=return_lse, pv_accum_dtype="fp32") # sm120 has accurate fp32 accumulator for fp8 mma and triton kernel is currently not usable on sm120.

Otherwise it throws this error:

  File "/workspace/ComfyUI/venv/lib/python3.11/site-packages/sageattention/core.py", line 138, in sageattn
    raise ValueError(f"Unsupported CUDA architecture: {arch}")
ValueError: Unsupported CUDA architecture: sm100

Copying one of the other cases doesn't seem to be enough:

    elif arch == "sm100":
        return sageattn_qk_int8_pv_fp8_cuda(q, k, v, tensor_layout=tensor_layout, is_causal=is_causal, qk_quant_gran="per_warp", sm_scale=sm_scale, return_lse=return_lse, pv_accum_dtype="fp32")

Still results in:

  File "/workspace/ComfyUI/venv/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 857, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/workspace/ComfyUI/venv/lib/python3.11/site-packages/sageattention/core.py", line 722, in sageattn_qk_int8_pv_fp8_cuda
    o = torch.empty(q.size(), dtype=dtype, device=q.device)
RuntimeError: CUDA error: no kernel image is available for execution on the device
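
The "no kernel image is available" error usually means the compiled extension was not built with gencode flags for this device's compute capability, so adding an sm100 branch in core.py alone won't help until the kernels are actually compiled for sm_100. A quick sanity check (this only inspects PyTorch's own build; the SageAttention .so gets its arch list from the flags used at its own build time):

    import torch

    # Confirm what the device reports and which arches PyTorch was built for.
    print("device capability:", torch.cuda.get_device_capability(0))  # expect (10, 0) on a B200
    print("PyTorch arch list:", torch.cuda.get_arch_list())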

@woct0rdho

woct0rdho commented May 17, 2025

@pftq OK, let's do it. Since you have a B200, you can test which of the implementations is fastest (maybe using the scripts at https://github.com/thu-ml/SageAttention/tree/main/bench) and route to it in the sageattn function.
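
Something like the following could serve as a quick hand-rolled comparison on the B200 (a sketch, independent of the bench scripts above; it assumes the default tensor_layout and argument defaults in core.py, and the fp16/Triton variants can be appended to the candidate list the same way):

    import torch
    from sageattention.core import (
        sageattn_qk_int8_pv_fp8_cuda,
        sageattn_qk_int8_pv_fp8_cuda_sm90,
    )

    def bench(fn, q, k, v, iters=50):
        # Warm up, then time with CUDA events so only GPU work is measured.
        for _ in range(5):
            fn(q, k, v, is_causal=False)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        start.record()
        for _ in range(iters):
            fn(q, k, v, is_causal=False)
        end.record()
        torch.cuda.synchronize()
        return start.elapsed_time(end) / iters  # average ms per call

    # (batch, heads, seq_len, head_dim) in the default "HND" layout
    q = torch.randn(2, 32, 4096, 128, dtype=torch.float16, device="cuda")
    k = torch.randn_like(q)
    v = torch.randn_like(q)

    for name, fn in [
        ("fp8_cuda", sageattn_qk_int8_pv_fp8_cuda),
        ("fp8_cuda_sm90", sageattn_qk_int8_pv_fp8_cuda_sm90),
    ]:
        try:
            print(name, f"{bench(fn, q, k, v):.3f} ms")
        except Exception as e:
            print(name, "failed:", e)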

@pftq

pftq commented May 17, 2025

See the bottom of my earlier reply: right now I'm just getting a CUDA error, so I'd need to get past that first.
