sha2 crate = runtime error #207
Comments
Hi @brandonros, sha2 has a lot of CPU-specific optimizations (AVX2, etc.), so crates like this can't be used directly in a CUDA kernel. To use them directly in a CUDA kernel, we would need to add an implementation to these crates and gate it behind a CUDA-like feature flag. |
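A minimal sketch of what that kind of gating could look like, assuming a hypothetical `cuda` Cargo feature and hypothetical function names (none of this is sha2's actual code). `target_arch = "nvptx64"` is the cfg value rustc sets for the nvptx64-nvidia-cuda target; everything else is illustrative, and the "compression" is a placeholder, not SHA-2:

```rust
/// Portable placeholder standing in for a pure-Rust ("soft") compression
/// routine. This is NOT SHA-2; it only mixes the block into the state.
fn compress_soft(state: &mut [u32; 8], block: &[u8; 64]) {
    for (i, chunk) in block.chunks_exact(4).enumerate() {
        let word = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        state[i % 8] = state[i % 8].rotate_left(5).wrapping_add(word);
    }
}

/// GPU build: selected when the hypothetical `cuda` feature is enabled or
/// when compiling for an nvptx64 target. It only uses the portable path.
#[cfg(any(feature = "cuda", target_arch = "nvptx64"))]
pub fn compress(state: &mut [u32; 8], block: &[u8; 64]) {
    compress_soft(state, block);
}

/// CPU build: a real crate would dispatch to SIMD code here when it is
/// available; this sketch simply reuses the portable routine.
#[cfg(not(any(feature = "cuda", target_arch = "nvptx64")))]
pub fn compress(state: &mut [u32; 8], block: &[u8; 64]) {
    compress_soft(state, block);
}

/// Reports which path was compiled in (hypothetical helper for debugging).
pub fn backend_name() -> &'static str {
    if cfg!(any(feature = "cuda", target_arch = "nvptx64")) {
        "gpu-safe soft"
    } else {
        "cpu"
    }
}
```

The selection happens at compile time, so a kernel crate built for nvptx64 would never see the CPU-only branch.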
I would have guessed the compiler would be able to tell that AVX2 was not available and not try to include it. I believe the issue still occurs even with the non-AVX2 (soft) implementation: https://github.com/RustCrypto/hashes/blob/master/sha2/src/sha512/soft.rs https://github.com/RustCrypto/hashes/blob/master/sha2/src/sha512.rs#L2-L4 Any suggestions on how to debug exactly what the problem is, or how to tell the compiler those options aren't available? Are you saying |
The error shown here happens at runtime, so I'm assuming that your GPU crate compiled successfully with the nvvm codegen. If so, then there shouldn't be an issue with the use of the SHA crate. You are, however, using cudarc, which is a different crate than the one we maintain here, and it is where the error originates. In theory, these should be identical bindings to the CUDA driver API, and the PTX generated should be loadable by any program that can load and launch kernels, but I've only ever used the bindings provided through cust to launch kernels compiled by the nvvm backend. If this issue is in how cudarc launches the kernel, then it might be better to open an issue with them so they can help pinpoint why the kernel is failing to launch, and whether it has something to do with the PTX generated by our backend. If you have the same issue launching the kernel with cust, I can look further. |
cudarc replaced with cust: brandonros/ed25519-vanity-rs@2b04c7e
_compact functions work (sha2), non-compact do not
PTX:
|
Thanks for the extra context. I'll look further. |
Thanks again for all the reports. I won't have as much availability to look into this as I thought but will asap. Or someone else can feel free to look further. |
give me your 30-second take on this please, something I can dive into and try to help with. I get that the crate has a bunch of different backends and tries to use AVX and SIMD where present, but I would think it knows to fall back to not doing that when they aren't available. |
I had glanced at the sha2 code, and it does indeed fall back to soft. I'm a bit busy at the moment but will probably start poking at this in an hour 👍 |
@brandonros I am not seeing this error, it seems to work? https://github.com/LegNeato/ed25519-vanity-rs/. Did I screw something up? |
https://docs.rs/sha2/latest/sha2/?search=output
Are you sure that compiles? That |
Try this:
fn sha512_hash(input: &[u8]) -> [u8; 64] {
    use sha2::{Digest, Sha512};
    let mut hasher = Sha512::new();
    hasher.update(input);
    hasher.finalize().into()
} |
Ugh, I don't understand why it isn't always using the latest code, sometimes it fails and continues to run the previous binary, making me look like an idiot 😅 . I can repro now. Going to bed, will look tomorrow. |
I believe even with
https://github.com/RustCrypto/hashes/blob/master/sha2/src/sha512.rs#L5 https://github.com/RustCrypto/hashes/blob/master/sha2/src/sha512/soft_compact.rs this is still an issue I would not have thought that without your Should I try a different/newer LLVM/nightly version? How tightly coupled are those two? |
I think I caught it:
Rust-CUDA/crates/rustc_codegen_nvvm/src/ty.rs Line 236 in afb147e
|
Ok, I got this much further along. Now it is generating invalid bitcode:
|
tracking your work here: https://github.com/LegNeato/Rust-CUDA/tree/fixsha2 i will try to dive in this weekend and catch up. wondering if it is (as a bad guess, sorry) the "we pin ourselves to a specific version of nightly and llvm v7" generating something bad? |
Yeah, llvm-dis-7 shows the error and won't disassemble but llvm-dis-18 or whatever disassembles it. But I don't know enough about llvm bitcode and rust internals to know what is going on. |
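One cheap sanity check before digging into version differences is to confirm the file really is an LLVM bitcode container. The sketch below only inspects the first four bytes (raw bitcode starts with `BC` 0xC0 0xDE; the wrapper format starts with 0x0B17C0DE stored little-endian). The function and enum names are made up for illustration, and passing this check says nothing about whether a given LLVM version can read the records inside, which is likely the real problem here since a newer llvm-dis accepts the file:

```rust
/// Kind of LLVM bitcode container detected from the first four bytes.
#[derive(Debug, PartialEq, Eq)]
pub enum BitcodeKind {
    /// Starts with 'B' 'C' 0xC0 0xDE.
    Raw,
    /// Starts with the wrapper magic 0x0B17C0DE (little-endian on disk).
    Wrapped,
    /// Too short, or not a recognized bitcode header (e.g. textual IR).
    Unknown,
}

/// Hypothetical helper: classify a buffer by its magic number only.
pub fn sniff_bitcode(bytes: &[u8]) -> BitcodeKind {
    match bytes.get(..4) {
        Some([0x42, 0x43, 0xC0, 0xDE]) => BitcodeKind::Raw,
        Some([0xDE, 0xC0, 0x17, 0x0B]) => BitcodeKind::Wrapped,
        _ => BitcodeKind::Unknown,
    }
}
```

In practice one would read the emitted file with `std::fs::read` and pass the bytes in. If the header is fine, the mismatch is more likely in the bitcode records themselves than in how the file was written.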
if nvptx64 is an officially supported rust target triple outside of your project, why do we need to be on an old nightly with an old llvm? |
Because nvvm is based on an old llvm. See also #197 |
maybe let's table this one (sha2 working on old llvm) and instead work together to bring llvm18 support to the crate, which would fix this. is that even possible, given what you are saying and the linked issue? if nvvm is based on llvm7, is it even possible to upgrade? |
Check out the issue, yeah it is not a good idea to move forward. It would break compatibility with virtually every card out there except the newest ones. |
let's put it behind a flag? i'd be down to try it/help. i've been playing with this in the meantime:

[build]
target = "nvptx64-nvidia-cuda"
[target.nvptx64-nvidia-cuda]
linker = "true"
rustflags = ["--emit=llvm-ir"]

#!/bin/bash
set -e
# Emit LLVM IR
cargo build --release
# Link dependencies first, then main crate
llvm-link-20 \
target/nvptx64-nvidia-cuda/release/deps/hex-*.ll \
target/nvptx64-nvidia-cuda/release/deps/cuda_adder-*.ll \
-o combined.ll
# Convert to PTX
llc-20 combined.ll -march=nvptx64 -mcpu=sm_100 -o output.ptx

Would you expect the output to suck performance-wise (since it is LLVM IR -> PTX and not NVVM)? edit: this helps https://github.com/Rust-GPU/Rust-CUDA/blob/main/guide/src/faq.md#why-not-use-rustc-with-the-llvm-ptx-backend |
https://github.com/RustCrypto/hashes/blob/master/sha2/Cargo.toml vs https://github.com/brandonros/rust-ed25519-compact