
sha2 crate = runtime error #207


Open
brandonros opened this issue Apr 27, 2025 · 22 comments

Comments

@brandonros

https://github.com/RustCrypto/hashes/blob/master/sha2/Cargo.toml vs https://github.com/brandonros/rust-ed25519-compact

$ cargo run --release -- aa $BLOCKS_PER_GRID $THREADS_PER_BLOCK
   Compiling ed25519_vanity v0.1.0 (/home/brandon/ed25519-vanity-rs)
    Finished `release` profile [optimized] target(s) in 0.90s
     Running `target/release/ed25519_vanity aa 128 128`
Found 1 CUDA devices
Starting device 0
[0] Loading module...
[0] Starting search loop...

thread '<unnamed>' panicked at /home/brandon/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/cudarc-0.16.0/src/driver/safe/core.rs:470:36:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_LAUNCH_FAILED, "unspecified launch failure")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

thread '<unnamed>' panicked at /home/brandon/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/cudarc-0.16.0/src/driver/safe/core.rs:246:58:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_LAUNCH_FAILED, "unspecified launch failure")
stack backtrace:
   0:     0x55e313ca18e3 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::hdbd106d724e72c20
   1:     0x55e313cc3493 - core::fmt::write::h861eecc74abebf7a
   2:     0x55e313c9f003 - std::io::Write::write_fmt::h493b3152b071fba0
   3:     0x55e313ca1732 - std::sys::backtrace::BacktraceLock::print::h71f315c25fc266cb
   4:     0x55e313ca26ca - std::panicking::default_hook::{{closure}}::h8019dc6a2c6c0fe7
   5:     0x55e313ca253a - std::panicking::default_hook::h497f769686a88dd6
   6:     0x55e313ca2fd2 - std::panicking::rust_panic_with_hook::h98fc165e90ef379e
   7:     0x55e313ca2e6a - std::panicking::begin_panic_handler::{{closure}}::h2c1a60d0a908eaec
   8:     0x55e313ca1dd9 - std::sys::backtrace::__rust_end_short_backtrace::he8aba8f9b7ddf304
   9:     0x55e313ca2afd - rust_begin_unwind
  10:     0x55e313cc2010 - core::panicking::panic_fmt::hcbf39f8c1e585f84
  11:     0x55e313cc23a6 - core::result::unwrap_failed::haf1491c6d679786d
  12:     0x55e313c7bc98 - <cudarc::driver::safe::core::CudaEvent as core::ops::drop::Drop>::drop::heca398ef781d7d06
  13:     0x55e313c60323 - core::ptr::drop_in_place<core::option::Option<cudarc::driver::safe::core::CudaEvent>>::h0f0fd586ce97c35f
  14:     0x55e313c601f6 - core::ptr::drop_in_place<cudarc::driver::safe::core::CudaSlice<u8>>::he9b6484b4f5db6c9
  15:     0x55e313c61cae - ed25519_vanity::device_main::h99b65f8ab2f8c263
  16:     0x55e313c65f2b - std::sys::backtrace::__rust_begin_short_backtrace::he6efac01710238fd
  17:     0x55e313c656f1 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h486bd94640b67ff5
  18:     0x55e313ca4d8b - std::sys::pal::unix::thread::Thread::new::thread_start::h20288ab9ea215a81
  19:     0x7fe278c381f5 - <unknown>
  20:     0x7fe278cb889c - <unknown>
  21:                0x0 - <unknown>

thread '<unnamed>' panicked at library/core/src/panicking.rs:226:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
Aborted

use sha2::Digest as _;
use ed25519_compact::ge_scalarmult_base;
use rand_core::{SeedableRng, RngCore};
use rand_xorshift::XorShiftRng;
use bs58;

// fails: a kernel that calls this (sha2 crate) hits CUDA_ERROR_LAUNCH_FAILED at launch
fn sha512(input: &[u8]) -> [u8; 64] {
    let mut hasher = sha2::Sha512::new();
    hasher.update(input);
    hasher.finalize().into()
}

// works: uses the SHA-512 implementation bundled with ed25519_compact
fn sha512_compact(input: &[u8]) -> [u8; 64] {
    let mut hasher = ed25519_compact::sha512::Hash::new();
    hasher.update(input);
    hasher.finalize()
}
@adamcavendish
Contributor

Hi @brandonros, sha2 has a lot of CPU-specific optimizations (AVX2, etc.), so crates like this can't be used directly in a CUDA kernel. To use them there, we would need to write a GPU-friendly implementation inside the crate and gate it behind a CUDA-like feature flag.

@brandonros
Author

I would have guessed the compiler would be able to tell AVX2 was not available and not try to include them.

I believe even with this non-AVX2 implementation (soft) the issue still occurs: https://github.com/RustCrypto/hashes/blob/master/sha2/src/sha512/soft.rs

https://github.com/RustCrypto/hashes/blob/master/sha2/src/sha512.rs#L2-L4

Any suggestions on how to debug exactly what the problem is or tell the compiler those options aren't available? Are you saying host CPU features are accidentally used when compiling with the CUDA GPU compiler?

@jorge-ortega
Collaborator

The error shown here happens at runtime, so I'm assuming that your GPU crate compiled successfully with the nvvm codegen. If so, then there shouldn't be an issue with the use of the SHA crate. You are, however, using cudarc, which is a different crate than the one we maintain here, and it is where the error originates. In theory, these should be identical bindings to the CUDA driver API, and the PTX generated should be loadable by any program that can load and launch kernels, but I've only ever used the bindings provided through cust to launch kernels compiled by the nvvm backend. If this issue is in how cudarc launches the kernel, then it might be better to open an issue with them so they can help pinpoint why the kernel is failing to launch, and whether it has something to do with the PTX generated from our backend. If you have the same issue launching the kernel with cust, I can look further.

@brandonros
Author

cudarc replaced with cust: brandonros/ed25519-vanity-rs@2b04c7e

The _compact functions (backed by ed25519_compact) work; the non-compact ones (backed by sha2) do not.

PTX:

//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-35059454
// Cuda compilation tools, release 12.6, V12.6.85
// Based on NVVM 7.0.1
//

.version 8.5
.target sm_61
.address_size 64

        // .globl       find_vanity_private_key

.visible .entry find_vanity_private_key(
        .param .u64 find_vanity_private_key_param_0,
        .param .u64 find_vanity_private_key_param_1,
        .param .u64 find_vanity_private_key_param_2,
        .param .u64 find_vanity_private_key_param_3,
        .param .u64 find_vanity_private_key_param_4,
        .param .u64 find_vanity_private_key_param_5,
        .param .u64 find_vanity_private_key_param_6
)
{



        bar.sync        0;
        bar.sync        0;
        bar.sync        0;
        trap;

}
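The kernel body above contains nothing but barriers and a `trap`, which would explain a launch failure at runtime regardless of which host-side bindings load it. As a debugging aid, here is a minimal sketch of a checker that flags such kernels in emitted PTX before launching. The function name `trap_only_kernels` is made up for illustration, and the parsing assumes the simple layout shown above (one instruction per line, no nested braces inside the body); it is not part of cust, cudarc, or Rust-CUDA.

```rust
/// Hypothetical helper: returns the names of `.entry` kernels whose body
/// consists only of `bar.sync` and `trap` instructions.
/// Assumes one instruction per line and no nested `{ ... }` blocks.
fn trap_only_kernels(ptx: &str) -> Vec<String> {
    let mut found = Vec::new();
    let mut current: Option<String> = None; // kernel whose header was seen
    let mut in_body = false;
    let mut only_sync_or_trap = true;
    let mut saw_trap = false;

    for raw in ptx.lines() {
        let line = raw.trim();
        // Skip blank lines and PTX comments.
        if line.is_empty() || line.starts_with("//") {
            continue;
        }
        if !in_body {
            if let Some(idx) = line.find(".entry") {
                // The kernel name follows `.entry` and ends at `(`.
                let name: String = line[idx + ".entry".len()..]
                    .trim_start()
                    .chars()
                    .take_while(|c| c.is_alphanumeric() || *c == '_' || *c == '$')
                    .collect();
                current = Some(name);
            } else if line == "{" && current.is_some() {
                in_body = true;
                only_sync_or_trap = true;
                saw_trap = false;
            }
        } else if line == "}" {
            if let Some(name) = current.take() {
                if only_sync_or_trap && saw_trap {
                    found.push(name);
                }
            }
            in_body = false;
        } else if line.starts_with("trap") {
            saw_trap = true;
        } else if !line.starts_with("bar.sync") {
            only_sync_or_trap = false;
        }
    }
    found
}
```

Run against the PTX in this comment, it should report `find_vanity_private_key`; a kernel with real instructions in its body is not reported.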

@jorge-ortega
Collaborator

Thanks for the extra context. I'll look further.

@jorge-ortega
Collaborator

Thanks again for all the reports. I won't have as much availability to look into this as I thought but will asap. Or someone else can feel free to look further.

@brandonros
Author

@LegNeato

give me your 30 second take on this please, something I can dive in and try to help look into.

I get that the crate has a bunch of different backends and tries to use AVX and SIMD where present, but I would think it knows to fall back to a plain implementation when they aren't available.

@LegNeato
Contributor

I had glanced at the sha2 code, and it does indeed fall back to soft.
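For anyone following along, here is a rough sketch of what "falling back to soft" looks like in general. This is a hypothetical illustration, not the sha2 crate's actual source: the names `big_sigma0`, `big_sigma0_soft`, and `backend_name` are made up, and the only facts relied on are that `cfg(target_arch = ...)` is evaluated for the compilation target (so an x86-only path is compiled out when targeting nvptx64) and that the SHA-512 Σ0 function is ROTR28 ^ ROTR34 ^ ROTR39.

```rust
/// Portable SHA-512 big-sigma-0: ROTR^28(x) ^ ROTR^34(x) ^ ROTR^39(x).
/// Plain integer ops only, so it compiles for any target.
fn big_sigma0_soft(x: u64) -> u64 {
    x.rotate_right(28) ^ x.rotate_right(34) ^ x.rotate_right(39)
}

/// On x86_64 a real crate might dispatch to a SIMD routine here.
/// This sketch just reuses the portable version (assumption).
#[cfg(target_arch = "x86_64")]
fn big_sigma0(x: u64) -> u64 {
    big_sigma0_soft(x)
}

/// Every other target (including nvptx64) gets the portable path;
/// the x86_64 function above is not compiled at all there.
#[cfg(not(target_arch = "x86_64"))]
fn big_sigma0(x: u64) -> u64 {
    big_sigma0_soft(x)
}

/// Reports which path was selected at compile time.
fn backend_name() -> &'static str {
    if cfg!(target_arch = "x86_64") {
        "x86_64 path"
    } else {
        "soft"
    }
}
```

If selection works like this, host CPU features should not leak into the GPU build, which points away from AVX2 as the cause here.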

I'm a bit busy at the moment but will probably start poking at this in an hour 👍

@LegNeato
Contributor

@brandonros I am not seeing this error, it seems to work? https://github.com/LegNeato/ed25519-vanity-rs/. Did I screw something up?

@brandonros
Author

@LegNeato

https://docs.rs/sha2/latest/sha2/?search=output

  error[E0432]: unresolved import `sha2::Output`
   --> src/lib.rs:5:20
    |
  5 | use sha2::{Digest, Output, Sha512};
    |                    ^^^^^^ no `Output` in the root

Are you sure that compiles? That vast script will silently fail to compile and then run the old binary. It does a (not smart) cargo build and then runs the binary from ./target/ as a separate step. You could change it to cargo run if you want.

@brandonros
Author

Try this

fn sha512_hash(input: &[u8]) -> [u8; 64] {
    use sha2::{Digest, Sha512};
    let mut hasher = Sha512::new();
    hasher.update(input);
    hasher.finalize().into()
}

@LegNeato
Contributor

LegNeato commented May 27, 2025

Ugh, I don't understand why it isn't always using the latest code, sometimes it fails and continues to run the previous binary, making me look like an idiot 😅 . I can repro now. Going to bed, will look tomorrow.
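One way to avoid the stale-binary trap is to make the run step depend on the build step succeeding. A minimal sketch, assuming a small host-side runner built on `std::process::Command`; the helper names `run_checked` and `build_then_run` are hypothetical and not part of the repo:

```rust
use std::process::Command;

/// Runs a command and turns a spawn failure or non-zero exit into an error.
fn run_checked(cmd: &mut Command) -> Result<(), String> {
    let program = cmd.get_program().to_string_lossy().into_owned();
    let status = cmd
        .status()
        .map_err(|e| format!("failed to start {program}: {e}"))?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("{program} exited with {status}"))
    }
}

/// Runs `build` first and only runs `run` if the build succeeded,
/// so a failed compile can never fall through to an old binary.
fn build_then_run(build: &mut Command, run: &mut Command) -> Result<(), String> {
    run_checked(build)?;
    run_checked(run)
}
```

Here `build` would be something like `cargo build --release` and `run` the binary under `target/release`. Using `cargo run --release` directly gives the same guarantee with less code.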

@brandonros
Author

I believe even with --cfg sha2_backend="soft-compact"

env::set_var("RUSTFLAGS", "--cfg sha2_backend=\"soft-compact\"");

https://github.com/RustCrypto/hashes/blob/master/sha2/src/sha512.rs#L5

https://github.com/RustCrypto/hashes/blob/master/sha2/src/sha512/soft_compact.rs

this is still an issue

@LegNeato

https://github.com/RustCrypto/hashes/blob/f76b69640c481029e377bee7881f987ed0b888f7/sha2/src/sha512/soft.rs#L35

I would not have thought that, without your potential read_volatile fix having landed, the PTX would be so corrupt that it wouldn't even compile.

Should I try a different/newer LLVM/nightly version? How tightly coupled are those two?

@LegNeato
Contributor

I think I caught it:

  thread 'rustc' panicked at /root/.cargo/git/checkouts/rust-cuda-97fe623ab90e914e/afb147e/crates/rustc_codegen_nvvm/src/ty.rs:236:18:
  Box<dyn Any>

_ => bug!("llvm_float_width called on a non-float type"),

@LegNeato
Contributor

LegNeato commented Jun 2, 2025

Ok, I got this much further along. Now it is generating invalid bitcode:

.expect("Failed to parse module bitcode");

@brandonros
Author

tracking your work here: https://github.com/LegNeato/Rust-CUDA/tree/fixsha2

i will try to dive in this weekend and catch up. wondering if it is (as a bad guess, sorry) the "we pin ourselves to a specific version of nightly and llvm v7" generating something bad?

@LegNeato
Contributor

LegNeato commented Jun 7, 2025

Yeah, llvm-dis-7 shows the error and won't disassemble but llvm-dis-18 or whatever disassembles it. But I don't know enough about llvm bitcode and rust internals to know what is going on.
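A cheap sanity check that may help narrow this down: LLVM bitcode files start with the magic bytes `BC` `0xC0` `0xDE`, optionally preceded by a wrapper header whose magic is `0x0B17C0DE` (stored little-endian). The sketch below classifies a byte buffer by that prefix. The names `BitcodeKind` and `classify` are made up for illustration, and this only rules out "not bitcode at all"; it says nothing about whether a given LLVM version can parse the records inside.

```rust
/// Raw bitcode magic: 'B', 'C', 0xC0, 0xDE.
const RAW_MAGIC: [u8; 4] = [0x42, 0x43, 0xC0, 0xDE];
/// Wrapper header magic 0x0B17C0DE as little-endian bytes.
const WRAPPER_MAGIC: [u8; 4] = [0xDE, 0xC0, 0x17, 0x0B];

#[derive(Debug, PartialEq)]
enum BitcodeKind {
    Raw,
    Wrapped,
    NotBitcode,
}

/// Classifies a buffer by its first four bytes only.
fn classify(bytes: &[u8]) -> BitcodeKind {
    if bytes.starts_with(&RAW_MAGIC) {
        BitcodeKind::Raw
    } else if bytes.starts_with(&WRAPPER_MAGIC) {
        BitcodeKind::Wrapped
    } else {
        BitcodeKind::NotBitcode
    }
}
```

If the module passes this check but llvm-dis-7 still rejects it, the problem is more likely in the contents than in the container.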

@brandonros
Author

if nvptx64 is an officially supported rust target triple outside of your project, why do we need to be on an old nightly with an old llvm?

@LegNeato
Contributor

LegNeato commented Jun 7, 2025

Because nvvm is based on an old llvm. See also #197

@brandonros
Author

maybe let's table this one (sha2 working on old llvm) and instead work together to bring llvm18 support to the crate which would fix this

is that even possible, given what you are saying and the linked issue?

if nvvm is based on llvm7, is it even possible to upgrade?

@LegNeato
Contributor

LegNeato commented Jun 8, 2025

Check out the issue, yeah it is not a good idea to move forward. It would break compatibility with virtually every card out there except the newest ones.

@brandonros
Author

brandonros commented Jun 8, 2025

let's put it behind a flag? i'd be down to try it/help. i've been playing with this in the mean time:

[build]
target = "nvptx64-nvidia-cuda"

[target.nvptx64-nvidia-cuda]
linker = "true"
rustflags = ["--emit=llvm-ir"]

#!/bin/bash

set -e

# Emit LLVM IR
cargo build --release

# Link dependencies first, then main crate
llvm-link-20 \
  target/nvptx64-nvidia-cuda/release/deps/hex-*.ll \
  target/nvptx64-nvidia-cuda/release/deps/cuda_adder-*.ll \
  -o combined.ll

# Convert to PTX
llc-20 combined.ll -march=nvptx64 -mcpu=sm_100 -o output.ptx

Would you expect the output to suck performance wise (since it is LLVM IR -> PTX and not NVVM)?

edit: this helps https://github.com/Rust-GPU/Rust-CUDA/blob/main/guide/src/faq.md#why-not-use-rustc-with-the-llvm-ptx-backend


4 participants