sha2 crate = runtime error #207

brandonros · 2025-04-27T17:02:17Z

https://github.com/RustCrypto/hashes/blob/master/sha2/Cargo.toml vs https://github.com/brandonros/rust-ed25519-compact

$ cargo run --release -- aa $BLOCKS_PER_GRID $THREADS_PER_BLOCK
   Compiling ed25519_vanity v0.1.0 (/home/brandon/ed25519-vanity-rs)
    Finished `release` profile [optimized] target(s) in 0.90s
     Running `target/release/ed25519_vanity aa 128 128`
Found 1 CUDA devices
Starting device 0
[0] Loading module...
[0] Starting search loop...

thread '<unnamed>' panicked at /home/brandon/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/cudarc-0.16.0/src/driver/safe/core.rs:470:36:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_LAUNCH_FAILED, "unspecified launch failure")
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

thread '<unnamed>' panicked at /home/brandon/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/cudarc-0.16.0/src/driver/safe/core.rs:246:58:
called `Result::unwrap()` on an `Err` value: DriverError(CUDA_ERROR_LAUNCH_FAILED, "unspecified launch failure")
stack backtrace:
   0:     0x55e313ca18e3 - <std::sys::backtrace::BacktraceLock::print::DisplayBacktrace as core::fmt::Display>::fmt::hdbd106d724e72c20
   1:     0x55e313cc3493 - core::fmt::write::h861eecc74abebf7a
   2:     0x55e313c9f003 - std::io::Write::write_fmt::h493b3152b071fba0
   3:     0x55e313ca1732 - std::sys::backtrace::BacktraceLock::print::h71f315c25fc266cb
   4:     0x55e313ca26ca - std::panicking::default_hook::{{closure}}::h8019dc6a2c6c0fe7
   5:     0x55e313ca253a - std::panicking::default_hook::h497f769686a88dd6
   6:     0x55e313ca2fd2 - std::panicking::rust_panic_with_hook::h98fc165e90ef379e
   7:     0x55e313ca2e6a - std::panicking::begin_panic_handler::{{closure}}::h2c1a60d0a908eaec
   8:     0x55e313ca1dd9 - std::sys::backtrace::__rust_end_short_backtrace::he8aba8f9b7ddf304
   9:     0x55e313ca2afd - rust_begin_unwind
  10:     0x55e313cc2010 - core::panicking::panic_fmt::hcbf39f8c1e585f84
  11:     0x55e313cc23a6 - core::result::unwrap_failed::haf1491c6d679786d
  12:     0x55e313c7bc98 - <cudarc::driver::safe::core::CudaEvent as core::ops::drop::Drop>::drop::heca398ef781d7d06
  13:     0x55e313c60323 - core::ptr::drop_in_place<core::option::Option<cudarc::driver::safe::core::CudaEvent>>::h0f0fd586ce97c35f
  14:     0x55e313c601f6 - core::ptr::drop_in_place<cudarc::driver::safe::core::CudaSlice<u8>>::he9b6484b4f5db6c9
  15:     0x55e313c61cae - ed25519_vanity::device_main::h99b65f8ab2f8c263
  16:     0x55e313c65f2b - std::sys::backtrace::__rust_begin_short_backtrace::he6efac01710238fd
  17:     0x55e313c656f1 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h486bd94640b67ff5
  18:     0x55e313ca4d8b - std::sys::pal::unix::thread::Thread::new::thread_start::h20288ab9ea215a81
  19:     0x7fe278c381f5 - <unknown>
  20:     0x7fe278cb889c - <unknown>
  21:                0x0 - <unknown>

thread '<unnamed>' panicked at library/core/src/panicking.rs:226:5:
panic in a destructor during cleanup
thread caused non-unwinding panic. aborting.
Aborted
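
The abort at the end of this log follows from the backtrace: the second panic is raised inside `CudaEvent`'s `Drop` while the thread is already unwinding from the first one, and Rust aborts on a panic during cleanup. A minimal sketch of the alternative pattern, where the destructor records a failed cleanup instead of unwrapping it. `Resource`, `cleanup_result` and `log` are hypothetical stand-ins for illustration, not cudarc's types:

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for a GPU resource whose cleanup can fail.
struct Resource {
    /// Simulated result of the driver call made during cleanup.
    cleanup_result: Result<(), String>,
    /// Shared sink for cleanup errors (an assumption of this sketch).
    log: Arc<Mutex<Vec<String>>>,
}

impl Drop for Resource {
    fn drop(&mut self) {
        if let Err(e) = &self.cleanup_result {
            // Record the error rather than calling unwrap(): a panic here
            // while the thread is already unwinding aborts the process.
            if let Ok(mut log) = self.log.lock() {
                log.push(format!("cleanup failed: {e}"));
            }
        }
    }
}
```

With a destructor like this, the original launch error stays the only panic, so the first message is the one to debug.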

use sha2::Digest as _;
use ed25519_compact::ge_scalarmult_base;
use rand_core::{SeedableRng, RngCore};
use rand_xorshift::XorShiftRng;
use bs58;

// fails
fn sha512(input: &[u8]) -> [u8; 64] {
    let mut hasher = sha2::Sha512::new();
    hasher.update(input);
    hasher.finalize().into()
}

// works
fn sha512_compact(input: &[u8]) -> [u8; 64] {
    let mut hasher = ed25519_compact::sha512::Hash::new();
    hasher.update(input);
    hasher.finalize()
}

adamcavendish · 2025-04-28T07:11:00Z

Hi @brandonros, sha2 has a lot of CPU-specific optimizations (AVX2, etc.), so crates like this can't be used directly in a CUDA kernel. To use them there, we would need to add an implementation to the crate itself, gated behind a CUDA-like feature flag.

brandonros · 2025-04-28T14:14:45Z

I would have guessed the compiler would be able to tell AVX2 was not available and not try to include them.

I believe even with this non-AVX2 implementation (soft) the issue still occurs: https://github.com/RustCrypto/hashes/blob/master/sha2/src/sha512/soft.rs

https://github.com/RustCrypto/hashes/blob/master/sha2/src/sha512.rs#L2-L4

Any suggestions on how to debug exactly what the problem is or tell the compiler those options aren't available? Are you saying host CPU features are accidentally used when compiling with the CUDA GPU compiler?
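
On the question of host CPU features leaking into the GPU build: `cfg` conditions on `target_arch` are evaluated for the compilation target, not for the machine running the compiler. A minimal sketch of that kind of selection follows. `selected_backend` and its branch names are hypothetical and are not the sha2 crate's actual code:

```rust
/// Hypothetical backend selector illustrating target-based gating.
/// The branch is chosen at compile time from the *target* architecture.
fn selected_backend() -> &'static str {
    if cfg!(any(target_arch = "x86", target_arch = "x86_64")) {
        // Only compiled in for x86 targets; SIMD paths would live here.
        "x86"
    } else if cfg!(target_arch = "aarch64") {
        "aarch64"
    } else {
        // Any other target, which would include nvptx64, takes the
        // portable path.
        "soft"
    }
}
```

Under this scheme a build for a GPU target never sees the x86 branch, so a failure there would have to come from the portable code path or from the codegen backend.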

jorge-ortega · 2025-04-29T00:54:31Z

The error shown here happens at runtime, so I'm assuming your GPU crate compiled successfully with the nvvm codegen. If so, there shouldn't be an issue with the use of the SHA crate. You are, however, using cudarc, which is a different crate than the one we maintain here, and it is where the error originates. In theory these should be identical bindings to the CUDA driver API, and the generated PTX should be loadable by any program that can load and launch kernels, but I've only ever used the bindings provided through cust to launch kernels compiled by the nvvm backend. If the issue is in how cudarc launches the kernel, it might be better to open an issue with them so they can help pinpoint why the kernel is failing to launch and whether it has something to do with the PTX generated by our backend. If you have the same issue launching the kernel with cust, I can look further.

brandonros · 2025-04-29T01:35:43Z

cudarc replaced with cust: brandonros/ed25519-vanity-rs@2b04c7e

The _compact functions (ed25519_compact) work; the non-compact ones (sha2) do not.

PTX:

//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-35059454
// Cuda compilation tools, release 12.6, V12.6.85
// Based on NVVM 7.0.1
//

.version 8.5
.target sm_61
.address_size 64

        // .globl       find_vanity_private_key

.visible .entry find_vanity_private_key(
        .param .u64 find_vanity_private_key_param_0,
        .param .u64 find_vanity_private_key_param_1,
        .param .u64 find_vanity_private_key_param_2,
        .param .u64 find_vanity_private_key_param_3,
        .param .u64 find_vanity_private_key_param_4,
        .param .u64 find_vanity_private_key_param_5,
        .param .u64 find_vanity_private_key_param_6
)
{



        bar.sync        0;
        bar.sync        0;
        bar.sync        0;
        trap;

}
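
The dump above shows that the entire body of `find_vanity_private_key` is three `bar.sync` instructions followed by `trap`, so every launch would fault regardless of input, which would be consistent with the launch failure reported earlier. A minimal sketch of a check that flags such degenerate kernels before launching them. `entry_body` and `is_trap_only` are hypothetical helpers, and the parsing is deliberately naive (it only looks for braces and skips `//` comment lines):

```rust
/// Returns the instruction lines inside the body of `.entry <name>(...)`,
/// or None if the entry or its braces cannot be found.
fn entry_body(ptx: &str, name: &str) -> Option<Vec<String>> {
    let header = format!(".entry {name}(");
    let rest = &ptx[ptx.find(&header)?..];
    let open = rest.find('{')?;

    // Walk forward to the brace that closes the kernel body.
    let mut depth = 0usize;
    let mut close = None;
    for (i, c) in rest[open..].char_indices() {
        match c {
            '{' => depth += 1,
            '}' => {
                depth -= 1;
                if depth == 0 {
                    close = Some(open + i);
                    break;
                }
            }
            _ => {}
        }
    }

    let body = &rest[open + 1..close?];
    Some(
        body.lines()
            .map(str::trim)
            .filter(|l| !l.is_empty() && !l.starts_with("//"))
            .map(|l| l.trim_end_matches(';').to_string())
            .collect(),
    )
}

/// True if the body contains a `trap` and nothing but `trap` and `bar.sync`.
fn is_trap_only(body: &[String]) -> bool {
    body.iter().any(|i| i == "trap")
        && body.iter().all(|i| i == "trap" || i.starts_with("bar.sync"))
}
```

Running this over the emitted PTX in a build script or test would turn the silent miscompile into an early, readable failure instead of an unspecified launch error.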

jorge-ortega · 2025-04-29T02:18:21Z

Thanks for the extra context. I'll look further.

jorge-ortega · 2025-04-30T23:16:18Z

Thanks again for all the reports. I won't have as much availability to look into this as I thought but will asap. Or someone else can feel free to look further.

brandonros · 2025-05-26T19:04:13Z

@LegNeato

Give me your 30-second take on this, please: something I can dive into and help investigate.

I get that the crate has a bunch of different backends and tries to use AVX and SIMD where present, but I would think it knows to fall back to a plain implementation when they aren't available.

LegNeato · 2025-05-26T19:23:28Z

I had glanced at the sha2 code, and it does indeed fall back to soft.

I'm a bit busy at the moment but will probably start poking at this in an hour 👍

LegNeato · 2025-05-26T22:42:22Z

@brandonros I am not seeing this error, it seems to work? https://github.com/LegNeato/ed25519-vanity-rs/. Did I screw something up?

brandonros · 2025-05-26T23:08:52Z

@LegNeato

https://docs.rs/sha2/latest/sha2/?search=output

  error[E0432]: unresolved import `sha2::Output`
   --> src/lib.rs:5:20
    |
  5 | use sha2::{Digest, Output, Sha512};
    |                    ^^^^^^ no `Output` in the root

Are you sure that compiles? That vast script will silently fail to compile and then run the previous binary. It does a (not smart) cargo build and then runs ./target/ separately. You could make it cargo run if you want.
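
A minimal sketch of the safer ordering described above: stop as soon as the build step fails, so a stale binary is never run. `run_checked` and `build_then_run` are hypothetical helpers, and the `cargo build --release` invocation is an assumption about what the script does:

```rust
use std::process::Command;

/// Runs a command and turns a non-zero exit status into an error.
fn run_checked(program: &str, args: &[&str]) -> Result<(), String> {
    let status = Command::new(program)
        .args(args)
        .status()
        .map_err(|e| format!("failed to start {program}: {e}"))?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("{program} exited with {status}"))
    }
}

/// Builds first and only runs the binary if the build succeeded.
fn build_then_run(binary: &str, binary_args: &[&str]) -> Result<(), String> {
    run_checked("cargo", &["build", "--release"])?;
    run_checked(binary, binary_args)
}
```

For example, `build_then_run("target/release/ed25519_vanity", &["aa", "128", "128"])` returns the build error instead of launching whatever binary was left over from an earlier successful build.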

brandonros · 2025-05-26T23:10:49Z

Try this

fn sha512_hash(input: &[u8]) -> [u8; 64] {
    use sha2::{Digest, Sha512};
    let mut hasher = Sha512::new();
    hasher.update(input);
    hasher.finalize().into()
}

LegNeato · 2025-05-27T02:37:43Z

Ugh, I don't understand why it isn't always using the latest code, sometimes it fails and continues to run the previous binary, making me look like an idiot 😅 . I can repro now. Going to bed, will look tomorrow.

brandonros · 2025-05-31T12:51:49Z

I believe even with --cfg sha2_backend="soft-compact"

env::set_var("RUSTFLAGS", "--cfg sha2_backend=\"soft-compact\"");

https://github.com/RustCrypto/hashes/blob/master/sha2/src/sha512.rs#L5

https://github.com/RustCrypto/hashes/blob/master/sha2/src/sha512/soft_compact.rs

this is still an issue
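
One thing worth ruling out with the `env::set_var` approach above is that it replaces any `RUSTFLAGS` already present in the environment. A minimal sketch that appends the `--cfg` flag instead. `rustflags_with_cfg` is a hypothetical helper, and whether the GPU build actually honors `RUSTFLAGS` from the environment is not confirmed in this thread:

```rust
/// Appends `--cfg key="value"` to an existing RUSTFLAGS string, if any.
fn rustflags_with_cfg(existing: Option<&str>, key: &str, value: &str) -> String {
    let flag = format!("--cfg {key}=\"{value}\"");
    match existing.map(str::trim) {
        // Keep whatever flags were already set and add ours at the end.
        Some(prev) if !prev.is_empty() => format!("{prev} {flag}"),
        // Nothing set (or only whitespace): our flag is the whole value.
        _ => flag,
    }
}
```

The result can be passed to `env::set_var("RUSTFLAGS", ...)` with `std::env::var("RUSTFLAGS").ok().as_deref()` as the `existing` argument.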

@LegNeato

https://github.com/RustCrypto/hashes/blob/f76b69640c481029e377bee7881f987ed0b888f7/sha2/src/sha512/soft.rs#L35

I would not have thought that, without your potential read_volatile fix landed, the PTX would be so corrupt it wouldn't even compile.

Should I try a different/newer LLVM/nightly version? How tightly coupled are those two?

LegNeato · 2025-05-31T19:02:01Z

I think I caught it:

  thread 'rustc' panicked at /root/.cargo/git/checkouts/rust-cuda-97fe623ab90e914e/afb147e/crates/rustc_codegen_nvvm/src/ty.rs:236:18:
  Box<dyn Any>

Rust-CUDA/crates/rustc_codegen_nvvm/src/ty.rs

Line 236 in afb147e

_ => bug!("llvm_float_width called on a non-float type"),

LegNeato · 2025-06-02T18:10:05Z

Ok, I got this much further along. Now it is generating invalid bitcode:

Rust-CUDA/crates/rustc_codegen_nvvm/src/nvvm.rs

Line 171 in afb147e

.expect("Failed to parse module bitcode");

brandonros · 2025-06-06T22:58:16Z

tracking your work here: https://github.com/LegNeato/Rust-CUDA/tree/fixsha2

i will try to dive in this weekend and catch up. wondering if it is (as a bad guess, sorry) the "we pin ourselves to a specific version of nightly and llvm v7" generating something bad?

LegNeato · 2025-06-07T04:15:24Z

Yeah, llvm-dis-7 shows the error and won't disassemble but llvm-dis-18 or whatever disassembles it. But I don't know enough about llvm bitcode and rust internals to know what is going on.

brandonros · 2025-06-07T12:36:21Z

if nvptx64 is an officially supported rust triple outside of your project, why do we need to be on an old nightly with an old llvm?

LegNeato · 2025-06-07T13:53:00Z

Because nvvm is based on an old llvm. See also #197

brandonros · 2025-06-07T14:00:15Z

maybe let's table this one (sha2 working on old llvm) and instead work together to bring llvm18 support to the crate which would fix this

is that even possible, given what you are saying and the linked issue?

if nvvm is based on llvm7, is it even possible to upgrade?

LegNeato · 2025-06-08T00:17:12Z

Check out the issue, yeah it is not a good idea to move forward. It would break compatibility with virtually every card out there except the newest ones.

brandonros · 2025-06-08T00:59:32Z

let's put it behind a flag? i'd be down to try it/help. i've been playing with this in the meantime:

[build]
target = "nvptx64-nvidia-cuda"

[target.nvptx64-nvidia-cuda]
linker = "true"
rustflags = ["--emit=llvm-ir"]

#!/bin/bash

set -e

# Emit LLVM IR
cargo build --release

# Link dependencies first, then main crate
llvm-link-20 \
  target/nvptx64-nvidia-cuda/release/deps/hex-*.ll \
  target/nvptx64-nvidia-cuda/release/deps/cuda_adder-*.ll \
  -o combined.ll

# Convert to PTX
llc-20 combined.ll -march=nvptx64 -mcpu=sm_100 -o output.ptx

Would you expect the output to suck performance wise (since it is LLVM IR -> PTX and not NVVM)?

edit: this helps https://github.com/Rust-GPU/Rust-CUDA/blob/main/guide/src/faq.md#why-not-use-rustc-with-the-llvm-ptx-backend

brandonros mentioned this issue Apr 27, 2025

curve25519_dalek crate = runtime error #208

Closed

brandonros mentioned this issue May 26, 2025

Make constant memory opt-in, spill large statics to global memory #217

Merged

brandonros mentioned this issue Jun 8, 2025

llvm 19 support #227

Draft
