
cuda : fix bounds check for src0 rows in MMVQ kernel #2231

Merged
2 commits merged into master on Jun 11, 2024

Conversation

ggerganov (Member)

The non-EN models have a tensor with an odd number of rows:

decoder.token_embedding.weight - [ 1024, 51865,     1]

With BS > 1, and since d7e9f58, we have been doing out-of-bounds writes with quantum models because rows_per_cuda_block can be equal to 2, leading to significantly degraded transcription quality:

https://github.com/ggerganov/whisper.cpp/blob/20c542c71334b3e6c422789093a19157b110ca81/ggml-cuda/mmvq.cu#L92-L123
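
For reference, a minimal sketch of the failure mode, assuming the write-back pattern of the linked kernel (names like row0 and nrows_dst follow mmvq.cu; this is an illustration, not the actual kernel):

// Illustrative CUDA sketch, not the actual mmvq.cu code: each block writes
// rows_per_cuda_block consecutive rows of dst starting at row0.
template <int rows_per_cuda_block>
static __global__ void write_back(float * dst, const int nrows_dst) {
    const int row0 = blockIdx.x * rows_per_cuda_block;
    const float result = 0.0f; // stand-in for the reduced dot-product value

    if (threadIdx.x < rows_per_cuda_block) {
        // With nrows_dst == 51865 (odd) and rows_per_cuda_block == 2, the
        // last block has row0 == 51864, so the thread with threadIdx.x == 1
        // would write dst[51865], one element past the end of the buffer.
        if (row0 + threadIdx.x < nrows_dst) { // the bounds check this PR adds
            dst[row0 + threadIdx.x] = result;
        }
    }
}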

Repro on RTX 2060:

# any quantum model would reproduce the problem
rm models/ggml-base-q5_1.bin
./models/download-ggml-model.sh base-q5_1

make clean
WHISPER_CUDA=1 make -j && ./main -m models/ggml-base-q5_1.bin -f samples/gb0.wav

@ggerganov (Member, Author)

cc @JohannesGaessler for review

@JohannesGaessler (Contributor)

This is a more general issue (also for llama.cpp) and I think there's a better way to fix it. A PR to which repository would be the least amount of work for you to sync?

@JohannesGaessler (Contributor) left a comment


The performance difference from this change (for llama.cpp) is negligible, and for batch size 1 you can optimize out the check anyway. So I think it's fine to just fix it like this. Note that these out-of-bounds writes are potentially also causing issues for llama.cpp if whoever finetuned the model added an extra token to the vocabulary.
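
To illustrate the batch-size-1 point: assuming rows_per_cuda_block is known at compile time (the kernel is templated), the guard can be short-circuited so the rows_per_cuda_block == 1 instantiation pays nothing for it. A hedged sketch, not the actual patch:

// Sketch: for the rows_per_cuda_block == 1 instantiation the left operand
// of the || is a compile-time constant true, so the runtime comparison
// against nrows_dst is eliminated entirely by the compiler.
if (threadIdx.x < rows_per_cuda_block &&
    (rows_per_cuda_block == 1 || row0 + threadIdx.x < nrows_dst)) {
    dst[row0 + threadIdx.x] = result;
}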

@ggerganov (Member, Author)

Thanks, I'll merge it here and will sync back to llama.cpp soon

ggerganov merged commit 99804b0 into master on Jun 11, 2024
91 of 94 checks passed
iThalay pushed a commit to iThalay/whisper.cpp that referenced this pull request Sep 23, 2024
* cuda : fix bounds check for src0 rows in MMVQ kernel

* Update ggml-cuda/mmvq.cu

Co-authored-by: Johannes Gäßler <[email protected]>

---------

Co-authored-by: Johannes Gäßler <[email protected]>