cuda : fix bounds check for src0 rows in MMVQ kernel #2231
Merged
The non-EN models have a tensor with an odd number of rows:
With BS > 1, and since d7e9f58, we have been doing out-of-bounds writes with quantized models because `rows_per_cuda_block` can be equal to `2`. When the number of rows is odd, the last CUDA block then covers a row past the end of the tensor. This leads to significantly worse transcription quality: https://github.com/ggerganov/whisper.cpp/blob/20c542c71334b3e6c422789093a19157b110ca81/ggml-cuda/mmvq.cu#L92-L123
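To illustrate the indexing problem, here is a minimal host-side sketch in plain C++. It is not the code from `mmvq.cu`: the function `run_blocks` and its parameters are hypothetical, and it only emulates a grid in which each block handles `rows_per_block` consecutive rows. Instead of actually writing out of bounds, the unguarded variant counts the rows that would fall past the end.

```cpp
#include <cassert>
#include <vector>

// Hypothetical CPU emulation of a 1D grid where every block processes
// `rows_per_block` consecutive rows. This is an illustration only, not the
// actual MMVQ kernel.
// Returns the number of row indices that fall outside [0, nrows) and would
// therefore be out-of-bounds writes on the GPU.
static int run_blocks(int nrows, int rows_per_block, bool check_bounds,
                      std::vector<float> & dst) {
    assert(rows_per_block > 0);
    assert((int) dst.size() >= nrows);

    // Grid size is rounded up so that every row is covered by some block.
    const int nblocks = (nrows + rows_per_block - 1) / rows_per_block;

    int oob = 0;
    for (int block = 0; block < nblocks; ++block) {
        const int row0 = block * rows_per_block;
        for (int i = 0; i < rows_per_block; ++i) {
            const int row = row0 + i;
            if (row >= nrows) {
                if (!check_bounds) {
                    ++oob; // a real kernel would write past the end here
                }
                continue;  // with the bounds check, the row is simply skipped
            }
            dst[row] = 1.0f; // stand-in for storing the dot-product result
        }
    }
    return oob;
}
```

With `nrows = 5` and `rows_per_block = 2`, the grid has three blocks and the last one covers rows 4 and 5, so the unguarded variant reports one out-of-range row while the guarded variant reports none. With an even `nrows` the two variants behave identically, which is consistent with the problem only showing up for tensors with an odd number of rows.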
Repro on RTX 2060: