Commit "CUDA: Quantized matrix matrix multiplication" causes assert "ggml-cuda.cu:4749: i01_high == rows_per_iter || g_device_count > 1" on Windows when vocab_size != 32000 #2484

Description

@dranger003
Contributor

Using CUDA on Windows with a model whose vocab_size != 32000, inference crashes immediately with:

ggml-cuda.cu:4749: i01_high == rows_per_iter || g_device_count > 1

See #2160 (comment) for more details.
Reverting to the commit before 11f3ca0 resolves the issue.
Also, the workaround proposed in #2160 (comment) appears to work (at least for me).

Activity

mirek190 commented on Aug 2, 2023

I have the same problem.

My arguments (the model is a Llama 2 13B variant):

main --model models\new2\newhope.ggmlv3.q4_K_M.bin --mlock --color --threads 30 --keep -1 --batch_size 512 --n_predict -1 --top_k 10000 --top_p 0.9 --temp 0.96 --repeat_penalty 1.1 --ctx_size 4096 --interactive --instruct --reverse-prompt "### Human:" --reverse-prompt "### User:" --reverse-prompt "### Assistant:" -ngl 43

ggml-cuda.cu:4749: i01_high == rows_per_iter || g_device_count > 1
PS E:\LLAMA\llama.cpp>

Without the -ngl parameter, it works.

dranger003 commented on Aug 2, 2023

Contributor, Author

It appears PR #2480 solves this issue.

mirek190 commented on Aug 2, 2023

It is still not merged.

dranger003 commented on Aug 2, 2023

Contributor, Author

Confirmed latest commit 4f6b60c resolves the issue on my end.

Participants: @dranger003, @mirek190, @JohannesGaessler

Issue #2484 · ggml-org/llama.cpp