llama : enable FA by default and disable it per-layer

See the discussion starting here: https://github.com/ggerganov/llama.cpp/issues/9991#issuecomment-2428407002 and the proposed solution here: https://github.com/ggerganov/llama.cpp/issues/9991#issuecomment-2428868490.

Additionally, switch to F32 precision for the `K*Q` matrix multiplication by default.

Marking this as a good first issue to give new contributors an opportunity, but it is also fairly high priority, so we should probably implement it within a day or two if there is no progress. @slaren or @JohannesGaessler, in case you have already started working on it, feel free to assign yourself to the issue and finish it.

llama : enable FA by default and disable it per-layer #10005