llama : enable FA by default and disable it per-layer #10005

Open
@ggerganov

Description

See the discussion starting here: #9991 (comment) and the proposed solution here: #9991 (comment).

Additionally, switch to F32 precision for the K*Q matrix multiplication by default.

Marking this as a good first issue as an opportunity for new contributors, but it is also fairly high priority, so we should probably implement this within a day or two if there is no progress. @slaren or @JohannesGaessler, in case you have already started working on it, feel free to assign yourself to the issue and finish it.

Metadata

Labels

enhancement (New feature or request), roadmap (Part of a roadmap project)
