Description
Hello! Thank you for creating this awesome repository.
We're currently working on integrating SageAttention into Axolotl as an alternative to FlashAttention 2 for LLM fine-tuning. Our PR: axolotl-ai-cloud/axolotl#2823
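For reference, the general shape of the integration is a kernel-level swap of the attention call. Below is a minimal sketch based on the `sageattn` API from the SageAttention README; it is not the actual PR code, and the wrapper function name and the wiring into Axolotl/transformers are assumptions for illustration only:

```python
# Hypothetical sketch (not the PR's code): using SageAttention as a drop-in
# replacement for a FlashAttention-2-style kernel. Shapes, dtypes, and the
# sageattn signature follow the SageAttention README; the wrapper name is
# made up for this example.
import torch
from sageattention import sageattn


def sage_attention_forward(q, k, v, is_causal=True):
    # q, k, v: (batch, num_heads, seq_len, head_dim), fp16/bf16 on CUDA.
    # tensor_layout="HND" matches this (batch, heads, seq, dim) layout.
    return sageattn(q, k, v, tensor_layout="HND", is_causal=is_causal)


if __name__ == "__main__":
    q = torch.randn(1, 32, 2048, 128, dtype=torch.float16, device="cuda")
    k = torch.randn_like(q)
    v = torch.randn_like(q)
    out = sage_attention_forward(q, k, v)  # same shape as q
```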
We've had some success so far: both packing and non-packing work correctly with LoRA fine-tuning. However, we're running into an issue with full fine-tuning (loss drops to zero and gradient norm explodes within just a few steps).
We suspect we might be making a mistake in the implementation. We were hoping a maintainer could take a look at the approach in the PR and offer any initial thoughts or guidance.
We would be very open to collaborating on a write-up or blog post about this integration to showcase SageAttention. If it's easier, I'd also be happy to hop on a quick call to discuss the technical details.