[ COLLAB ] Integration into Axolotl framework #198

Open
@NanoCode012

Description

Hello! Thank you for creating this awesome repository.

We're currently working on integrating SageAttention into Axolotl as an alternative to FlashAttention 2 for LLM fine-tuning. Our PR: axolotl-ai-cloud/axolotl#2823
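
For context, the general shape of the swap is roughly the following. This is a simplified illustration rather than the exact code in the PR, and it assumes q/k/v arrive in (batch, num_heads, seq_len, head_dim) layout, which corresponds to SageAttention's "HND" tensor layout:

```python
# Minimal sketch of dropping SageAttention in where FlashAttention 2 / SDPA
# would normally be called. Not the PR's actual implementation.
import torch
import torch.nn.functional as F
from sageattention import sageattn


def sage_attention_forward(
    q: torch.Tensor,
    k: torch.Tensor,
    v: torch.Tensor,
    is_causal: bool = True,
) -> torch.Tensor:
    """Compute attention with SageAttention, falling back to SDPA if needed."""
    try:
        # sageattn quantizes Q/K internally; inputs stay fp16/bf16.
        return sageattn(q, k, v, tensor_layout="HND", is_causal=is_causal)
    except Exception:
        # Fallback path for shapes/dtypes the Sage kernels do not cover.
        return F.scaled_dot_product_attention(q, k, v, is_causal=is_causal)
```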

We've had some success so far: both packing and non-packing work correctly with LoRA fine-tuning. However, we're running into an issue with full fine-tuning (loss drops to zero and gradient norm explodes within just a few steps).

We suspect we might be making a mistake in the implementation. We were hoping a maintainer could take a look at the approach in the PR and offer any initial thoughts or guidance.

We would be very open to collaborating on a write-up or blog post about this integration to showcase SageAttention. If it's easier to discuss the technical details, I'd also be happy to hop on a quick call.
