Enable SFT for multimodal llama4 #1889

Open · wants to merge 1 commit into main

Conversation

@aireenmei (Collaborator) commented Jun 27, 2025

Description

Vanilla SFT support for multimodal llama4

  • Uses the "HuggingFaceM4/ChartQA" dataset with the HF data pipeline. Some preprocessing steps are specific to this dataset and serve as a demo for other custom preprocessing.
  • Updates multimodal_utils so its preprocessing functions use np instead of jax and therefore do not try to access the TPU. We want these preprocessing functions to run on the CPU while the TPU is busy with model computation.
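The np-over-jax point can be illustrated with a minimal sketch. The function name and normalization constants below are hypothetical, not the actual multimodal_utils API; the idea is simply that plain NumPy ops stay on the host CPU, whereas jax.numpy ops would try to place data on the accelerator:

```python
import numpy as np


def preprocess_image(image: np.ndarray,
                     mean: tuple = (0.5, 0.5, 0.5),
                     std: tuple = (0.5, 0.5, 0.5)) -> np.ndarray:
    """Normalize an HxWxC uint8 image to float32 using plain NumPy.

    Because this uses np (not jax.numpy), the work runs on the host CPU
    inside the data pipeline and never touches the TPU, which is free to
    run the model's forward/backward pass concurrently.
    """
    image = image.astype(np.float32) / 255.0
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    return (image - mean) / std
```

A host-side pipeline (e.g. an HF datasets `map`) would call such a function per example and only hand the finished float32 batches to JAX.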

Tests

TODOs for future PRs:

  • unit tests
  • documentation
  • script for testing on TPU clusters

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@hengtaoguo (Collaborator) left a comment
Awesome! Thanks for the quick implementation!

@RissyRan (Collaborator) left a comment

LGTM! One question: have you checked the ckpt from SFT (i.e. 50-100 steps) and done some decoding to see if the output makes sense?

@aireenmei force-pushed the aireen/llama4_sft branch from 253903d to 2d67f70 on July 2, 2025 at 23:59
@aireenmei (Collaborator, Author) replied:

> LGTM! One question: have you checked the ckpt from SFT (i.e. 50-100 steps) and done some decoding to see if the output makes sense?

We haven't checked yet; this PR only adds the functionality. We plan to run more SFT tests afterwards, ideally with a bigger dataset, since the llama4 models are quite big.

@gagika (Collaborator) left a comment

Thanks!

@RissyRan (Collaborator) left a comment

Discussed offline. The decoding test will be verified as a follow-up.

@aireenmei force-pushed the aireen/llama4_sft branch 2 times, most recently from 5408ce0 to 4296375 on July 7, 2025 at 20:54
@aireenmei force-pushed the aireen/llama4_sft branch from 4296375 to 3b471d5 on July 8, 2025 at 04:03

4 participants