Enable SFT for multimodal llama4 #1889

Open · wants to merge 1 commit into main

Conversation

@aireenmei (Collaborator) commented Jun 27, 2025

Description

Vanilla SFT support for multimodal llama4

  • Uses the "HuggingFaceM4/ChartQA" dataset with the HF data pipeline. Some preprocessing steps are specific to this dataset and serve as a demo for other custom preprocessing.
  • Updates multimodal_utils so its preprocessing functions use np instead of jax and therefore do not try to access the TPU. We want these preprocessing functions to run on the CPU while the TPU is busy with model computation.
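The np-over-jax point can be illustrated with a minimal sketch. The function name and normalization constants below are hypothetical, not the actual multimodal_utils API; the idea is simply that plain NumPy ops stay on the host CPU, whereas jax.numpy ops would try to place data on the accelerator:

```python
import numpy as np


def preprocess_image(image: np.ndarray,
                     mean: tuple = (0.5, 0.5, 0.5),
                     std: tuple = (0.5, 0.5, 0.5)) -> np.ndarray:
    """Normalize an HxWxC uint8 image to float32 using plain NumPy.

    Because this uses np (not jax.numpy), the work runs on the host CPU
    inside the data pipeline and never touches the TPU, which is free to
    run the model's forward/backward pass concurrently.
    """
    image = image.astype(np.float32) / 255.0
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    return (image - mean) / std
```

A host-side pipeline (e.g. an HF datasets `map`) would call such a function per example and only hand the finished float32 batches to JAX.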

Tests

TODOs for future PRs:

  • unit tests
  • documentation
  • script for testing on TPU clusters

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@hengtaoguo (Collaborator) left a comment
Awesome! Thanks for the quick implementation!

@RissyRan (Collaborator) left a comment

LGTM! One question: have you checked the ckpt from SFT (i.e. 50-100 steps) and done some decoding to see if the output makes sense?

@aireenmei force-pushed the aireen/llama4_sft branch from 253903d to 2d67f70 on July 2, 2025 at 23:59
@aireenmei (Collaborator, Author) replied:

> LGTM! One question: have you checked the ckpt from SFT (i.e. 50-100 steps) and done some decoding to see if the output makes sense?

We haven't checked yet; this PR only adds the functionality. We plan to run more SFT tests afterwards, ideally with a bigger dataset, since the llama4 models are quite big.

@gagika (Collaborator) left a comment

Thanks!

@RissyRan (Collaborator) left a comment

Discussed offline. The decoding test will be verified as a follow-up.

@aireenmei force-pushed the aireen/llama4_sft branch 2 times, most recently from 5408ce0 to 4296375 on July 7, 2025 at 20:54
@aireenmei force-pushed the aireen/llama4_sft branch from 4296375 to 3b471d5 on July 8, 2025 at 04:03

4 participants