
Support tuning moe for llama 4 model #5109


Closed · wants to merge 41 commits

Conversation

fzyzcjy (Collaborator) commented Apr 7, 2025

Motivation

Tune outputs will be in #5092; here I only include the script updates, to avoid making #5092 too big.

Modifications

Checklist

CatherineSue and others added 30 commits April 4, 2025 16:36
# Conflicts:
#	python/sglang/srt/layers/attention/flashattention_backend.py
This reverts commit ac4cca3.
1. Adds a `use_irope` parameter to the RadixAttention class to indicate whether a layer should use local attention based on iRoPE
2. Modifies Llama4Attention to pass `use_irope=not self.nope` to RadixAttention, leveraging the existing NoPE flag
3. Updates FlashAttentionBackend.forward_extend to check for the `use_irope` flag when determining if local attention should be used
4. Simplifies local attention activation logic by directly checking `attention_chunk_size is not None` instead of using a separate flag
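Taken together, these changes route Llama 4's iRoPE layers to chunked local attention whenever a chunk size is configured. The sketch below only illustrates that control flow under heavily simplified, assumed constructors; the real `RadixAttention`, `Llama4Attention`, and `FlashAttentionBackend` classes in sglang take many more arguments and perform the actual attention computation.

```python
from typing import Optional

# Minimal sketch of the wiring described above, NOT the actual sglang code:
# constructors are simplified and the return value stands in for the choice
# of attention kernel.

class RadixAttention:
    def __init__(self, layer_id: int, use_irope: bool = False):
        # (1) New flag marking layers that should use local (chunked)
        # attention based on iRoPE.
        self.layer_id = layer_id
        self.use_irope = use_irope


class Llama4Attention:
    def __init__(self, layer_id: int, nope: bool):
        # (2) Layers without NoPE use iRoPE, so they opt into local attention.
        self.nope = nope
        self.attn = RadixAttention(layer_id, use_irope=not self.nope)


class FlashAttentionBackend:
    def __init__(self, attention_chunk_size: Optional[int] = None):
        self.attention_chunk_size = attention_chunk_size

    def forward_extend(self, layer: RadixAttention) -> bool:
        # (3) + (4) Local attention is active only when the layer requests
        # iRoPE and a chunk size is configured; no separate enable flag.
        return layer.use_irope and self.attention_chunk_size is not None


# Example: only the iRoPE layer (nope=False) gets local attention.
backend = FlashAttentionBackend(attention_chunk_size=8192)
for layer_id, nope in [(0, False), (1, True)]:
    layer = Llama4Attention(layer_id, nope=nope)
    print(layer_id, backend.forward_extend(layer.attn))
```

Deriving the flag from `not self.nope` reuses the existing NoPE configuration rather than introducing a second per-layer knob.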