[Bugfix] fix pp for llama4 #16746
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. 🚀
Signed-off-by: Lu Fang <[email protected]>
Force-pushed from fcd5805 to fdc4236
self.language_model = _initialize_model(
-    vllm_config=vllm_config.with_hf_config(config.text_config),
+    vllm_config=vllm_config.with_hf_config(config.text_config,
+                                           ["LlamaForCausalLM"]),
Should we use Llama4ForCausalLM?
Llama4ForCausalLM is not a registered architecture; we should avoid using it, since that would require adding a lot of hacks, as in the initial PR.
Probably a dumb question, but since _initialize_model is already pointing to model_class=Llama4ForCausalLM, why do we need to override the architectures here to LlamaForCausalLM?
vllm/vllm/model_executor/model_loader/loader.py
Lines 114 to 123 in 8cac35b
def _initialize_model(
    vllm_config: VllmConfig,
    *,
    prefix: str = "",
    model_class: Optional[type[nn.Module]] = None,
) -> nn.Module:
    """Initialize a model with the given configurations."""
    model_config = vllm_config.model_config
    if model_class is None:
        model_class, _ = get_model_architecture(model_config)
The architectures are checked during __post_init__ of the config: when we call replace inside the with_hf_config function, verify_with_parallel_config runs against the registry, so the PP verification needs a registered architecture name.

Line 3781 in a018e55
return replace(self, model_config=model_config)

Line 3790 in a018e55
self.model_config.verify_with_parallel_config(self.parallel_config)

vllm/vllm/model_executor/models/registry.py
Lines 441 to 442 in a018e55
normalized_arch = list(
    filter(lambda model: model in self.models, architectures))
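To make that concrete, here is a minimal, self-contained sketch of how that filter behaves when the architecture is not registered. The registry contents below are an illustrative subset, not vLLM's actual table:

registered_models = {"LlamaForCausalLM", "Llama4ForConditionalGeneration"}

def normalize_archs(architectures):
    # Same idea as the registry filter quoted above: keep only names the
    # registry knows about.
    return list(filter(lambda model: model in registered_models, architectures))

print(normalize_archs(["Llama4ForCausalLM"]))  # [] -- nothing left to verify
print(normalize_archs(["LlamaForCausalLM"]))   # ["LlamaForCausalLM"] -- verification can proceed

With an empty result, the PP verification has no registered architecture to check against, which is what the override to LlamaForCausalLM works around.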
@@ -824,7 +824,7 @@ def load_weights(self, weights: Iterable[Tuple[str,
     # language_model is an Llama4ForCausalLM instance. We load it's
     # using llama4's load_weights routine.
     language_model_weights, other_weights = self.separate_weights(
-        weights, prefix="language_model.model.")
+        weights, prefix="language_model.")
Wondering why this issue was not triggered before?
language_model.lm_head can also be loaded by the parent model's weight loader when PP is not enabled. With PP, because we split the weights into two parts, lm_head is missing on the PP=0 rank, so the parent loader raises a weight-not-found error. In llama4.py's model loading we have the is_pp_missing_parameter logic, which handles this and avoids the exception.
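A minimal sketch of why the prefix matters for lm_head (illustrative weight names and a simplified split, not vLLM's actual separate_weights implementation):

from typing import Iterable, List, Tuple

def split_by_prefix(
    weights: Iterable[Tuple[str, object]],
    prefix: str,
) -> Tuple[List[Tuple[str, object]], List[Tuple[str, object]]]:
    # Split the weight stream by name prefix, in the spirit of separate_weights.
    matched, others = [], []
    for name, tensor in weights:
        (matched if name.startswith(prefix) else others).append((name, tensor))
    return matched, others

# Illustrative checkpoint entries; tensors replaced by placeholders.
weights = [
    ("language_model.model.layers.0.self_attn.q_proj.weight", ...),
    ("language_model.lm_head.weight", ...),
    ("vision_model.patch_embed.weight", ...),
]

# Old prefix: lm_head lands in "others" and is loaded by the parent loader,
# which fails with "weight not found" on the PP rank that does not own lm_head.
_, others = split_by_prefix(weights, "language_model.model.")
print([name for name, _ in others])

# New prefix: lm_head stays with the language model, whose load_weights
# handles is_pp_missing_parameter and skips it on ranks that do not own it.
_, others = split_by_prefix(weights, "language_model.")
print([name for name, _ in others])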
Thanks for the fix.
Signed-off-by: Lu Fang <[email protected]> Signed-off-by: Yang Wang <[email protected]>
Signed-off-by: Lu Fang <[email protected]>
Signed-off-by: Lu Fang <[email protected]>
Signed-off-by: Lu Fang <[email protected]> Signed-off-by: Agata Dobrzyniewicz <[email protected]>
Signed-off-by: Lu Fang <[email protected]> Signed-off-by: Mu Huai <[email protected]>
This PR fixes pipeline parallelism (PP) for llama4 (#16385).
It loads the language model with a registered architecture so that PP verification passes, and corrects the prefix used to separate the language model's weights from the rest of the model's weights, so that loading works on both the PP=0 and PP=1 ranks.
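As a quick sanity check of the fixed path, something like the following should now work. The checkpoint name and parallel sizes below are illustrative, not part of this PR:

from vllm import LLM, SamplingParams

# Any Llama 4 checkpoint with pipeline_parallel_size > 1 exercises the code
# paths touched by this PR (architecture verification and weight separation).
llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # illustrative checkpoint
    tensor_parallel_size=8,
    pipeline_parallel_size=2,
    max_model_len=8192,
)
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)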