Open
1 of 1 issue completedDescription
Update
Currently we have added text & vision support.
Repeated MMMU benchmark runs range between 53.6 - 55.5, consistent with the the benchmark reported in the original paper (55).
Known limitations: (See Execution Plan before for full list):
- Audio capabilities: currently we do not support audio at all.
LoRA / Image quality: Phi4MM depends on LoRA for full image capability, but there is some compatibility issues with the native SGL LORA solution. We are working on solving it by refactoring / generalizing SGL LoRA capabilities.Fixed with Refactor LoRA handling to support adapter tensors in fused format #6585, Fix incorrect LoRA weight loading for fused gate_up_proj #6734, Support LoRA in TestOpenAIVisionServer and fix fused kv_proj loading bug. #6861)- Token: Phi4MM supports two types of image token conventions (
<|image1|>
and<|endoftext10|>
), currently we only support the latter. If you use the default chat template, it will automatically pick up the supported one.
Motivation
Supporting the Phi4 Multimodal model (https://huggingface.co/microsoft/Phi-4-multimodal-instruct in SGL.
Execution Plan:
- Basic text + image support (@lifuhuang Support Phi-4 Multi-Modal (text + vision only) #6494 )
- LoRA support (required for full image understanding capability): (@lifuhuang Refactor LoRA handling to support adapter tensors in fused format #6585 , Fix incorrect LoRA weight loading for fused gate_up_proj #6734 , Support LoRA in TestOpenAIVisionServer and fix fused kv_proj loading bug. #6861 )
- perf optimization (@lifuhuang Speed up set_lora_info by eliminating unnecessary H2D transfers #6960 [Perf] Refactor LoRAManager to eliminate stream syncs and redundant computations #6994)
- SGLang LoRA compatibility with Radix Attention (@Fridge003 [WIP] Enable radix cache for Lora feature #7216 )
- (low priority) Precomputed feature support.
- (low priority) Refactor SGL MM processor logic support for support the original token variable image token (e.g.,
<image_1>
) - (low priority) audio support
Related resources
No response