Add MMMU benchmark results #4491

Merged 44 commits on Apr 25, 2025
Changes from 18 commits

Commits
0111828
Add MMMU benchmark results
Mar 17, 2025
6d4ce17
Update sglang qwen2.5 and minicpmv
Mar 17, 2025
02ed3b2
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Mar 20, 2025
3bf70ea
Update metrics and run instructions
Mar 20, 2025
8106d2a
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Mar 23, 2025
0870ca5
Update with latest results
Mar 23, 2025
82c18f2
Update minicpmv model results
Mar 23, 2025
6213f61
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Apr 2, 2025
12d67d0
fix lint
Apr 3, 2025
47eeb56
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Apr 3, 2025
60e905a
Update with llava based models and mllama
Apr 5, 2025
e7d9d4c
Update for Deepseek VL2
Apr 11, 2025
8fe39d9
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Apr 11, 2025
e7a9b0f
update for MiniCPM-O-2_6
Apr 11, 2025
c9191b5
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Apr 12, 2025
4ab2d1e
update
Apr 12, 2025
a001556
Update
Apr 12, 2025
7131879
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Apr 12, 2025
e37958f
Add CI test for VLM models
Apr 15, 2025
c59be27
fix lints
Apr 15, 2025
98f88ad
remove static results
Apr 15, 2025
8730039
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Apr 15, 2025
4320166
Update with lmm_evals
Apr 17, 2025
2912e47
update
Apr 17, 2025
1d926e6
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Apr 17, 2025
cb9373a
remove env
Apr 17, 2025
aa5d7d4
Merge remote-tracking branch 'origin/ravi/benchmark_mmmu' into ravi/b…
Apr 17, 2025
6a49c71
Update CI install, run suit
Apr 18, 2025
1fd6495
change to python3
Apr 18, 2025
8916c64
Update with suggestions
Apr 18, 2025
40a319c
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Apr 18, 2025
5d63bde
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Apr 18, 2025
392e1dd
Update
Apr 18, 2025
539dd58
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Apr 18, 2025
9e759c3
Update to mmmu_val
Apr 18, 2025
0d61208
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Apr 18, 2025
5630f07
Update score for gemma3 model
Apr 18, 2025
2ba021b
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Apr 21, 2025
3821bf5
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Apr 21, 2025
7be94da
Update openai api key and base env variable names
Apr 21, 2025
58c7a78
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Apr 21, 2025
88b8cf7
fix lmms-eval branch
Apr 21, 2025
3c95028
mem-fraction-static arguement
Apr 24, 2025
e8c645d
Merge branch 'main' into ravi/benchmark_mmmu
ravi03071991 Apr 24, 2025
21 changes: 15 additions & 6 deletions benchmark/mmmu/README.md
@@ -22,9 +22,18 @@ It's recommended to reduce the memory usage by appending something like `--mem-fr
python benchmark/mmmu/bench_hf.py --model-path Qwen/Qwen2-VL-7B-Instruct
```

Some popular model results:

1. Qwen/Qwen2-VL-2B-Instruct: 0.241
2. Qwen/Qwen2-VL-7B-Instruct: 0.255
3. Qwen/Qwen2.5-VL-3B-Instruct: 0.245
4. Qwen/Qwen2.5-VL-7B-Instruct: 0.242
Comment on lines -25 to -30
Collaborator

Indeed, we can keep the benchmark results, but remove the speed results 😂

Collaborator Author

Did @zhyncs ask us not to include the results in the README file? I’m a bit confused.

Benchmark Results:
Member

Please write unit tests and integrate them into the CI instead of relying solely on documentation.

Collaborator Author

Sure thing.

Collaborator Author

I believe this CI addresses part of issue #5249

Collaborator Author

@zhyncs Should I remove all the results currently listed in the README file?

Collaborator Author

Okay. I did remove them now. Thank you.


| Model                                  | SGLang | HuggingFace |
|----------------------------------------|--------|-------------|
| Qwen2-VL-7B-Instruct                   | 0.476  | -           |
| Qwen2.5-VL-7B-Instruct                 | 0.477  | -           |
| MiniCPM-V-2.6                          | 0.435  | -           |
| MiniCPM-O-2_6                          | 0.401  | -           |
| Deepseek-Janus-Pro-7B                  | 0.373  | -           |
| Deepseek-VL2                           | 0.405  | -           |
| Gemma-3-it-4B                          | 0.41   | -           |
| llama3-llava-next-8b                   | 0.245  | -           |
| llava-v1.6-mistral-7b-sglang           | 0.338  | -           |
| llava-onevision-qwen2-7b-ov            | 0.423  | -           |
| Mllama - Llama-3.2-11B-Vision-Instruct | 0.321  | -           |
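
The review above asks for CI tests rather than static scores in the README. As a rough illustration only (not the test this PR actually adds), such a check could compare a freshly measured MMMU accuracy against a stored baseline. The sketch below reuses three SGLang scores from the table; the names `REFERENCE_SCORES`, `TOLERANCE`, `meets_baseline`, and `find_regressions` are hypothetical, and the tolerance value is an assumption rather than anything stated in the PR.

```python
# Baseline MMMU accuracies copied from the SGLang column of the table above.
REFERENCE_SCORES = {
    "Qwen2-VL-7B-Instruct": 0.476,
    "Qwen2.5-VL-7B-Instruct": 0.477,
    "MiniCPM-V-2.6": 0.435,
}

# Assumed allowance for run-to-run variation; this value is illustrative,
# not taken from the PR.
TOLERANCE = 0.05


def meets_baseline(model, measured, reference=REFERENCE_SCORES, tolerance=TOLERANCE):
    """Return True if `measured` is at most `tolerance` below the baseline."""
    return measured >= reference[model] - tolerance


def find_regressions(measured_scores, reference=REFERENCE_SCORES, tolerance=TOLERANCE):
    """Return the sorted names of models whose measured score fell below baseline."""
    return sorted(
        model
        for model, score in measured_scores.items()
        if not meets_baseline(model, score, reference, tolerance)
    )


def test_no_regressions():
    # In a real CI job the scores would come from running the benchmark;
    # this placeholder value is NOT a real measurement.
    measured = {"Qwen2-VL-7B-Instruct": 0.470}
    assert find_regressions(measured) == []
```

A test runner such as pytest would collect `test_no_regressions` automatically. Checking against a lower bound instead of an exact value keeps the check from failing on small run-to-run differences.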