Add MMMU benchmark results #4491
Changes from 18 commits
@@ -22,9 +22,18 @@ It's recommended to reduce the memory usage by appending something like `--mem-fr
python benchmark/mmmu/bench_hf.py --model-path Qwen/Qwen2-VL-7B-Instruct
```

Some popular model results:

1. Qwen/Qwen2-VL-2B-Instruct: 0.241
2. Qwen/Qwen2-VL-7B-Instruct: 0.255
3. Qwen/Qwen2.5-VL-3B-Instruct: 0.245
4. Qwen/Qwen2.5-VL-7B-Instruct: 0.242

Benchmark Results:
Review thread on this change:

> Please write unit tests and integrate them into the CI instead of relying solely on documentation.

> Sure thing.

> I believe this CI addresses part of issue #5249.

> @zhyncs Should I remove all the results currently listed in the README file?

> Okay. I did remove them now. Thank you.
| Model                                  | SGLang | HuggingFace |
|----------------------------------------|--------|-------------|
| Qwen2-VL-7B-Instruct                   | 0.476  | -           |
| Qwen2.5-VL-7B-Instruct                 | 0.477  | -           |
| MiniCPM-V-2.6                          | 0.435  | -           |
| MiniCPM-O-2_6                          | 0.401  | -           |
| Deepseek-Janus-Pro-7B                  | 0.373  | -           |
| Deepseek-VL2                           | 0.405  | -           |
| Gemma-3-it-4B                          | 0.410  | -           |
| llama3-llava-next-8b                   | 0.245  | -           |
| llava-v1.6-mistral-7b-sglang           | 0.338  | -           |
| llava-onevision-qwen2-7b-ov            | 0.423  | -           |
| Mllama (Llama-3.2-11B-Vision-Instruct) | 0.321  | -           |
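The reviewer's request in the thread above, unit tests in CI rather than documentation-only results, could be sketched as a threshold check on the benchmark's reported accuracy. This is a minimal sketch only: the output line format (`Overall accuracy: ...`) and the helper names `parse_accuracy` / `check_accuracy` are hypothetical illustrations, not SGLang's actual benchmark output or CI API.

```python
import re

def parse_accuracy(output: str) -> float:
    """Extract an accuracy value from benchmark output.

    Assumes a hypothetical line of the form 'Overall accuracy: 0.476'.
    """
    match = re.search(r"Overall accuracy:\s*([0-9.]+)", output)
    if match is None:
        raise ValueError("no accuracy line found in benchmark output")
    return float(match.group(1))

def check_accuracy(output: str, floor: float) -> bool:
    """Return True if the parsed accuracy meets the expected floor.

    A CI test would fail when a regression drops the score below `floor`.
    """
    return parse_accuracy(output) >= floor

# Example: Qwen2-VL-7B-Instruct scored 0.476 in the table above, so a
# floor slightly below that tolerates run-to-run noise.
sample_output = "Evaluating MMMU val split...\nOverall accuracy: 0.476\n"
print(check_accuracy(sample_output, floor=0.46))
```

A per-model floor like this turns the static table into a regression gate: each tracked model gets its own threshold, and a failing assertion points directly at the model whose score dropped.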
> Indeed, we can keep the benchmark accuracy results, but remove the speed results 😂
> Did @zhyncs ask us not to include the results in the README file? I’m a bit confused.