Description
A Better Evaluation Entry for OpenAI-Style API Models
In practice, deployment and evaluation are usually separated: this frees the evaluation framework from the dependency bloat that comes with supporting many different models (the more models supported, the more tangled the dependencies become) and makes it easier to evaluate extremely large models. Each run also evaluates only a single API endpoint, so testing multiple models simply means running the command multiple times.
I noticed that the framework already supports evaluating OpenAI-style API models through the `GPT4V` class. However, the user experience still needs improvement. Specifically:
- To test a model, you have to modify `config.py` and register a `model_name` (see the sketch below).
- Request parameters such as `temperature` and `timeout` have to be adjusted manually in code.
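For reference, registering an endpoint today means hand-writing an entry roughly like the following. This is only a minimal sketch that assumes the `supported_VLM` / `partial(GPT4V, ...)` registry pattern in `config.py`; the keyword arguments are illustrative and may not match the wrapper's actual signature.

```python
# Sketch of the kind of entry currently added to config.py by hand.
# Keyword names below are illustrative and may differ from GPT4V's real signature.
from functools import partial
from vlmeval.api import GPT4V  # existing OpenAI-style API wrapper

supported_VLM = {
    # One hand-written registration per endpoint, e.g. a Qwen2.5-VL-7B model
    # served behind an OpenAI-compatible vLLM server.
    'vllm_qwen_2.5-7b': partial(
        GPT4V,
        model='Qwen/Qwen2.5-VL-7B-Instruct',
        temperature=0.1,
        timeout=600,
        max_tokens=16000,
    ),
}
```

Every new endpoint, and even a change of sampling parameters, means editing the framework's source, which is what the interface proposed below would avoid.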
Could you provide an interface like this?
```bash
python run_api.py \
    --model-name "vllm_qwen_2.5-7b" \
    --base-url "xxxxxxxx" \
    --api-key "xxxxx" \
    --max-token-out 16000 \
    --min-pixels 3k \
    --max-pixels 100w \
    --temperature 0.1 \
    --top-p 0.9 \
    --data MME \
    --work-dir ./outputs
```
With such an entry point, models like the one in VLMEvalKit#1093 would be supported automatically, with no extra per-model changes, because serving frameworks such as vLLM, SGLang, and LMDeploy already expose far more models through OpenAI-compatible endpoints.
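To make the idea concrete, here is a rough sketch of what such an entry script might look like. It is only a sketch under assumptions: the flag names mirror the command above, `GPT4V` is assumed to accept `api_base`, `key`, `max_tokens`, `temperature`, and `timeout` keyword arguments, and `run_evaluation` is a placeholder for however the framework's existing inference and scoring pipeline would actually be invoked.

```python
#!/usr/bin/env python
"""run_api.py -- sketch of a config-free entry point for OpenAI-style endpoints.

Illustrative only: flag names follow the proposal above; how the model object
is handed to the evaluation loop is an assumption, not the framework's API.
"""
import argparse

from vlmeval.api import GPT4V  # existing OpenAI-style wrapper in VLMEvalKit


def parse_args():
    parser = argparse.ArgumentParser(description='Evaluate an OpenAI-compatible API model.')
    parser.add_argument('--model-name', required=True, help='Name used for logging and output folders.')
    parser.add_argument('--base-url', required=True, help='OpenAI-compatible endpoint, e.g. http://host:port/v1')
    parser.add_argument('--api-key', default='EMPTY', help='API key; a dummy value for local servers.')
    parser.add_argument('--max-token-out', type=int, default=4096)
    parser.add_argument('--min-pixels', type=int, default=None, help='Optional lower bound on image pixels.')
    parser.add_argument('--max-pixels', type=int, default=None, help='Optional upper bound on image pixels.')
    parser.add_argument('--temperature', type=float, default=0.0)
    parser.add_argument('--top-p', type=float, default=1.0)
    parser.add_argument('--data', nargs='+', required=True, help='Benchmark name(s), e.g. MME.')
    parser.add_argument('--work-dir', default='./outputs')
    return parser.parse_args()


def run_evaluation(model, dataset, work_dir):
    """Placeholder: a real implementation would reuse the framework's existing
    inference + evaluation pipeline for `dataset` and write results to work_dir."""
    raise NotImplementedError('wire this to the existing run logic')


def main():
    args = parse_args()

    # Build the API wrapper directly from CLI flags instead of a config.py entry.
    # Keyword names are assumptions and must match the wrapper's real signature;
    # top-p and min/max-pixels would be forwarded the same way if supported.
    model = GPT4V(
        model=args.model_name,
        api_base=args.base_url,
        key=args.api_key,
        max_tokens=args.max_token_out,
        temperature=args.temperature,
        timeout=600,
    )

    for dataset in args.data:
        run_evaluation(model, dataset, work_dir=args.work_dir)


if __name__ == '__main__':
    main()
```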