When creating a vLLM engine with FP8 quantization:

```python
engine = LLM(..., quantization='fp8')
```

model loading fails in `load_dtensor_weights` with the following error:
```
[rank1]: Traceback (most recent call last):
[rank1]: File "/data/juicefs_sharing_data/11171634/code/deepscaler/vivo-verl/tests/rollout/run_fsdp_vllm_bluelm.py", line 159, in <module>
[rank1]: main()
[rank1]: File "/data/juicefs_sharing_data/11171634/code/deepscaler/vivo-verl/tests/rollout/run_fsdp_vllm_bluelm.py", line 125, in main
[rank1]: load_dtensor_weights(state_dict, llm.llm_engine.model_executor.driver_worker.worker.model_runner.model)
[rank1]: File "/data/juicefs_sharing_data/11171634/code/deepscaler/vivo-verl/verl/third_party/vllm/vllm_spmd/dtensor_weight_loaders.py", line 440, in load_dtensor_weights
[rank1]: weight_loader(actor_weights, vllm_model)
[rank1]: File "/data/juicefs_sharing_data/11171634/code/deepscaler/vivo-verl/verl/third_party/vllm/vllm_spmd/dtensor_weight_loaders.py", line 176, in qwen2_dtensor_weight_loader
[rank1]: weight_loader = param.weight_loader
[rank1]: AttributeError: 'Parameter' object has no attribute 'weight_loader'
```
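The failure appears to come from vLLM's FP8 path: during `process_weights_after_loading`, the quantized weights are re-registered as plain `torch.nn.Parameter` objects (alongside new tensors such as `weight_scale`), which drops the `weight_loader` attribute that `qwen2_dtensor_weight_loader` expects. A quick way to confirm this, using the model handle from the traceback above:

```python
# Inspect which parameters still carry a `weight_loader` attribute
# after the FP8 engine has been built.
vllm_model = llm.llm_engine.model_executor.driver_worker.worker.model_runner.model
for name, param in vllm_model.named_parameters():
    if not hasattr(param, "weight_loader"):
        print(f"no weight_loader: {name} (dtype={param.dtype})")
```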
vLLM version: 0.7.3
PyTorch version: 2.5.1
Is there any plan to support an FP8 weight loader? How should `dtensor_weight_loader` be modified to support FP8?
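One possible workaround (a minimal sketch, not a tested fix) is to fall back to vLLM's `default_weight_loader` whenever a parameter has no custom loader; `load_param` below is a hypothetical helper standing in for the per-parameter loading step inside `qwen2_dtensor_weight_loader`:

```python
import torch
from vllm.model_executor.model_loader.weight_utils import default_weight_loader

def load_param(param: torch.Tensor, loaded_weight: torch.Tensor) -> None:
    # Parameters created by vLLM's FP8 quantization path may not carry a
    # custom `weight_loader`, so fall back to the default copy-based loader.
    weight_loader = getattr(param, "weight_loader", default_weight_loader)
    weight_loader(param, loaded_weight)
```

Note that this only avoids the `AttributeError`: the incoming checkpoint weights are still bf16/fp16 while the live parameters have already been quantized to FP8, so a complete fix would also need to re-quantize the incoming weights (or load them before `process_weights_after_loading` runs).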