[Feature] lora serving performance #2372

Closed

Labels

inactive, lora, performance

LoRA inference is very slow. I ran a Gemma LoRA adapter and found that the qkv proj takes 0.0003 s with LoRA but only 0.0001 s without it. Summed over a decode step, this results in a per-token decode time difference of 20 ms or more.

By comparison, vLLM's LoRA serving is faster.
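
To relate the per-projection timings above to the per-token figure, here is a minimal arithmetic sketch. The function names are hypothetical and not part of any library. It assumes every LoRA-wrapped projection call costs about the same extra time as the measured qkv proj, which may not hold for other projections.

```python
import math


def per_call_overhead_s(t_with_lora_s: float, t_without_lora_s: float) -> float:
    """Extra time one projection call spends when LoRA is enabled."""
    return t_with_lora_s - t_without_lora_s


def implied_calls_per_token(token_gap_s: float,
                            t_with_lora_s: float,
                            t_without_lora_s: float) -> float:
    """Number of equally slow projection calls needed to explain the
    per-token gap. Assumption: all calls have the same overhead."""
    return token_gap_s / per_call_overhead_s(t_with_lora_s, t_without_lora_s)


if __name__ == "__main__":
    # Values reported in this issue: 0.0003 s vs 0.0001 s, 20 ms per token.
    overhead = per_call_overhead_s(0.0003, 0.0001)
    calls = implied_calls_per_token(0.020, 0.0003, 0.0001)
    print(f"overhead per call: {overhead * 1e3:.2f} ms")
    print(f"calls needed to reach 20 ms: {calls:.0f}")
    assert math.isclose(calls, 100.0, rel_tol=1e-9)
```

Under that assumption, a 20 ms gap corresponds to roughly 100 LoRA-wrapped projection calls per decoded token, so the slowdown would have to come from all adapted projections across the layers rather than from the qkv proj alone.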
