- [Pooling API](#pooling-api) is similar to `LLM.encode`, being applicable to all types of pooling models.
- [Embeddings API](#embeddings-api) is similar to `LLM.embed`, accepting both text and [multi-modal inputs](#multimodal-inputs) for embedding models.
- [Score API](#score-api) is similar to `LLM.score` for cross-encoder models (a minimal request sketch follows this list).
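As a rough illustration of the Score API, the sketch below assumes a vLLM OpenAI-compatible server is already running locally with a cross-encoder model; the model name, address, and `/score` route are taken from the serving examples and may need adjusting for your deployment.

```python
import requests

# Score a query against candidate passages with a cross-encoder model.
# Assumes `vllm serve BAAI/bge-reranker-v2-m3` (or a similar cross-encoder)
# is listening on localhost:8000.
response = requests.post(
    "http://localhost:8000/score",
    json={
        "model": "BAAI/bge-reranker-v2-m3",
        "text_1": "What is the capital of France?",
        "text_2": ["Paris is the capital of France.",
                   "The capital of Germany is Berlin."],
    },
)
print(response.json())
```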
## Matryoshka Representation Learning (MRL)
[Matryoshka Embeddings](https://sbert.net/examples/sentence_transformer/training/matryoshka/README.html#matryoshka-embeddings) or [Matryoshka Representation Learning (MRL)](https://arxiv.org/abs/2205.13147) is a technique used in training embedding models. It allows users to trade off between performance and cost.
### Offline Inference
You can change the output dimensions of embedding models that support MRL by using the `dimensions` parameter in `PoolingParams`.
```python
from vllm import LLM, PoolingParams

model = LLM(model="jinaai/jina-embeddings-v3",
            task="embed",
            trust_remote_code=True)
outputs = model.embed(["Follow the white rabbit."],
                      pooling_params=PoolingParams(dimensions=32))
print(outputs[0].outputs)
```
A code example can be found here: <gh-file:examples/offline_inference/embed_matryoshka_fy.py>
An OpenAI client example can be found here: <gh-file:examples/online_serving/openai_embedding_matryoshka_fy.py>
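For online serving, the output dimensions can likewise be requested per call. The sketch below uses the OpenAI client's `dimensions` parameter against a locally running vLLM server; the model name and server address are assumptions for illustration.

```python
from openai import OpenAI

# Assumes an MRL-capable embedding model is being served on localhost:8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.embeddings.create(
    model="jinaai/jina-embeddings-v3",  # illustrative model name
    input=["Follow the white rabbit."],
    dimensions=32,
)
print(len(response.data[0].embedding))  # expected: 32
```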
### Warning
**Not all embedding models support MRL. Changing the output dimensions of a model that does not support MRL will lead to poor results. vLLM returns an error for requests that attempt to change the output dimensions (i.e., `dimensions` is not `None`) of a model that does not support MRL.**
For example, trying to change the output dimensions of the `BAAI/bge-m3` model will result in the following error.
```json
{"object":"error","message":"Model \"BAAI/bge-m3\" does not support matryoshka representation, changing output dimensions will lead to poor results.","type":"BadRequestError","param":null,"code":400}
```
We hope that the open source community will adopt the terms `is_matryoshka` or `matryoshka_dimensions` to denote whether a model is compatible with Matryoshka Representation Learning (MRL).
### Manually enable MRL support
For models that support MRL but are not recognized by vLLM, you can manually enable MRL support by passing `hf_overrides={"is_matryoshka": True}` (offline) or `--hf_overrides '{"is_matryoshka": true}'` (online). Use this with caution.
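As a rough offline sketch, the override can be passed directly to `LLM`. The model name below is only a placeholder for a checkpoint that was genuinely trained with MRL but is not auto-detected by vLLM.

```python
from vllm import LLM, PoolingParams

# Force-enable MRL handling for a model vLLM does not recognize as matryoshka.
# Only do this if you know the checkpoint was actually trained with MRL.
model = LLM(model="my-org/my-mrl-embedding-model",  # placeholder model name
            task="embed",
            hf_overrides={"is_matryoshka": True})
outputs = model.embed(["Follow the white rabbit."],
                      pooling_params=PoolingParams(dimensions=32))
print(len(outputs[0].outputs.embedding))
```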
For example, the following command starts the vLLM server with MRL support manually enabled.
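The command below is a sketch; the model name is a placeholder for a checkpoint that was actually trained with MRL.

```bash
vllm serve my-org/my-mrl-embedding-model --hf_overrides '{"is_matryoshka": true}'
```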