- [Pooling API](#pooling-api) is similar to `LLM.encode`, being applicable to all types of pooling models.
- [Embeddings API](#embeddings-api) is similar to `LLM.embed`, accepting both text and [multi-modal inputs](#multimodal-inputs) for embedding models.
- [Score API](#score-api) is similar to `LLM.score` for cross-encoder models.

## Matryoshka Embeddings

[Matryoshka Embeddings](https://sbert.net/examples/sentence_transformer/training/matryoshka/README.html#matryoshka-embeddings) or [Matryoshka Representation Learning (MRL)](https://arxiv.org/abs/2205.13147) is a technique used in training embedding models. It lets users trade off performance against cost by choosing the output dimension of the embedding.

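
To make the trade-off concrete: the general idea behind MRL is that a shorter embedding can be taken as a prefix of the full vector and then re-normalized. The sketch below illustrates that idea in plain Python. `truncate_embedding` is a hypothetical helper, not a vLLM API, and it is not meant to mirror vLLM's internal implementation.

```python
import math


def truncate_embedding(embedding: list[float], dimensions: int) -> list[float]:
    """Keep the first `dimensions` values and rescale them to unit L2 norm.

    Hypothetical helper for illustration only; not part of vLLM.
    """
    if not 0 < dimensions <= len(embedding):
        raise ValueError("dimensions must be between 1 and len(embedding)")
    prefix = embedding[:dimensions]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return prefix  # an all-zero prefix cannot be rescaled
    return [x / norm for x in prefix]


full = [3.0, 4.0, 12.0]              # toy 3-dimensional embedding
short = truncate_embedding(full, 2)  # smaller vector: cheaper to store and compare
print(len(short), short)
```

The shorter vector costs less to store and compare, usually at some loss in retrieval quality, which is why this only makes sense for models trained with MRL.
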
:::{warning}
Not all embedding models are trained using Matryoshka Representation Learning. To avoid misuse of the `dimensions` parameter, vLLM returns an error for requests that attempt to change the output dimension of models that do not support Matryoshka Embeddings.

For example, setting the `dimensions` parameter while using the `BAAI/bge-m3` model results in the following error.

```json
{"object":"error","message":"Model \"BAAI/bge-m3\" does not support matryoshka representation, changing output dimensions will lead to poor results.","type":"BadRequestError","param":null,"code":400}
```

:::

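
A client may want to tell this error apart from other failures. The following is a minimal sketch, assuming the response body has the shape shown in the warning above. `is_matryoshka_error` is a hypothetical helper, and matching on the message text is an assumption that may not hold if the wording changes between versions.

```python
import json


def is_matryoshka_error(response_body: str) -> bool:
    """Return True if the body is an error object mentioning matryoshka support.

    Hypothetical client-side helper; not part of vLLM.
    """
    payload = json.loads(response_body)
    if not isinstance(payload, dict):
        return False
    message = str(payload.get("message", "")).lower()
    return payload.get("object") == "error" and "matryoshka" in message


# The error body from the warning above, kept verbatim in a raw string.
error_body = r'{"object":"error","message":"Model \"BAAI/bge-m3\" does not support matryoshka representation, changing output dimensions will lead to poor results.","type":"BadRequestError","param":null,"code":400}'

if is_matryoshka_error(error_body):
    print("Retry the request without the `dimensions` parameter.")
```
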
### Manually enable Matryoshka Embeddings

There is currently no official interface for specifying support for Matryoshka Embeddings. vLLM simply checks whether the field `is_matryoshka` or `matryoshka_dimensions` exists in `config.json`.

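
The check described above can be sketched as follows. This is an illustration, not vLLM's actual code: `supports_matryoshka` is a hypothetical helper, the sample configs are made up, and treating a field that is present but false or null as "unsupported" is an assumption.

```python
import json


def supports_matryoshka(config: dict) -> bool:
    """Report whether a parsed config.json advertises Matryoshka support.

    Hypothetical helper; assumes a false/null field means "not supported".
    """
    return bool(config.get("is_matryoshka")) or (
        config.get("matryoshka_dimensions") is not None
    )


# Made-up config fragments for illustration.
flagged = json.loads('{"is_matryoshka": true}')
with_dims = json.loads('{"matryoshka_dimensions": [64, 128, 256]}')
plain = json.loads('{"hidden_size": 1024}')

for name, cfg in [("flagged", flagged), ("with_dims", with_dims), ("plain", plain)]:
    print(name, supports_matryoshka(cfg))
```
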
For models that support Matryoshka Embeddings but are not recognized by vLLM, manually override the config using `hf_overrides={"is_matryoshka": True}` (offline) or `--hf_overrides '{"is_matryoshka": true}'` (online).

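
The two forms carry the same override: the offline one is a Python dict, the online one is that dict serialized as JSON and quoted for the shell. A small sketch using only the standard library shows the correspondence. It builds just the flag text and assumes nothing about the rest of the command line.

```python
import json
import shlex

# Offline form: a Python dict passed as `hf_overrides`.
hf_overrides = {"is_matryoshka": True}

# Online form: the same dict as JSON (note `true`, not `True`), shell-quoted.
cli_flag = "--hf_overrides " + shlex.quote(json.dumps(hf_overrides))
print(cli_flag)
```
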

Here is an example to serve a model with Matryoshka Embeddings enabled.

You can change the output dimensions of embedding models that support Matryoshka Embeddings by using the `dimensions` parameter in {class}`~vllm.PoolingParams`.

```python
from vllm import LLM, PoolingParams

# Load an embedding model that supports Matryoshka Embeddings.
model = LLM(model="jinaai/jina-embeddings-v3",
            task="embed",
            trust_remote_code=True)

# Request 32-dimensional embeddings via the `dimensions` parameter.
outputs = model.embed(["Follow the white rabbit."],
                      pooling_params=PoolingParams(dimensions=32))
print(outputs[0].outputs)
```

A code example can be found here: <gh-file:examples/offline_inference/embed_matryoshka_fy.py>