
Commit dbeb036

[Doc] Document Matryoshka Representation Learning support
1 parent 8cac35b

File tree

2 files changed: +104 −0 lines changed

docs/source/models/pooling_models.md

Lines changed: 73 additions & 0 deletions
@@ -141,3 +141,76 @@ Our [OpenAI-Compatible Server](#openai-compatible-server) provides endpoints tha
- [Pooling API](#pooling-api) is similar to `LLM.encode`, being applicable to all types of pooling models.
- [Embeddings API](#embeddings-api) is similar to `LLM.embed`, accepting both text and [multi-modal inputs](#multimodal-inputs) for embedding models.
- [Score API](#score-api) is similar to `LLM.score` for cross-encoder models.

## Matryoshka Representation Learning (MRL)

[Matryoshka Embeddings](https://sbert.net/examples/sentence_transformer/training/matryoshka/README.html#matryoshka-embeddings) or [Matryoshka Representation Learning (MRL)](https://arxiv.org/abs/2205.13147) is a technique used in training embedding models. It allows users to trade off between embedding quality and cost: smaller output dimensions are cheaper to store and compare, at some loss in accuracy.
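Conceptually, an MRL-trained model packs the most important information into the leading dimensions of the embedding, so a smaller embedding can be obtained by keeping only the first `k` components and L2-renormalizing. A minimal stdlib-only sketch of that post-processing step (the vector here is illustrative, not output from a real model, and whether a given server renormalizes after truncation depends on the implementation):

```python
import math


def truncate_embedding(full, dimensions):
    """Keep the first `dimensions` components and L2-renormalize.

    This mirrors the post-processing an MRL-aware server applies when a
    request asks for fewer dimensions than the model's native size.
    """
    truncated = full[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated] if norm > 0 else truncated


# Illustrative 8-dim "full" embedding, truncated to 4 dims.
full = [0.5, 0.5, 0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
small = truncate_embedding(full, 4)
print(len(small))                  # 4
print(sum(x * x for x in small))   # 1.0 (unit norm restored)
```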
### Offline Inference

You can change the output dimensions of embedding models that support MRL by using the `dimensions` parameter in `PoolingParams`.

```python
from vllm import LLM, PoolingParams

model = LLM(model="jinaai/jina-embeddings-v3",
            task="embed",
            trust_remote_code=True)
outputs = model.embed(["Follow the white rabbit."],
                      pooling_params=PoolingParams(dimensions=32))
print(outputs[0].outputs)
```

A code example can be found here: <gh-file:examples/offline_inference/embed_matryoshka_fy.py>
### Online Inference

Use the following command to start the vLLM server:

```text
vllm serve jinaai/jina-embeddings-v3 --trust-remote-code
```

You can change the output dimensions of embedding models that support MRL by using the `dimensions` parameter in the request.

```text
curl http://127.0.0.1:8000/v1/embeddings \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "input": "Follow the white rabbit.",
    "model": "jinaai/jina-embeddings-v3",
    "encoding_format": "float",
    "dimensions": 1
  }'
```
Expected output:

```json
{"id":"embd-0aab28c384d348c3b8f0eb783109dc5f","object":"list","created":1744195454,"model":"jinaai/jina-embeddings-v3","data":[{"index":0,"object":"embedding","embedding":[-1.0]}],"usage":{"prompt_tokens":10,"total_tokens":10,"completion_tokens":0,"prompt_tokens_details":null}}
```
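The embedding itself lives under `data[0].embedding` in the response body. A small stdlib-only sketch of extracting it (the JSON literal below is abbreviated from the expected output above):

```python
import json

# Abbreviated version of the response body shown above.
body = ('{"object":"list","model":"jinaai/jina-embeddings-v3",'
        '"data":[{"index":0,"object":"embedding","embedding":[-1.0]}],'
        '"usage":{"prompt_tokens":10,"total_tokens":10}}')

response = json.loads(body)
embedding = response["data"][0]["embedding"]
print(len(embedding))  # 1, matching the requested "dimensions": 1
```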
An OpenAI client example can be found here: <gh-file:examples/online_serving/openai_embedding_matryoshka_fy.py>

### Warning

**Not all embedding models support MRL. Changing the output dimensions of a model that does not support MRL will lead to poor results, so vLLM returns an error for any request that attempts to change the output dimensions (i.e. `dimensions` is not `None`) of such a model.**

For example, trying to change the output dimensions of the `BAAI/bge-m3` model results in the following error:

```json
{"object":"error","message":"Model \"BAAI/bge-m3\" does not support matryoshka representation, changing output dimensions will lead to poor results.","type":"BadRequestError","param":null,"code":400}
```

We hope that the open-source community will adopt the terms `is_matryoshka` or `matryoshka_dimensions` to denote whether a model is compatible with Matryoshka Representation Learning (MRL).
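Under that convention, MRL support would be advertised in a model's Hugging Face `config.json`. A minimal sketch of such a check over an already-loaded config dict (the sample configs are illustrative, not taken from real models):

```python
def supports_mrl(hf_config: dict) -> bool:
    """Heuristic check for MRL support in a HF config dict.

    A model is treated as MRL-capable if it sets `is_matryoshka`
    or declares a list of `matryoshka_dimensions`.
    """
    return bool(hf_config.get("is_matryoshka")) or bool(
        hf_config.get("matryoshka_dimensions"))


print(supports_mrl({"is_matryoshka": True}))                   # True
print(supports_mrl({"matryoshka_dimensions": [32, 64, 128]}))  # True
print(supports_mrl({"hidden_size": 1024}))                     # False
```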
### Manually support MRL

For models that support MRL but are not recognized by vLLM, please manually enable MRL support with `hf_overrides={"is_matryoshka": True}` (offline) or `--hf_overrides '{"is_matryoshka":true}'` (online), and use this override with caution.

For example, the following command starts a vLLM server with MRL support manually enabled:

```text
vllm serve Snowflake/snowflake-arctic-embed-m-v1.5 --hf_overrides '{"is_matryoshka":true}'
```
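Note that the CLI form of the override is simply the JSON encoding of the Python dict used offline; a quick stdlib-only sanity check of that correspondence:

```python
import json

# Offline form: the dict passed to LLM(..., hf_overrides=...).
hf_overrides = {"is_matryoshka": True}

# Online form: the string passed to --hf_overrides on the CLI.
cli_value = json.dumps(hf_overrides)
print(cli_value)  # {"is_matryoshka": true}
```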
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
# SPDX-License-Identifier: Apache-2.0

from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"


def main():
    client = OpenAI(
        # defaults to os.environ.get("OPENAI_API_KEY")
        api_key=openai_api_key,
        base_url=openai_api_base,
    )

    models = client.models.list()
    model = models.data[0].id

    responses = client.embeddings.create(
        input=["Follow the white rabbit."],
        model=model,
        dimensions=1,
    )

    for data in responses.data:
        print(data.embedding)  # List of floats of length 1


if __name__ == "__main__":
    main()
