Commit 0607dc6

noooop authored and dbyoung18 committed

[Doc] Document Matryoshka Representation Learning support (vllm-project#16770)

1 parent 21c0fc3 commit 0607dc6

File tree

2 files changed: +110 -0 lines changed

docs/source/models/pooling_models.md

Lines changed: 74 additions & 0 deletions
@@ -141,3 +141,77 @@ Our [OpenAI-Compatible Server](#openai-compatible-server) provides endpoints tha
- [Pooling API](#pooling-api) is similar to `LLM.encode`, being applicable to all types of pooling models.
- [Embeddings API](#embeddings-api) is similar to `LLM.embed`, accepting both text and [multi-modal inputs](#multimodal-inputs) for embedding models.
- [Score API](#score-api) is similar to `LLM.score` for cross-encoder models.

## Matryoshka Embeddings

[Matryoshka Embeddings](https://sbert.net/examples/sentence_transformer/training/matryoshka/README.html#matryoshka-embeddings), or [Matryoshka Representation Learning (MRL)](https://arxiv.org/abs/2205.13147), is a technique used in training embedding models. It allows users to trade off between performance and cost.

:::{warning}
Not all embedding models are trained using Matryoshka Representation Learning. To avoid misuse of the `dimensions` parameter, vLLM returns an error for requests that attempt to change the output dimension of models that do not support Matryoshka Embeddings.

For example, setting the `dimensions` parameter while using the `BAAI/bge-m3` model will result in the following error:

```json
{"object":"error","message":"Model \"BAAI/bge-m3\" does not support matryoshka representation, changing output dimensions will lead to poor results.","type":"BadRequestError","param":null,"code":400}
```
:::
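As a rough intuition for what the `dimensions` parameter does: an MRL-trained model packs the most important information into the leading dimensions of the embedding, so a smaller embedding can be produced by truncating the full vector and renormalizing it. The following pure-Python sketch (our illustration, not vLLM code — vLLM performs this server-side) shows the truncate-then-renormalize step:

```python
import math

def truncate_embedding(full: list[float], dims: int) -> list[float]:
    """Truncate an embedding to `dims` dimensions and L2-renormalize.

    Illustrative only: real MRL models are trained so that the leading
    dimensions carry the most information, which is what makes this
    truncation meaningful.
    """
    head = full[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

full = [0.8, 0.5, 0.3, 0.1]          # pretend full-dimension embedding
small = truncate_embedding(full, 2)  # keep only the leading 2 dimensions
print(small)
```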
### Manually enable Matryoshka Embeddings

There is currently no official interface for specifying support for Matryoshka Embeddings. In vLLM, we simply check for the existence of the fields `is_matryoshka` or `matryoshka_dimensions` inside `config.json`.

For models that support Matryoshka Embeddings but are not recognized by vLLM, please manually override the config using `hf_overrides={"is_matryoshka": True}` (offline) or `--hf_overrides '{"is_matryoshka": true}'` (online).

Here is an example of serving a model with Matryoshka Embeddings enabled:

```text
vllm serve Snowflake/snowflake-arctic-embed-m-v1.5 --hf_overrides '{"is_matryoshka":true}'
```
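The detection rule described above can be sketched as a small check over the parsed `config.json` (a hypothetical helper for illustration, not vLLM's actual code path):

```python
import json

def supports_matryoshka(config: dict) -> bool:
    # Hypothetical helper mirroring the documented check: a model is
    # treated as Matryoshka-capable if config.json declares either
    # `is_matryoshka` or `matryoshka_dimensions`.
    return bool(config.get("is_matryoshka")) or config.get("matryoshka_dimensions") is not None

# Simulated config.json contents.
plain = json.loads('{"model_type": "xlm-roberta"}')
mrl = json.loads('{"model_type": "xlm-roberta", "matryoshka_dimensions": [32, 64, 128]}')
overridden = {"is_matryoshka": True}  # e.g. injected via --hf_overrides

print(supports_matryoshka(plain), supports_matryoshka(mrl), supports_matryoshka(overridden))
```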
### Offline Inference

You can change the output dimensions of embedding models that support Matryoshka Embeddings by using the `dimensions` parameter in {class}`~vllm.PoolingParams`.

```python
from vllm import LLM, PoolingParams

model = LLM(model="jinaai/jina-embeddings-v3",
            task="embed",
            trust_remote_code=True)
outputs = model.embed(["Follow the white rabbit."],
                      pooling_params=PoolingParams(dimensions=32))
print(outputs[0].outputs)
```

A code example can be found here: <gh-file:examples/offline_inference/embed_matryoshka_fy.py>
### Online Inference

Use the following command to start the vLLM server:

```text
vllm serve jinaai/jina-embeddings-v3 --trust-remote-code
```

You can change the output dimensions of embedding models that support Matryoshka Embeddings by using the `dimensions` parameter.

```text
curl http://127.0.0.1:8000/v1/embeddings \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
    "input": "Follow the white rabbit.",
    "model": "jinaai/jina-embeddings-v3",
    "encoding_format": "float",
    "dimensions": 1
  }'
```

Expected output:

```json
{"id":"embd-0aab28c384d348c3b8f0eb783109dc5f","object":"list","created":1744195454,"model":"jinaai/jina-embeddings-v3","data":[{"index":0,"object":"embedding","embedding":[-1.0]}],"usage":{"prompt_tokens":10,"total_tokens":10,"completion_tokens":0,"prompt_tokens_details":null}}
```

An OpenAI client example can be found here: <gh-file:examples/online_serving/openai_embedding_matryoshka_fy.py>
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@

```python
# SPDX-License-Identifier: Apache-2.0
"""Example Python client for embedding API dimensions using the vLLM API server.

NOTE:
    Start a server for a supported Matryoshka Embeddings model with `vllm serve`, e.g.

    vllm serve jinaai/jina-embeddings-v3 --trust-remote-code
"""

from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"


def main():
    client = OpenAI(
        # defaults to os.environ.get("OPENAI_API_KEY")
        api_key=openai_api_key,
        base_url=openai_api_base,
    )

    models = client.models.list()
    model = models.data[0].id

    responses = client.embeddings.create(
        input=["Follow the white rabbit."],
        model=model,
        dimensions=1,
    )

    for data in responses.data:
        print(data.embedding)  # list of floats of length 1


if __name__ == "__main__":
    main()
```
