Add embeddings & reranking via Sentence Transformers #2381
base: main
Conversation
@codex review
💡 Codex Review
LightRAG/lightrag/api/config.py, lines 242 to 246 in 23e7ffb:

```python
choices=[
    "lollms",
    "ollama",
    "openai",
    "azure_openai",
```
Even though create_app adds logic for a sentence_transformers embedding binding, the CLI still restricts --embedding-binding to the list shown here, so passing --embedding-binding sentence_transformers results in an argparse "invalid choice" error. Because of this omission the new embedding provider cannot be enabled from the server’s documented configuration interface.
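A minimal, self-contained sketch of the fix is to extend that choices list so argparse accepts the new value; the real definition in lightrag/api/config.py also carries a default and help text, which are omitted here for brevity:

```python
import argparse

# Sketch only: reproduces the argparse behavior described above.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--embedding-binding",
    type=str,
    choices=[
        "lollms",
        "ollama",
        "openai",
        "azure_openai",
        "sentence_transformers",  # without this entry, argparse raises "invalid choice"
    ],
)

args = parser.parse_args(["--embedding-binding", "sentence_transformers"])
print(args.embedding_binding)  # -> sentence_transformers
```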
LightRAG/lightrag/api/config.py, lines 254 to 257 in 23e7ffb:

```python
"--rerank-binding",
type=str,
default=get_env_value("RERANK_BINDING", DEFAULT_RERANK_BINDING),
choices=["null", "cohere", "jina", "aliyun"],
```
Similarly, the argparse definition for --rerank-binding still lists only null|cohere|jina|aliyun even though the server now maps a sentence_transformers reranker (lightrag/api/lightrag_server.py lines 943‑998). Attempting to run the API with --rerank-binding sentence_transformers is rejected before the app starts, making the newly added reranker unreachable via the supported CLI/environment configuration.
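The analogous change (again only a sketch of where the fix would land, not the full argument definition) is to add the new value to that choices list as well:

```python
# lightrag/api/config.py -- sketch: accept the new reranker at the CLI
choices=["null", "cohere", "jina", "aliyun", "sentence_transformers"],
```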
Excerpt from the PR diff (server-side embedding wrapper):

```python
    sentence_transformers_embed.func
    if isinstance(sentence_transformers_embed, EmbeddingFunc)
    else sentence_transformers_embed
)
return await actual_func(texts, embedding_dim=embedding_dim)
```
Wire sentence_transformers binding with required model
Selecting the new sentence_transformers embedding binding crashes immediately. The server block calls actual_func(texts, embedding_dim=embedding_dim) without ever constructing or passing a SentenceTransformer instance, yet sentence_transformers_embed (lightrag/llm/sentence_transformers.py lines 13‑32) requires a model positional argument and does not accept embedding_dim. As soon as the binding is chosen the API raises TypeError (missing model / unexpected embedding_dim), so the embedding provider cannot be used at all.
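One way to address this (a sketch only; the model name, the environment variable, and the assumption that the function exposes a model keyword parameter are illustrative, not taken from the PR) is to construct the SentenceTransformer once when the binding is selected and bind it to the embedding function:

```python
import os
from functools import partial

from sentence_transformers import SentenceTransformer

from lightrag.llm.sentence_transformers import sentence_transformers_embed

# Build the model once at startup, then bind it so the wrapper no longer
# calls the function without a model (or with the unsupported
# embedding_dim keyword argument).
st_model = SentenceTransformer(os.environ.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"))
embedding_func = partial(sentence_transformers_embed, model=st_model)

# In the async wrapper the call then becomes simply:
#     return await embedding_func(texts)
```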
Description
Related Issues
None
Changes Made
- Added `sentence_transformers_embed` with automatic `embedding_dim` and `max_token_size`, allowing you to use any model from https://huggingface.co/models?library=sentence-transformers
- Added `sentence_transformers_rerank`, allowing you to use any model from https://huggingface.co/models?pipeline_tag=text-ranking&library=sentence-transformers
- Added `lightrag_sentence_transformers_demo.py`, a new demo based on `lightrag_hf_demo.py`

Checklist
Additional Notes
I wasn't able to get `lightrag_hf_demo.py` to run nicely, so I couldn't extensively test the new demo `lightrag_sentence_transformers_demo.py` either, but the embedding and reranking components do work on their own.
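For reference, here is a minimal standalone check of the underlying sentence-transformers pieces the new bindings rely on (model names are illustrative; this exercises the library directly rather than LightRAG's new wrappers):

```python
from sentence_transformers import CrossEncoder, SentenceTransformer

# Embeddings: any model from the sentence-transformers hub works here.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "LightRAG indexes documents into a knowledge graph.",
    "Rerankers reorder retrieved chunks by relevance.",
]
vectors = embedder.encode(docs)
print(vectors.shape)  # (2, embedding_dim) -- 384 for this model

# Reranking: a cross-encoder scores (query, document) pairs.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "How does LightRAG rerank results?"
scores = reranker.predict([(query, d) for d in docs])
print(scores)  # higher score = more relevant
```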