@netbrah netbrah commented Nov 22, 2025

Description

This pull request adds automatic document chunking to the reranking pipeline, which is especially useful for models with strict token limits (such as ColBERT or Cohere rerank models). Documents that exceed the token limit are split into manageable chunks, each chunk is reranked individually, and the chunk scores are aggregated into a final relevance score per original document. The feature is configurable via environment variables and is integrated into both the main API and the example scripts.

Related Issues

N/A

Changes Made

  • Added chunk_documents_for_rerank and aggregate_chunk_scores utilities to lightrag/rerank.py to handle splitting long documents into token-limited chunks and aggregating their rerank scores back to the original documents. This includes both a tokenizer-based and a character-based fallback approach.
  • Updated the generic_rerank_api and cohere_rerank functions to accept and process chunking parameters.
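
For reviewers who want a mental model of the split-then-aggregate flow described above, here is a minimal sketch. It is not the code in lightrag/rerank.py: the function names `chunk_documents_sketch` and `aggregate_scores_sketch`, their parameters, the default of 480 tokens, the 4-characters-per-token estimate used for the character-based fallback, and the choice of taking the maximum chunk score per document are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


def chunk_documents_sketch(
    documents: List[str],
    max_tokens: int = 480,  # assumed default, not taken from the PR
    encode: Optional[Callable[[str], List[int]]] = None,
    decode: Optional[Callable[[List[int]], str]] = None,
    chars_per_token: int = 4,  # rough estimate used only by the fallback
) -> Tuple[List[str], List[int]]:
    """Split documents into chunks and record which document each chunk came from."""
    chunks: List[str] = []
    doc_indices: List[int] = []
    for doc_idx, doc in enumerate(documents):
        if encode is not None and decode is not None:
            # Tokenizer-based path: slice the token sequence, then decode each slice.
            tokens = encode(doc)
            pieces = [
                decode(tokens[i : i + max_tokens])
                for i in range(0, len(tokens), max_tokens)
            ]
        else:
            # Character-based fallback: approximate the token limit in characters.
            max_chars = max_tokens * chars_per_token
            pieces = [doc[i : i + max_chars] for i in range(0, len(doc), max_chars)]
        if not pieces:
            pieces = [doc]  # keep empty documents so indices stay aligned
        chunks.extend(pieces)
        doc_indices.extend([doc_idx] * len(pieces))
    return chunks, doc_indices


def aggregate_scores_sketch(
    chunk_scores: List[float], doc_indices: List[int]
) -> List[Tuple[int, float]]:
    """Keep the best chunk score per original document, sorted by score descending."""
    best: Dict[int, float] = {}
    for score, doc_idx in zip(chunk_scores, doc_indices):
        if doc_idx not in best or score > best[doc_idx]:
            best[doc_idx] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Usage with the character-based fallback: 5 tokens * 4 chars = 20 chars per chunk.
docs = ["short", "x" * 50]
chunks, indices = chunk_documents_sketch(docs, max_tokens=5)
# Pretend a reranker returned one score per chunk.
ranked = aggregate_scores_sketch([0.2, 0.1, 0.9, 0.4], indices)
```

Taking the maximum rewards a document if any single chunk is highly relevant; a mean or first-chunk strategy would be an equally plausible choice, and the sketch does not claim which one the PR uses.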

API and Server Integration

  • Modified the server rerank function in lightrag/api/lightrag_server.py to read chunking configuration from environment variables and pass these options to the rerank function when using the Cohere binding.

Configuration and Examples

  • Added new environment variables (RERANK_ENABLE_CHUNKING, RERANK_MAX_TOKENS_PER_DOC) to env.example for configuring chunking behavior.
  • Updated examples/rerank_example.py to document and utilize the new chunking configuration options, including reasonable defaults and usage notes.
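
The configuration flow described above (read the environment variables, then pass the resulting options to the rerank call) might look roughly like the sketch below. Only the variable names RERANK_ENABLE_CHUNKING and RERANK_MAX_TOKENS_PER_DOC come from this PR; the helper name `read_chunking_config`, the returned keys, the accepted truthy strings, and the defaults (disabled, 480 tokens) are assumptions for illustration.

```python
import os
from typing import Any, Dict, Mapping


def read_chunking_config(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Parse chunking options from an environment-like mapping (hypothetical helper)."""
    raw_flag = env.get("RERANK_ENABLE_CHUNKING", "false")  # assumed default: off
    enable = raw_flag.strip().lower() in {"1", "true", "yes", "on"}
    max_tokens = int(env.get("RERANK_MAX_TOKENS_PER_DOC", "480"))  # assumed default
    return {"enable_chunking": enable, "max_tokens_per_doc": max_tokens}


# Passing an explicit mapping keeps the example independent of the real environment.
config = read_chunking_config(
    {"RERANK_ENABLE_CHUNKING": "true", "RERANK_MAX_TOKENS_PER_DOC": "256"}
)
```

The returned dictionary could then be forwarded as keyword arguments to whichever rerank function the server selects, which keeps the parsing logic separate from the binding-specific call.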

Checklist

  • Changes tested locally
  • Code reviewed
  • Documentation updated (if necessary)
  • Unit tests added (if applicable)

