Feat/qdrant collection suffix #2324
base: main
|
The LightRAG Server does not appear to contain any code handling the |
|
Hi @danielaskdd, Thanks for the review! Let me address your questions: 1. How is collection_suffix configured and utilized? Users configure The parameter flows through LightRAG's initialization to 2. LightRAG Server integration
The LightRAG Server doesn't need special handling for 3. Is suffix functionality only applicable to vector storage? Yes, specifically only when using The problem: LightRAG creates three Qdrant collections when using Without All three collections get the same suffix to keep them synchronized. Other use cases:
4. How was this design decision considered? Why
Scalability: Qdrant supports 1,000 collections. With ~6 common embedding dimensions × 3 collections each = ~18 collections total, leaving plenty of room for growth while still supporting unlimited workspaces per collection set via payload filtering. |
|
Since a collection can only contain vectors of the same dimensionality, manually specifying dimension suffixes is prone to errors. Would it be more reasonable and convenient to automatically append dimension-based suffixes to collections based on the vector dimensions? |
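A minimal sketch of the dimension-based suffix idea raised above, assuming the suffix is derived from the length of one sample embedding. The helper name `dimension_suffix` and the `"{dim}d"` format are hypothetical illustrations, not part of LightRAG or Qdrant:

```python
from typing import Sequence


def dimension_suffix(sample_embedding: Sequence[float]) -> str:
    """Derive a collection suffix from the size of one embedding vector.

    Hypothetical helper: the "<dim>d" format is an assumption made for
    illustration, not a format defined by LightRAG.
    """
    dim = len(sample_embedding)
    if dim == 0:
        # An empty vector carries no dimension information.
        raise ValueError("cannot derive a suffix from an empty embedding")
    return f"{dim}d"


# Two embedding models with different output sizes map to different suffixes,
# so their vectors would never end up in the same collection.
small = dimension_suffix([0.0] * 768)
large = dimension_suffix([0.0] * 1536)
print(small, large)
```

Deriving the suffix from an actual vector, rather than from configuration, removes the chance of a hand-typed suffix disagreeing with the real dimensionality.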
|
Thanks for the great feedback! I considered implementing auto-detection initially, but there are a few challenges that make it better suited for a future enhancement:
1. Backward Compatibility
2. General Purpose Use Cases
3. Implementation Complexity
Auto-detecting dimensions from embedding functions is non-trivial due to various function types. It's doable, but adds complexity.
I think automatic dimension detection would make an excellent Phase 2 feature! We could add:
Would you be open to merging this as Phase 1, and we can add auto-detection as a follow-up enhancement if there's demand? |
|
Vector collection suffixing is crucial for enabling LightRAG's future online workspace switching and multi-tenancy capabilities. This involves several key considerations:
Based on these considerations, we propose the following recommendations:
We recommend against incorporating environment-specific suffixes (e.g., |
|
Hi @danielaskdd, I agree with your long-term vision. However, there's a critical breaking change we need to address immediately:
The Problem
LightRAG currently creates a set of collections fixed to a dimension based on the first workspace. This creates a system-wide limitation:
This breaks multi-embedding-model support entirely and needs an immediate fix. Regarding environment suffixes: Agreed - workspace-based isolation is the right approach. |
|
I understand your concern. As previously mentioned, during the startup of the LightRAG Server, data is automatically migrated from collections without a suffix to those with the proper suffix. After this process, the original collections without suffixes are deprecated. Going forward, all newly created workspaces will store data in correctly suffixed collections. |
|
Thank you for clarifying, @danielaskdd. Can you assign someone to this, please? |
Description
Add support for collection suffixes in QdrantVectorDBStorage to allow creating separate sets of collections for different purposes such as embedding dimensions, environments, testing, and more.
Related Issues
This feature addresses the need to support different embedding dimensions in multi-tenant environments and provides a flexible way to manage separate collection sets without changing the core architecture.
Changes Made
collection_suffix parameter support in QdrantVectorDBStorage.post_init
lightrag_vdb_{namespace}_{suffix})
Checklist
Additional Notes
Usage Example
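A minimal sketch of the naming pattern `lightrag_vdb_{namespace}_{suffix}` listed under Changes Made. The helper names, the unsuffixed fallback, and the three namespace names (`entities`, `relationships`, `chunks`) are assumptions for illustration and are not taken from the PR's code:

```python
from typing import List, Optional

# Assumed namespace names for the three vector collections (hypothetical).
NAMESPACES = ("entities", "relationships", "chunks")


def build_collection_name(namespace: str, suffix: Optional[str] = None) -> str:
    """Build a collection name following lightrag_vdb_{namespace}_{suffix}.

    Hypothetical helper. Assumption: when no suffix is given, the name is
    left unsuffixed.
    """
    base = f"lightrag_vdb_{namespace}"
    return f"{base}_{suffix}" if suffix else base


def collection_set(suffix: Optional[str] = None) -> List[str]:
    """Apply the same suffix to every collection so the set stays in sync."""
    return [build_collection_name(ns, suffix) for ns in NAMESPACES]


# One collection set per suffix, for example one per embedding dimension.
for name in collection_set("1536d"):
    print(name)
```

Applying one suffix to the whole set mirrors the point made in the conversation that all three collections share the same suffix to stay synchronized.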