feat: add EIC connector #1718
Conversation
Summary of Changes (gemini-code-assist): This pull request integrates Elastic Instant Cache (EIC) as a new high-performance, distributed backend for managing the KV cache in large language models. By leveraging EIC capabilities such as RDMA and GDR, the system aims to improve the efficiency, scalability, and fault tolerance of KV cache operations, which is crucial for optimizing LLM inference. The changes include adding EIC-specific configuration, implementing the connector logic, and providing the necessary documentation and tests.
Code Review
This pull request introduces a new EIC (Elastic Instant Cache) connector for KV cache offloading. The overall implementation is a great addition, but several critical and high-severity issues need to be addressed. Most importantly, the connector uses exit(1) to handle errors, which is not suitable for a library because it can terminate the entire application; it should return an error status or raise an exception instead. There are also potential None-related errors, a non-functional _delete method that fails silently, and a few bugs in the accompanying test file. Detailed comments and suggestions are provided for each of these points. Lastly, note that the new test file is named test_uni.py, which may be a typo for test_unit.py.
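For illustration only, here is a minimal sketch of the error-handling pattern the review asks for, assuming a hypothetical EIC client whose operations return integer status codes (EICError, _check, and client are illustrative names, not the connector's actual code):

```python
class EICError(RuntimeError):
    """Raised when an EIC operation fails, so callers can catch and recover."""


def _check(ret_code: int, op: str) -> None:
    # Library code should not call exit(1): raising an exception leaves the
    # decision about how to handle the failure to the calling application.
    if ret_code != 0:
        raise EICError(f"EIC {op} failed with return code {ret_code}")


def delete(client, key: str) -> None:
    # Checking the return code also avoids a _delete that fails silently.
    _check(client.delete(key), f"delete({key!r})")
```

Raising an exception (or returning an error status) keeps the host process alive and lets callers retry, fall back to another backend, or surface the error.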
Resolved review threads (now outdated):
python/aibrix_kvcache/aibrix_kvcache/l2/connectors/eic/test_uni.py (two threads)
python/aibrix_kvcache/aibrix_kvcache/l2/connectors/eic/README.md
Force-pushed from 04f7dd0 to 72fc5a2, then from e6403df to ec1d1e3.
DwyaneShi left a comment: LGTM
Signed-off-by: yekenan <[email protected]> (three commits)
Force-pushed from ec1d1e3 to 1a534af.
Signed-off-by: yekenan <[email protected]>
Description
EIC (Elastic Instant Cache) is a distributed database designed for LLM KV cache. It supports RDMA and GDR, and provides distributed disaster tolerance and elastic expansion. See: EIC KVCache.
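As a rough sketch of what such a connector involves (the configuration fields, class names, and client API below are assumptions for illustration, not the actual aibrix_kvcache interfaces):

```python
from dataclasses import dataclass


@dataclass
class EICConfig:
    # Illustrative settings only; the connector's real configuration keys may differ.
    endpoint: str = "127.0.0.1:8500"  # EIC service address (hypothetical default)
    use_rdma: bool = True             # enable RDMA transport
    use_gdr: bool = False             # enable GPU Direct RDMA (GDR)
    namespace: str = "kvcache"        # logical namespace for KV entries


class EICConnector:
    """Hypothetical sketch of an L2 KV-cache connector backed by EIC."""

    def __init__(self, config: EICConfig, client) -> None:
        # `client` stands in for the EIC SDK handle; its API is assumed here.
        self._config = config
        self._client = client

    def put(self, key: bytes, value: bytes) -> None:
        self._client.set(key, value)

    def get(self, key: bytes) -> bytes | None:
        return self._client.get(key)

    def delete(self, key: bytes) -> None:
        self._client.delete(key)
```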
Benchmark
Environment
1×8 H20 96G GPUs serving DeepSeek-R1-Distill-Llama-70B, benchmarked with vllm/benchmarks/benchmark_serving.py.
client:
server:
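The concrete client and server commands are not shown above. Purely as an illustration of how vllm/benchmarks/benchmark_serving.py is typically invoked for this kind of setup (the model name comes from the environment description; every other flag, value, and path is illustrative):

```bash
# Server (illustrative): serve the model across the 8 GPUs.
vllm serve deepseek-ai/DeepSeek-R1-Distill-Llama-70B --tensor-parallel-size 8

# Client (illustrative): drive load with vLLM's serving benchmark.
python vllm/benchmarks/benchmark_serving.py \
  --backend vllm \
  --model deepseek-ai/DeepSeek-R1-Distill-Llama-70B \
  --dataset-name sharegpt \
  --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json \
  --num-prompts 200 \
  --request-rate 4
```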