
Conversation

@Leafykn (Contributor) commented Oct 30, 2025

Description

EIC (Elastic Instant Cache) is a distributed database designed for LLM KV cache. It supports RDMA and GDR (GPUDirect RDMA), and provides distributed disaster tolerance and elastic expansion.

Benchmark

environment:

1 node with 8× H20 96G GPUs, model DeepSeek-R1-Distill-Llama-70B, benchmarked with vllm/benchmarks/benchmark_serving.py.

client:

python3 benchmark_serving.py --backend vllm \
  --model /models/DeepSeek-R1-Distill-Llama-70B \
  --base-url http://0.0.0.0:8000 \
  --endpoint /v1/completions \
  --metric_percentiles '50,90,95,99' \
  --goodput ttft:5000 tpot:50 \
  --num-prompts 128 \
  --request-rate 16 \
  --max-concurrency 16 \
  --random-input-len 8000 \
  --random-prefix-len 0 \
  --random-output-len 200 \
  --dataset-name random \
  --ignore-eos \
  --served-model-name DeepSeek-R1-Distill-Llama-70B \
  --seed *

server:

python3 -m vllm.entrypoints.openai.api_server \
  --port "8000" \
  --uvicorn-log-level warning \
  --model /models/DeepSeek-R1-Distill-Llama-70B \
  --trust-remote-code \
  --served-model-name DeepSeek-R1-Distill-Llama-70B \
  --max-model-len "32768" \
  --disable-log-requests \
  --disable-fastapi-docs \
  --swap-space "16" \
  --no-enable-chunked-prefill \
  --gpu-memory-utilization="0.95" \
  --tensor-parallel-size=8 \
  --kv-transfer-config '{"kv_connector":"AIBrixOffloadingConnectorV1Type3", "kv_role":"kv_both"}' 
[Benchmark results were attached as a screenshot.]

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @Leafykn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates Elastic Instant Cache (EIC) as a new high-performance, distributed backend for managing the KV cache in large language models. By leveraging EIC's capabilities, such as RDMA and GDR, the system aims to improve the efficiency, scalability, and fault tolerance of KV cache operations, which is crucial for optimizing LLM inference. The changes include adding EIC-specific configuration, implementing the connector logic, and providing necessary documentation and tests.

Highlights

  • New EIC Connector: Introduced a new connector for Elastic Instant Cache (EIC), a distributed database designed specifically for LLM KV cache, supporting RDMA, GDR, distributed disaster tolerance, and elastic expansion.
  • Configuration: Added a new environment variable, AIBRIX_KV_CACHE_OL_EIC_CONFIG, to specify the EIC configuration file, enabling flexible setup for the EIC connector.
  • Connector Integration: connector.py now dynamically creates an EICConnector instance when 'EIC' is specified as the L2 cache backend.
  • EIC Implementation: A new eic.py module provides the core EICConnector class, implementing connection management, memory registration (for torch.Tensor slabs), and the KV operations get, put, exists, and delete via the eic library; a hedged sketch of this surface follows the list.
  • Unit Tests and Documentation: Included a README.md with EIC deployment and usage instructions, along with test_uni.py for unit-testing the EIC client's connectivity and basic functionality.
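
The highlights above name the connector's surface without showing it. Below is a minimal, self-contained sketch of that shape: the method names (exists, get, put, delete, plus slab registration) and the AIBRIX_KV_CACHE_OL_EIC_CONFIG environment variable come from this summary, while the class name, the signatures, the default config path, and the dict-backed store are stand-ins for the real eic client and are assumptions, not the merged code.

import os

import torch

class EICConnectorSketch:
    """Illustrative stand-in for the EICConnector described above."""

    def __init__(self) -> None:
        # The PR adds this env var to locate the EIC configuration file;
        # the default path here is made up for the example.
        self.config_path = os.getenv("AIBRIX_KV_CACHE_OL_EIC_CONFIG", "/etc/eic/eic.yaml")
        self._store: dict[str, torch.Tensor] = {}  # stand-in for the EIC cluster

    def register_slabs(self, slabs: list[torch.Tensor]) -> None:
        # The real connector registers slab memory so RDMA/GDR can address
        # it directly; the in-memory stand-in has nothing to register.
        pass

    def exists(self, key: str) -> bool:
        return key in self._store

    def put(self, key: str, value: torch.Tensor) -> None:
        self._store[key] = value.detach().clone()

    def get(self, key: str) -> "torch.Tensor | None":
        return self._store.get(key)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)

# Exercised the same way the L2 cache layer would drive it:
conn = EICConnectorSketch()
block = torch.randn(2, 16)  # toy KV-cache block
conn.put("layer0/block0", block)
assert conn.exists("layer0/block0")
assert torch.equal(conn.get("layer0/block0"), block)
conn.delete("layer0/block0")
assert not conn.exists("layer0/block0")

The dict keeps the semantics obvious; the point of the sketch is the method contract, which is the surface the L2 cache layer drives regardless of what storage sits behind it.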

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a new EIC (Elastic Instant Cache) connector for KV cache offloading. The overall implementation is a great addition, but there are several critical and high-severity issues that need to be addressed. Most importantly, the connector code uses exit(1) to handle errors, which is not suitable for a library as it can terminate the entire application. It should instead return an error status or raise an exception. Additionally, there are potential None-related errors, a non-functional _delete method that silently fails, and a few bugs in the accompanying test file. I've provided detailed comments and suggestions for these points. Lastly, please note that the new test file is named test_uni.py, which might be a typo for test_unit.py.
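
To make the exit(1) point concrete, here is a before/after sketch of the error-handling pattern the review asks for; the function and exception names are hypothetical, not taken from the PR:

class EICConnectionError(RuntimeError):
    """Hypothetical exception type for illustration."""

def init_client_fragile(client):
    # Anti-pattern flagged in the review: exit(1) tears down the entire
    # serving process, not just the connector that failed.
    if client is None:
        print("EIC client init failed")
        exit(1)
    return client

def init_client_robust(client):
    # Preferred: raise (or return an error status) so the caller can
    # retry, fall back to another cache tier, or fail gracefully.
    if client is None:
        raise EICConnectionError("EIC client init failed")
    return client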

@Jeffwan Jeffwan requested review from DwyaneShi and dczhu October 30, 2025 20:59
@Leafykn Leafykn force-pushed the feat/connector-eic-backend branch from 04f7dd0 to 72fc5a2 Compare November 4, 2025 03:37
@Leafykn Leafykn force-pushed the feat/connector-eic-backend branch from e6403df to ec1d1e3 Compare November 6, 2025 03:59
@DwyaneShi (Collaborator) left a comment

LGTM

yekenan added 3 commits November 5, 2025 20:07
@DwyaneShi DwyaneShi force-pushed the feat/connector-eic-backend branch from ec1d1e3 to 1a534af Compare November 6, 2025 04:07
Signed-off-by: yekenan <[email protected]>
@DwyaneShi DwyaneShi merged commit 1889080 into vllm-project:main Nov 6, 2025
12 checks passed