
Conversation

@Leafykn (Contributor) commented Oct 30, 2025

Description

EIC (Elastic Instant Cache) is a distributed database designed for LLM KV cache. It supports RDMA and GDR (GPUDirect RDMA), and provides distributed disaster tolerance and elastic expansion.

Benchmark

environment:

1 node with 8× H20 96G GPUs, model DeepSeek-R1-Distill-Llama-70B, benchmarked with vllm/benchmarks/benchmark_serving.py.

client:

python3 benchmark_serving.py --backend vllm \
  --model /models/DeepSeek-R1-Distill-Llama-70B \
  --base-url http://0.0.0.0:8000 \
  --endpoint /v1/completions \
  --metric_percentiles '50,90,95,99' \
  --goodput ttft:5000 tpot:50 \
  --num-prompts 128 \
  --request-rate 16 \
  --max-concurrency 16 \
  --random-input-len 8000 \
  --random-prefix-len 0 \
  --random-output-len 200 \
  --dataset-name random \
  --ignore-eos \
  --served-model-name DeepSeek-R1-Distill-Llama-70B \
  --seed *

server:

python3 -m vllm.entrypoints.openai.api_server \
  --port "8000" \
  --uvicorn-log-level warning \
  --model /models/DeepSeek-R1-Distill-Llama-70B \
  --trust-remote-code \
  --served-model-name DeepSeek-R1-Distill-Llama-70B \
  --max-model-len "32768" \
  --disable-log-requests \
  --disable-fastapi-docs \
  --swap-space "16" \
  --no-enable-chunked-prefill \
  --gpu-memory-utilization="0.95" \
  --tensor-parallel-size=8 \
  --kv-transfer-config '{"kv_connector":"AIBrixOffloadingConnectorV1Type3", "kv_role":"kv_both"}' 
[Benchmark results were attached as a screenshot.]

@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @Leafykn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates Elastic Instant Cache (EIC) as a new high-performance, distributed backend for managing the KV cache in large language models. By leveraging EIC's capabilities, such as RDMA and GDR, the system aims to improve the efficiency, scalability, and fault tolerance of KV cache operations, which is crucial for optimizing LLM inference. The changes include adding EIC-specific configuration, implementing the connector logic, and providing necessary documentation and tests.

Highlights

  • New EIC Connector: Introduced a new connector for Elastic Instant Cache (EIC), a distributed database designed specifically for LLM KV cache, supporting RDMA, GDR, distributed disaster tolerance, and elastic expansion.
  • Configuration: Added a new environment variable, AIBRIX_KV_CACHE_OL_EIC_CONFIG, to specify the EIC configuration file, enabling flexible setup for the EIC connector.
  • Connector Integration: connector.py now dynamically creates an EICConnector instance when 'EIC' is specified as the L2 cache backend.
  • EIC Implementation: A new eic.py module provides the core EICConnector class, implementing connection management, memory registration (for torch.Tensor slabs), and the KV operations get, put, exists, and delete via the eic library; a hedged sketch of this surface follows the list.
  • Unit Tests and Documentation: Included a README.md with EIC deployment and usage instructions, along with test_uni.py for unit-testing the EIC client's connectivity and basic functionality.
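
The highlights above name the connector's surface without showing it. Below is a minimal, self-contained sketch of that shape: the method names (exists, get, put, delete, plus slab registration) and the AIBRIX_KV_CACHE_OL_EIC_CONFIG environment variable come from this summary, while the class name, the signatures, the default config path, and the dict-backed store are stand-ins for the real eic client and are assumptions, not the merged code.

import os

import torch

class EICConnectorSketch:
    """Illustrative stand-in for the EICConnector described above."""

    def __init__(self) -> None:
        # The PR adds this env var to locate the EIC configuration file;
        # the default path here is made up for the example.
        self.config_path = os.getenv("AIBRIX_KV_CACHE_OL_EIC_CONFIG", "/etc/eic/eic.yaml")
        self._store: dict[str, torch.Tensor] = {}  # stand-in for the EIC cluster

    def register_slabs(self, slabs: list[torch.Tensor]) -> None:
        # The real connector registers slab memory so RDMA/GDR can address
        # it directly; the in-memory stand-in has nothing to register.
        pass

    def exists(self, key: str) -> bool:
        return key in self._store

    def put(self, key: str, value: torch.Tensor) -> None:
        self._store[key] = value.detach().clone()

    def get(self, key: str) -> "torch.Tensor | None":
        return self._store.get(key)

    def delete(self, key: str) -> None:
        self._store.pop(key, None)

# Exercised the same way the L2 cache layer would drive it:
conn = EICConnectorSketch()
block = torch.randn(2, 16)  # toy KV-cache block
conn.put("layer0/block0", block)
assert conn.exists("layer0/block0")
assert torch.equal(conn.get("layer0/block0"), block)
conn.delete("layer0/block0")
assert not conn.exists("layer0/block0")

The dict keeps the semantics obvious; the point of the sketch is the method contract, which is the surface the L2 cache layer drives regardless of what storage sits behind it.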

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a new EIC (Elastic Instant Cache) connector for KV cache offloading. The overall implementation is a great addition, but there are several critical and high-severity issues that need to be addressed. Most importantly, the connector code uses exit(1) to handle errors, which is not suitable for a library as it can terminate the entire application. It should instead return an error status or raise an exception. Additionally, there are potential None-related errors, a non-functional _delete method that silently fails, and a few bugs in the accompanying test file. I've provided detailed comments and suggestions for these points. Lastly, please note that the new test file is named test_uni.py, which might be a typo for test_unit.py.
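
To make the exit(1) point concrete, here is a before/after sketch of the error-handling pattern the review asks for; the function and exception names are hypothetical, not taken from the PR:

class EICConnectionError(RuntimeError):
    """Hypothetical exception type for illustration."""

def init_client_fragile(client):
    # Anti-pattern flagged in the review: exit(1) tears down the entire
    # serving process, not just the connector that failed.
    if client is None:
        print("EIC client init failed")
        exit(1)
    return client

def init_client_robust(client):
    # Preferred: raise (or return an error status) so the caller can
    # retry, fall back to another cache tier, or fail gracefully.
    if client is None:
        raise EICConnectionError("EIC client init failed")
    return client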

@Jeffwan Jeffwan requested review from DwyaneShi and dczhu October 30, 2025 20:59
@Leafykn Leafykn force-pushed the feat/connector-eic-backend branch from 04f7dd0 to 72fc5a2 Compare November 4, 2025 03:37
@Leafykn Leafykn force-pushed the feat/connector-eic-backend branch from e6403df to ec1d1e3 Compare November 6, 2025 03:59
@DwyaneShi (Collaborator) left a comment

LGTM

yekenan added 3 commits November 5, 2025 20:07
@DwyaneShi DwyaneShi force-pushed the feat/connector-eic-backend branch from ec1d1e3 to 1a534af Compare November 6, 2025 04:07
Signed-off-by: yekenan <[email protected]>
@DwyaneShi DwyaneShi merged commit 1889080 into vllm-project:main Nov 6, 2025
12 checks passed