[v1] [P/D] Adding LMCache KV connector for v1 #16625
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Does it support xPyD (x prefillers, y decoders)?
@randomseed713 We are working on this. It should be ready sometime this week.
Question: does this rely on Ray for the communication? I tried to run the example in the PR and ran into an issue like this:
@liuzijing2014 This PR doesn't depend on Ray. Can you share your command and environment details? I'm also in vLLM's Slack workspace (name: Yihua Cheng), so feel free to DM me if you are there too.
Does it support multi-node deployment? Which version of LMCache should I install? Which Python version should I use? Which PyTorch version should I use?
> ### Prerequisites
>
> - Install [LMCache](https://github.com/ai-dynamo/lmcache)
Oh thanks for catching. I will submit a PR to fix this.
> os.environ["LMCACHE_REMOTE_SERDE"] = "naive"
>
> prompts = [
>     "Hello, how are you?" * 1000,
Hi @ApostaC, we have a simple question: here the input prompt "Hello, how are you?" is duplicated 1000 times. Does the following feature of LMCache mean the KV cache can be shared only when the input prompts from different requests are exactly the same?

> Flexible KV cache pooling (sharing KV cache across multiple vLLM instances)
TL;DR: The LMCache connector offers the following enhancements on top of LMCache:
- Disaggregated prefill (NIXL-based KV transfer from prefiller to decoder)
- CPU offloading of KV cache
- KV cache sharing across vLLM instances
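For orientation, here is a minimal sketch of how the connector would be turned on through vLLM's `--kv-transfer-config` flag; the connector name `LMCacheConnectorV1` and the exact JSON values shown are assumptions for illustration, not verbatim from this PR:

```bash
# Minimal sketch (assumed connector name and flags): enable the LMCache v1
# connector on a single instance. kv_role can be "kv_producer" (prefiller),
# "kv_consumer" (decoder), or "kv_both" (single instance doing both).
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000 \
  --kv-transfer-config '{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both"}'
```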
Example Usage
Disaggregated prefill
LMCache uses NIXL as the underlying KV transmission layer.
Run `cd examples/lmcache/disagg_prefill_lmcache_v1` to get into the `disagg_prefill_lmcache_v1` folder, and then run the disaggregated prefill example script there (a launch sketch follows below).
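For reference, a rough sketch of what a 1-prefiller/1-decoder launch looks like; the ports, device assignments, and connector name are illustrative assumptions, and the real example script in the folder handles wiring the two servers together:

```bash
# Rough sketch (assumed ports/flags) of a 1P1D disaggregated-prefill setup.
# Prefiller: computes KV caches and sends them to the decoder over NIXL.
CUDA_VISIBLE_DEVICES=0 vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8100 \
  --kv-transfer-config '{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_producer"}' &

# Decoder: receives the transferred KV caches and runs the decode phase.
CUDA_VISIBLE_DEVICES=1 vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8200 \
  --kv-transfer-config '{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_consumer"}' &
```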
Performance benchmarking:
Environment: 2x H100 with NVLink
Baselines
Workload: Random dataset (see `benchmarks/benchmark_serving.py`):

```bash
python3 benchmark_serving.py --port 9000 --seed $(date +%s) \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --dataset-name random --random-input-len 8000 --random-output-len 200 \
    --num-prompts 200 --burstiness 100 --request-rate 3.6
```
Comparison result
With LMCache-based PD disaggregation, we can achieve 40% higher tokens per second and 8x better tail inter-token latency.

CPU offloading
Run `cd examples/lmcache/disagg_prefill_lmcache_v1` to get into the `disagg_prefill_lmcache_v1` folder, and then run the CPU offloading example script there (see the sketch below).
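A minimal sketch of what the CPU offloading configuration might look like, assuming LMCache is driven by environment variables; the variable names and the size budget below are illustrative assumptions, not verbatim from the example script:

```bash
# Sketch (assumed env var names): spill KV cache blocks to CPU memory via LMCache.
export LMCACHE_CHUNK_SIZE=256          # tokens per KV cache chunk (assumed)
export LMCACHE_LOCAL_CPU=True          # enable the local CPU cache backend (assumed)
export LMCACHE_MAX_LOCAL_CPU_SIZE=5    # CPU cache budget in GiB (assumed)

vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000 \
  --kv-transfer-config '{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both"}'
```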
KV cache sharing
Run `cd examples/lmcache/disagg_prefill_lmcache_v1` to get into the `disagg_prefill_lmcache_v1` folder, and then run the KV cache sharing example script there (see the sketch below).
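A sketch of how sharing might be wired up, assuming a standalone LMCache server acts as the shared KV pool; the `lmcache_server` entry point and the `lm://` URL scheme are assumptions, while `LMCACHE_REMOTE_SERDE="naive"` mirrors the snippet quoted earlier in this thread:

```bash
# Sketch (assumed server entry point and URL scheme): two vLLM instances
# sharing KV caches through one LMCache server.
lmcache_server localhost 8100 &                    # shared KV cache pool (assumed CLI)

export LMCACHE_REMOTE_URL="lm://localhost:8100"    # assumed URL scheme
export LMCACHE_REMOTE_SERDE="naive"                # serializer, as in the quoted snippet

# Both instances read and write KV caches through the same remote pool.
CUDA_VISIBLE_DEVICES=0 vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000 \
  --kv-transfer-config '{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both"}' &
CUDA_VISIBLE_DEVICES=1 vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8001 \
  --kv-transfer-config '{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both"}' &
```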