
Conversation

@vie-serendipity (Contributor)

Pull Request Description

Implement an LRU cache with TTL to bound the size of the prefix cache and prevent OOM.

Related Issues

Resolves: #892
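
For illustration, here is a minimal sketch of an LRU store with TTL in Go. This is not the PR's actual implementation: the names LRUStore, entry, and freeTable mirror snippets quoted later in this thread, while the container/list-based layout, the lruList field, the constructor, and the method bodies are assumptions.

package cache

import (
	"container/list"
	"sync"
	"time"
)

type entry[K comparable, V any] struct {
	key        K
	value      V
	lastAccess time.Time
}

type LRUStore[K comparable, V any] struct {
	sync.Mutex
	capacity  int
	ttl       time.Duration
	freeTable map[K]*list.Element // key -> element whose Value is *entry[K, V]
	lruList   *list.List          // front = most recently used
}

func NewLRUStore[K comparable, V any](capacity int, ttl time.Duration) *LRUStore[K, V] {
	return &LRUStore[K, V]{
		capacity:  capacity,
		ttl:       ttl,
		freeTable: make(map[K]*list.Element),
		lruList:   list.New(),
	}
}

func (s *LRUStore[K, V]) Get(key K) (V, bool) {
	s.Lock()
	defer s.Unlock()
	var zero V
	el, ok := s.freeTable[key]
	if !ok {
		return zero, false
	}
	ent := el.Value.(*entry[K, V])
	if time.Since(ent.lastAccess) > s.ttl {
		// Lazily evict an entry whose TTL has expired.
		s.lruList.Remove(el)
		delete(s.freeTable, key)
		return zero, false
	}
	ent.lastAccess = time.Now()
	s.lruList.MoveToFront(el)
	return ent.value, true
}

func (s *LRUStore[K, V]) Put(key K, value V) {
	s.Lock()
	defer s.Unlock()
	if el, ok := s.freeTable[key]; ok {
		ent := el.Value.(*entry[K, V])
		ent.value = value
		ent.lastAccess = time.Now()
		s.lruList.MoveToFront(el)
		return
	}
	if s.lruList.Len() >= s.capacity {
		// At capacity: evict the least recently used entry first.
		if oldest := s.lruList.Back(); oldest != nil {
			s.lruList.Remove(oldest)
			delete(s.freeTable, oldest.Value.(*entry[K, V]).key)
		}
	}
	ent := &entry[K, V]{key: key, value: value, lastAccess: time.Now()}
	s.freeTable[key] = s.lruList.PushFront(ent)
}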

test: add ut for lru and modify hash' ut

Signed-off-by: vie-serendipity <[email protected]>
@varungup90 (Collaborator)

I will review it today.

@Jeffwan requested review from Copilot and varungup90 and removed the request for Copilot (March 27, 2025, 21:56)
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR implements an LRU cache with TTL to limit the size of the prefix cache and prevent memory exhaustion. Key changes include removing the manual eviction logic in the prefix cache indexer in favor of using an LRU store, updating tests to integrate with the new store interface, and introducing the LRUStore implementation and its accompanying tests in the cache package.

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated no comments.

Summary per file:

  • pkg/plugins/gateway/prefixcacheindexer/indexer.go: Removed the unused time import and the Evict method from the interface, aligning with the new LRU store-based eviction strategy.
  • pkg/plugins/gateway/prefixcacheindexer/hash_test.go: Updated tests to use the new store and replaced manual eviction calls with a sleep delay to allow TTL expiration.
  • pkg/plugins/gateway/prefixcacheindexer/hash.go: Replaced direct block map operations with store Get/Put calls to integrate the LRU store implementation.
  • pkg/plugins/gateway/cache/store.go: Added a generic Store interface (a sketch of a possible shape follows the file list).
  • pkg/plugins/gateway/cache/lru_store.go: Introduced a new LRUStore implementing cache eviction based on capacity and TTL.
  • pkg/plugins/gateway/cache/lru_store_test.go: Added unit tests for LRUStore covering basic put/get behavior, TTL expiration, updates, and concurrent evictions.
Files not reviewed (1)
  • go.mod: Language not supported
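
For reference, a sketch of what the generic Store interface in store.go might look like, inferred only from the Get/Put calls quoted elsewhere in this thread; the actual interface may declare additional methods.

// Store is a minimal generic key-value store contract; the LRUStore
// discussed in this PR would be one implementation.
type Store[K comparable, V any] interface {
	Get(key K) (V, bool)
	Put(key K, value V)
}
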
Comments suppressed due to low confidence (2)

pkg/plugins/gateway/prefixcacheindexer/hash_test.go:67

  • [nitpick] Using a long sleep duration in tests can slow down the test suite. Consider using a shorter TTL in the test configuration or mocking time to simulate TTL expiration more efficiently.
time.Sleep(25 * time.Second)

pkg/plugins/gateway/prefixcacheindexer/indexer.go:29

  • [nitpick] The manual Evict method has been removed from the interface and its documentation, relying on the LRUStore's internal eviction mechanism. Ensure that this new approach is adequately covered by tests and that its behavior meets the application's eviction requirements.
Evict method and its comments removed

 defaultPrefixCacheBlockSize = 16
-defaultPrefixCacheEvictionInternalInMS = 50
 defaultPrefixCacheEvictionDurationInMins = 60
+defaultPrefixCacheEvictionInternalInMS = 1000 // 1 second
Collaborator:

You can rename the variable to defaultPrefixCacheEvictionInternalInSec.

Contributor Author:

okay

Collaborator:

I think we will always use seconds for this parameter. Let's change it to XXXInSec.


type LRUStore[K comparable, V any] struct {
	sync.Mutex
	freeTable map[K]*entry[K, V]
Collaborator:

Right now the structure is as follows:

hash1 -> model1 -> pod1 + pod1 last access time
      -> model1 -> pod2 + pod2 last access time
      -> model2 -> pod3 + pod3 last access time
      -> block's last access time

Right now eviction is based on the block's last access time. I kept the eviction like this for now to keep it simple. Eventually we want to move to deleting only pod1 + its last access time rather than being restricted to deleting the entire block.

Another requirement is that one request creates on the order of 100 hashes. How do we make sure all hashes for that request are deleted together? That's how vLLM implements block eviction. (See the block-structure sketch below.)
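
For reference, a sketch of the per-hash block structure being described; the Block type and modelToPods field appear in a hash.go snippet quoted later in this thread, while the package line and comments here are illustrative.

package prefixcacheindexer

import "time"

// Block is the value stored per prefix hash: model name -> pod name ->
// that pod's last access time. Eviction currently removes a whole Block
// based on its last access time, not individual pod entries.
type Block struct {
	modelToPods map[string]map[string]time.Time
}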

Collaborator:

The ask is: can you share some design ideas on how to extend this with the new implementation?

Contributor Author:

Sorry for the late reply.

> Eventually we want to move to deleting only pod1 + its last access time rather than being restricted to deleting the entire block.

I think this is reasonable.

> Another requirement is that one request creates on the order of 100 hashes. How do we make sure all hashes for that request are deleted together? That's how vLLM implements block eviction.

Just to confirm: after vLLM finishes processing a request, it frees all of the request's blocks together, so you mean the router implementation should also free them together?

But I think this involves two layers, and in reality the router and the KV cache have no actual interaction. Even if the lastAccessedTime values are very close, that doesn't guarantee there is already a cache for the current request on the node. Hence, I understand that the router here implements a best-effort approach. In the current implementation, selecting the pod with the most recent access makes it the most likely to have the cached KV.

I think for precise routing, integrating with a distributed KV cache could be a solution, but that would come at a significant cost. What do you think? cc @Jeffwan

Collaborator:

> Hence, I understand that the router here implements a best-effort approach.

Yes, exactly.

> I think for precise routing, integrating with a distributed KV cache could be a solution, but that would come at a significant cost.

This is not in the current scope, for a few reasons. 1. Neither vLLM nor the current distributed KV cache solution (like Vineyard, which we are using) provides interfaces to expose KV details; the distributed KV cache has a metadata server, but that's for internal communication. 2. Even if they provided a way, fetching the source of truth from the engine or KV cache service would be costly. I agree that we can revisit this feature later.

> Another requirement is that one request creates on the order of 100 hashes. How do we make sure all hashes for that request are deleted together?

Cache hits follow the common-prefix pattern, so the earlier blocks always have a fresh "last access time". In that case, if one block is evicted, the blocks after it will be evicted for sure. That means even a plain map would support such a case out of the box? Correct me if I understand incorrectly.

Contributor Author:

I agree with your point.

	e.Unlock()

	for _, key := range keysToEvict {
		e.Lock()
@Jeffwan (Collaborator), Mar 28, 2025:

Here, I am curious whether it needs a key-level lock. Could it be at the for-loop level?

Contributor Author:

I don't quite understand your question. Let me clarify: the main reason for first traversing to collect the keys without performing any operation is to avoid holding the lock too long and blocking the router.

Collaborator:

There was a typo in my original message. I meant the lock inside the loop versus the lock before the loop. I think the lock inside the loop helps in scenarios where there are many eviction keys and a high volume of concurrent requests modifying freeTable and lruList; in other cases, locking before the loop looks better. I do not have a strong opinion at this moment. Since a long eviction window will accumulate more keys, this looks good to me. (A sketch of both options follows.)
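
A sketch of the two locking strategies being compared, assuming the eviction loop quoted above and the freeTable/lruList fields from the LRUStore snippet; the element-removal details are illustrative.

// Option A: lock inside the loop, as in this PR. Each eviction
// re-acquires the mutex, so concurrent Get/Put calls can interleave
// between evictions, at the cost of more lock churn.
for _, key := range keysToEvict {
	e.Lock()
	if el, ok := e.freeTable[key]; ok {
		e.lruList.Remove(el)
		delete(e.freeTable, key)
	}
	e.Unlock()
}

// Option B: lock before the loop. One acquisition, but readers and
// writers are blocked for the entire eviction pass.
e.Lock()
for _, key := range keysToEvict {
	if el, ok := e.freeTable[key]; ok {
		e.lruList.Remove(el)
		delete(e.freeTable, key)
	}
}
e.Unlock()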

@Jeffwan (Collaborator) commented Mar 31, 2025:

I think most feedback is addressed. Please double-check the remaining issues, then we should be good to go. @vie-serendipity @varungup90

 c.hash.ResetWithSeed(c.seed)

-block, ok = c.blocks[prefixHash]
+block, ok = c.store.Get(prefixHash)
Collaborator:

The store is thread-safe, so we can remove the lock from line 134 now, right?

Contributor Author:

The store is thread-safe, but the hash is currently not. The current lock scope was too broad, so I narrowed it down. (A sketch follows.)
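
A sketch of what the narrowed scope might look like, assuming the stateful hasher from the snippet above; ResetWithSeed is quoted from this PR, while the Write/Sum64 calls are assumptions following the standard hash.Hash64 interface.

// Only the non-thread-safe hasher needs the mutex; the store
// synchronizes internally, so Get happens outside the lock.
c.mu.Lock()
c.hash.ResetWithSeed(c.seed)
_, _ = c.hash.Write(tokens)
prefixHash := c.hash.Sum64()
c.mu.Unlock()

block, ok := c.store.Get(prefixHash)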

@varungup90 (Collaborator)

Overall LGTM. Some nits; once you fix them, ping me and I will merge the change. Thanks!

Signed-off-by: vie-serendipity <[email protected]>
@Jeffwan (Collaborator) commented Apr 1, 2025:

@varungup90 It seems all comments are addressed. I do not have further feedback either.

varungup90 merged commit 0d998e1 into vllm-project:main on Apr 1, 2025. 11 checks passed.

// returns matchedTokens, unMatchedTokens, matchedPods
func (c *PrefixHashTable) MatchPrefix(tokens []byte, model string, pods []*v1.Pod) ([]byte, []byte, []*v1.Pod) {
	c.mu.Lock()
Collaborator:

@vie-serendipity I assumed the LRU store was thread-safe and that removing this lock was fine. While working on PR #933, I came across a concurrent read/write bug on the blocks map.

I have added the lock back. If you get a chance, can you make the LRU store thread-safe? Thank you.

Contributor Author:

okay, I will locate the bug and fix it.

Contributor Author:

I'm uncertain whether you encountered the bug after completely removing all locks from the PrefixHashTable implementation. If so, I found that the map obtained from the LRU store is not thread-safe and should be locked when modifying it:

block, ok := c.store.Get(prefixHash)
if !ok {
	block = Block{
		modelToPods: map[string]map[string]time.Time{
			model: {
				pod: time.Now(),
			},
		},
	}
} else {
	blockPods, ok := block.modelToPods[model]
	if !ok {
		blockPods = map[string]time.Time{}
	}
	blockPods[pod] = time.Now()
	block.modelToPods[model] = blockPods
}

Collaborator:

In PR #905, the lock was removed only from MatchPrefix (AddPrefix kept its lock). I encountered the same issue: the block map is not cloned when read from the LRU store.

I propose that we make the LRU store thread-safe. There are two options (see the sketch after this list):

  1. Clone the block map returned from LRUStore, so that locks can be removed from the hash table implementation.
  2. Use a sync.Map in the LRU store plus clone the block map in the Get function. This is preferable because it removes the lock from LRUStore as well.
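
A sketch of the cloning half of both options, assuming the Block type from the snippet above: deep-copy the nested map on Get so callers receive a private copy and cannot race against the store's own writers. The helper name cloneBlock is illustrative.

// cloneBlock deep-copies modelToPods so the caller can read and mutate
// the returned Block without holding the store's lock.
func cloneBlock(b Block) Block {
	cloned := Block{
		modelToPods: make(map[string]map[string]time.Time, len(b.modelToPods)),
	}
	for model, pods := range b.modelToPods {
		podsCopy := make(map[string]time.Time, len(pods))
		for pod, t := range pods {
			podsCopy[pod] = t
		}
		cloned.modelToPods[model] = podsCopy
	}
	return cloned
}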

Contributor Author:

Thanks for your explanation; I will submit a PR to fix it.

gangmuk pushed a commit to gangmuk/aibrix-gangmuk that referenced this pull request Jun 21, 2025
…lm-project#905)

* feat: add lru cache to support prefix cache indexer

Signed-off-by: vie-serendipity <[email protected]>
Yaegaki1Erika pushed a commit to Yaegaki1Erika/aibrix that referenced this pull request Jul 23, 2025
…lm-project#905)

* feat: add lru cache to support prefix cache indexer

Signed-off-by: vie-serendipity <[email protected]>