Skip to content

refactor(TODO-3803): Enhance key position retrieval with CommandInfo caching#3804

Open
retr0-kernel wants to merge 4 commits intoredis:masterfrom
retr0-kernel:master
Open

refactor(TODO-3803): Enhance key position retrieval with CommandInfo caching#3804
retr0-kernel wants to merge 4 commits intoredis:masterfrom
retr0-kernel:master

Conversation

@retr0-kernel
Copy link
Copy Markdown

@retr0-kernel retr0-kernel commented May 7, 2026

Fixes #3803
There's a TODO that's been sitting in cmdFirstKeyPos function. The function figures out which argument position holds the first key for a given Redis command and we need this to route commands to the right node. The way it works right now is by checking a hardcoded map of around 40 keyless commands and defaulting to 1 for everything else.

The thing is we already fetch COMMAND INFO from Redis at startup and cache it, and that response has the exact first-key position for every command baked in. We just weren't using it here. So every time Redis adds a new keyless command, someone has to remember to update our map by hand. Easy to forget, and we've probably already missed some.

What I changed: I wired the cached COMMAND INFO data into the routing logic. The function now takes an optional *CommandInfo and uses info.FirstKeyPos directly when it's available. All the cluster and ring call sites pass it in via a small helper that peeks at the already-warm cache.

The round-trip concern: @ndyakov had a genuine concern that first-key resolution shouldn't add a Redis round-trip. It doesn't. To tackle that, I added a Peek() method on the cache that just returns whatever is already in memory, so no network call, no blocking. If the cache is cold (like still starting up or something), it returns nil and the code falls through to the old hardcoded table exactly as before. Once the cache is warm, it's just a map lookup.

One edge case: eval/evalsha still get special-cased. Their FirstKeyPos in CommandInfo is 3, but that's only correct when numkeys > 0. When numkeys == 0 there are no key arguments at all, and you can only know that by looking at the actual runtime arguments, so no cached metadata can help there. So that logic is unchanged.

I've added comments throughout the code to make it easier to follow while reviewing. Tested locally and everything looks good, but if something seems off or I've got it wrong somewhere, just let me know and I'll fix it.


Note

Medium Risk
Touches core cluster/ring routing by changing how first-key positions are derived, which can affect shard selection and multi-key command fanout. Fallback behavior remains, but misclassification when cache is cold/warm could route commands incorrectly if bugs exist.

Overview
Improves command routing by replacing the hardcoded cmdFirstKeyPos heuristic with cmdFirstKeyPosWithInfo, which can use cached COMMAND INFO (CommandInfo.FirstKeyPos) to determine whether/where a command has key arguments.

Adds a non-blocking cmdsInfoCache.Peek() (RWMutex-protected) plus cmdInfoPeek helpers in cluster and ring clients, and updates cluster slot selection, multi-shard execution, and ring pipeline sharding to prefer cached metadata while preserving special-cases for eval*/memory and falling back to the previous hardcoded behavior when the cache is cold.

Reviewed by Cursor Bugbot for commit c956039. Bugbot is set up for automated code reviews on this repo. Configure here.

@jit-ci
Copy link
Copy Markdown

jit-ci Bot commented May 7, 2026

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

@jit-ci
Copy link
Copy Markdown

jit-ci Bot commented May 7, 2026

🛡️ Jit Security Scan Results

CRITICAL HIGH MEDIUM

✅ No security findings were detected in this PR


Security scan by Jit

@retr0-kernel
Copy link
Copy Markdown
Author

Hi @ndyakov ,
This PR addresses issue #3803 , been working on this one for a while. Let me know if anything is needed from my end or any questions.

Comment thread command.go
Comment thread osscluster.go
Copy link
Copy Markdown
Member

@ndyakov ndyakov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some suggestions. keep in mind there is another configuration with the default command policies and we may want to combine those. will get back to you with more information soon.

Comment thread command.go Outdated
if c == nil {
return nil
}
c.refreshLock.Lock()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@retr0-kernel let's make refreshLock RWMutex and lock it only for reading in Peek and Get.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok swapped refreshLock from sync.Mutex to sync.RWMutex. Peek() only reads c.cmds so it uses RLock now, meaning concurrent routing goroutines calling it in parallel no longer block each other. Get() and Refresh() still take the write lock since they actually modify state. Sending the change for this just now.

Comment thread command.go
@retr0-kernel
Copy link
Copy Markdown
Author

left some suggestions. keep in mind there is another configuration with the default command policies and we may want to combine those. will get back to you with more information soon.

I will reiterate and get back to you on these.

@retr0-kernel
Copy link
Copy Markdown
Author

Hi @ndyakov ,
Did the changes you requested. Open to more as you suggested. Thanks for the review again.

will get back to you with more information soon.

@retr0-kernel retr0-kernel requested a review from ndyakov May 7, 2026 17:10
@retr0-kernel retr0-kernel force-pushed the master branch 2 times, most recently from 361dd5b to 95ae37d Compare May 7, 2026 17:25
@retr0-kernel
Copy link
Copy Markdown
Author

retr0-kernel commented May 7, 2026

Sorry for the mess. Accidentally committed with my official so just changed the authors. The code is still intact if any issues will cherry pick y changes into a new branch.

Comment thread commands_test.go
@retr0-kernel
Copy link
Copy Markdown
Author

Hi @ndyakov any updates on this? I was thinking of cherry picking for a clean change.

Comment thread command.go Outdated
Comment on lines +300 to +307
// otherwise the hardcoded fallback applies.
}

// if the command is keyless O(1) in-memory, no round-trip needed.
if _, ok := keylessCommands[name]; ok {
return 0
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's first check the keyless commands (like it was previously, before doing the switch with the name).

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ndyakov
This is done. Did it like it was before.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see a failing testcase. I did tried researching it a bit --> https://github.com/redis/go-redis/actions/runs/25592951942/job/75133806596?pr=3804 and I think it is unrelated to this PR and our changes don't touch any of that code.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit c956039. Configure here.

Comment thread osscluster_router.go
func (c *ClusterClient) createSlotSpecificCommand(ctx context.Context, originalCmd Cmder, keys []string) Cmder {
originalArgs := originalCmd.Args()
firstKeyPos := int(cmdFirstKeyPos(originalCmd))
firstKeyPos := cmdFirstKeyPosWithInfo(originalCmd, c.cmdInfoPeek(originalCmd.Name()))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent firstKeyPos from independent cache-dependent computations

Low Severity

createSlotSpecificCommand independently recomputes firstKeyPos via cmdFirstKeyPosWithInfo(originalCmd, c.cmdInfoPeek(...)), which can yield a different value than what executeMultiShard used to extract keys—if the cache transitions from cold to warm between calls. The old cmdFirstKeyPos was a pure function with no external state dependency, so this inconsistency was impossible before. The same pattern exists in slottedKeyedCommands where cmdFirstKeyPosWithInfo is called once to filter keyless commands and then again inside cmdSlot. If the cache warms between these calls, a command might pass the "has keys" check but then get routed as keyless.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit c956039. Configure here.

@retr0-kernel retr0-kernel requested a review from ndyakov May 9, 2026 05:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use CommandInfo to determine first key position in cmdFirstKeyPos

3 participants