NRG: Improve membership changes checks & leader is caught-up for forwarded proposals#7809

Merged
neilalexander merged 5 commits into main from maurice/nrg-optimzation
Feb 10, 2026

@MauriceVanVeen
Member

This PR improves a couple areas in our Raft logic:

  • Instead of scanning the log with logContainsUncommittedMembershipChange after we've won the leader election, we rely on the scan we already do during startup of the Raft node and set n.membChanging if we process such an append entry. If the logContainsUncommittedMembershipChange scan takes close to the minimum election timeout, we could cycle through leader changes continuously without making progress.
  • Don't allow concurrent membership changes when a peer removal has been forwarded to us.
  • Only accept forwarded proposals once we're initially caught up as the new leader. This gives the new leader some time to establish its leadership and advance n.commit based on what's in its log, without being immediately overwhelmed by loads of forwarded proposals (for example, due to inactive thresholds deleting consumers). This should help limit the growth of the log if many forwarded proposals are retried.
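
The third point can be illustrated with a minimal sketch. It is not the code in server/raft.go: the leaderNode type, its fields, and all method names are hypothetical, and it assumes "caught up" means the commit index has reached the last log index the node had when it won the election.

```go
package main

// leaderNode is a hypothetical, heavily simplified stand-in for a Raft node.
// None of these names are taken from server/raft.go.
type leaderNode struct {
	leader    bool
	caughtUp  bool     // true once commit has reached electedAt
	commit    uint64   // highest index known to be committed
	lastIndex uint64   // index of the last entry in the local log
	electedAt uint64   // lastIndex at the moment the election was won
	queue     [][]byte // accepted proposals awaiting replication
}

// becomeLeader records where the log ended when leadership was won.
func (n *leaderNode) becomeLeader() {
	n.leader = true
	n.electedAt = n.lastIndex
	n.caughtUp = n.commit >= n.electedAt
}

// advanceCommit moves the commit index forward and marks the leader as
// caught up once everything it held at election time is committed.
func (n *leaderNode) advanceCommit(index uint64) {
	if index > n.commit {
		n.commit = index
	}
	if n.leader && n.commit >= n.electedAt {
		n.caughtUp = true
	}
}

// handleForwardedProposal queues a proposal forwarded by a follower and
// reports whether it was accepted. Proposals that arrive before the leader
// is caught up are dropped, so the sender has to retry later.
func (n *leaderNode) handleForwardedProposal(p []byte) bool {
	if !n.leader || !n.caughtUp {
		return false
	}
	n.queue = append(n.queue, p)
	return true
}
```

In this sketch a node elected with commit at 5 and a log ending at 8 rejects forwarded proposals until advanceCommit(8) has run, after which they are queued as usual.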

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner February 9, 2026 10:07
server/raft.go Outdated
return
}
prop := n.prop
n.membChanging = true
Contributor

Setting the membChanging here is unnecessary. It may actually be harmful.
Setting membChanging to true is done in sendMembershipChange when EntryRemovePeer entries are picked up from the proposal queue and sent.

Member Author

> Setting membChanging to true is done in sendMembershipChange when EntryRemovePeer entries are picked up from the proposal queue and sent.

This doesn't seem to be the case currently? Although, I agree it may be unsafe to mark membChanging before actually sending it.

Have updated the PR to only mark membChanging when sending (or when receiving on a follower), and to only unset it when applying the change as a commit. This also means sendMembershipChange may receive multiple membership change proposals from the queue, but it now ignores them if one was already sent.
4c04d98
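
A rough sketch of the behaviour described in this reply, under stated assumptions: the membNode and membEntry types are hypothetical, and only the names sendMembershipChange, membChanging, and EntryRemovePeer echo identifiers mentioned in the thread. The real function in server/raft.go may have a different signature.

```go
package main

// membEntryType distinguishes ordinary entries from membership changes.
type membEntryType int

const (
	entryNormal membEntryType = iota
	entryRemovePeer // stands in for EntryRemovePeer
)

// membEntry is a hypothetical log entry.
type membEntry struct {
	typ  membEntryType
	peer string
}

// membNode is a hypothetical, simplified Raft node.
type membNode struct {
	membChanging bool        // true while a sent membership change is uncommitted
	sent         []membEntry // entries that were actually sent to followers
}

// sendMembershipChange sends e and marks a change as in flight. If a change
// was already sent and is not yet committed, e is ignored. It reports
// whether e was sent.
func (n *membNode) sendMembershipChange(e membEntry) bool {
	if n.membChanging {
		return false
	}
	n.membChanging = true
	n.sent = append(n.sent, e)
	return true
}

// applyCommit clears the flag only when a membership change is applied as a
// committed entry.
func (n *membNode) applyCommit(e membEntry) {
	if e.typ == entryRemovePeer {
		n.membChanging = false
	}
}
```

Because the flag is set at send time rather than when the proposal is queued, a leader that steps down before sending never ends up with the flag set for a proposal that was dropped.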

Contributor

Yes, my comment was wrong. My worry was exactly that: pushing to n.prop does not mean that the message was sent, so a leader could set membChanging to true and immediately become a follower. In that case a proposal to remove a peer could be dropped without membChanging being reset to false. Your latest commit (4c04d98) seems to fix that.

Contributor

You could consider squashing the latest commit, together with the first.

Member Author

Think I'd rather keep them separate, since otherwise there's a bunch of conflicting changes to resolve.

Contributor

@sciascid sciascid left a comment

I think these changes could lead to a scenario where membChanging is set while no membership change is in progress. For instance, suppose a leader manages to log an EntryRemovePeer, but its message is lost and leadership changes. Upon the leader change, the uncommitted entry may need to be deleted from the old leader's log. At this point, the old leader has membChanging set, but no membership change is in progress.
Should leadership go back to this node, new membership changes would be rejected...

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/nrg-optimzation branch from c6326ed to 2102261 Compare February 9, 2026 14:09
@MauriceVanVeen
Member Author

Have updated the PR to reset n.membChanging if the entry is truncated. To do this efficiently, it's now a uint64 holding the index of the entry containing the membership change, instead of a bool.
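
To make the index-based approach concrete, here is a small sketch. It is an illustration under assumptions, not the merged code: the indexNode type, its log representation, and the method names are hypothetical, and the field is called membChangeIndex purely for readability. A value of 0 is assumed to mean that no membership change is in flight.

```go
package main

// indexNode is a hypothetical, simplified Raft node. log[i] reports whether
// the entry at index i+1 is a membership change.
type indexNode struct {
	log             []bool
	membChangeIndex uint64 // index of the in-flight membership change, 0 if none
}

// appendEntry stores an entry and returns its 1-based index. A membership
// change records its own index so later steps can find it without a scan.
func (n *indexNode) appendEntry(isMembChange bool) uint64 {
	n.log = append(n.log, isMembChange)
	index := uint64(len(n.log))
	if isMembChange {
		n.membChangeIndex = index
	}
	return index
}

// truncate drops every entry after index. If the tracked membership change
// was among the dropped entries, the marker is reset.
func (n *indexNode) truncate(index uint64) {
	if index >= uint64(len(n.log)) {
		return
	}
	n.log = n.log[:index]
	if n.membChangeIndex > index {
		n.membChangeIndex = 0
	}
}

// applyCommit resets the marker once the commit index covers the change.
func (n *indexNode) applyCommit(index uint64) {
	if n.membChangeIndex != 0 && index >= n.membChangeIndex {
		n.membChangeIndex = 0
	}
}

// membChanging reports whether a membership change is still in flight.
func (n *indexNode) membChanging() bool {
	return n.membChangeIndex != 0
}
```

With a plain bool, truncation would need a log scan to tell whether the removed entries included the membership change. Comparing a stored index against the truncation point answers that with a single comparison.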

Contributor

@sciascid sciascid left a comment

Nice patch set!

Optional: consider renaming membChanging. The name suggests a boolean, but it is now an index. I'm thinking membChangeIndex, or something along those lines.

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/nrg-optimzation branch from 3cae01c to f8f86a2 Compare February 10, 2026 08:20
Member

@neilalexander neilalexander left a comment

LGTM

@neilalexander neilalexander merged commit 63a99c5 into main Feb 10, 2026
69 of 70 checks passed
@neilalexander neilalexander deleted the maurice/nrg-optimzation branch February 10, 2026 13:59
neilalexander added a commit that referenced this pull request Feb 16, 2026
Includes the following:

- #7780
- #7784
- #7782
- #7783
- #7787
- #7789
- #7793
- #7797
- #7798
- #7799
- #7790
- #7805
- #7810
- #7811
- #7812
- #7809
- #7724
- #7815
- #7816
- #7818
- #7819
- #7820
- #7795
- #7825
- #7828
- #7835
- #7837

Signed-off-by: Neil Twigg <neil@nats.io>

3 participants