@nhtruong
Contributor

@nhtruong nhtruong commented Oct 20, 2025

Follow-up to the Dynamic MCP Servers discussion: see the 2nd bullet point under Implementation.

Summary

This PR implements a distributed leader election system using Redis, enabling LibreChat to coordinate tasks across multiple server instances in a cluster deployment.

Changes

Core Implementation

  • New LeaderElection class (packages/api/src/cluster/LeaderElection.ts)
    • Atomic leader election using Redis SET NX command
    • Automatic leader selection during bootup
    • TTL-based leadership with automatic refresh (25s TTL, 10s refresh interval)
    • Graceful resignation on shutdown (SIGTERM/SIGINT handling)
    • Crash recovery via TTL expiration
    • Randomized election delays to prevent thundering herd
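
The atomic claim and the randomized delay listed above can be illustrated without a Redis server. The sketch below is not the PR's code: `InMemoryStore`, `setIfAbsent`, `electionDelayMs`, and the key name `leader` are hypothetical, and the store only mimics the semantics of a Redis `SET` with the `NX` option and an expiry (set the key only if it is absent).

```typescript
// Hypothetical in-memory stand-in for a Redis SET with NX and a TTL.
// Not LibreChat code: it only models "set if absent, with expiry".
class InMemoryStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  // Returns true when the key was absent (or expired) and is now set.
  setIfAbsent(key: string, value: string, ttlMs: number, now: number): boolean {
    const current = this.entries.get(key);
    if (current !== undefined && current.expiresAt > now) {
      return false; // another instance already holds the key
    }
    this.entries.set(key, { value, expiresAt: now + ttlMs });
    return true;
  }

  get(key: string, now: number): string | null {
    const current = this.entries.get(key);
    return current !== undefined && current.expiresAt > now ? current.value : null;
  }
}

// Random delay in [0, maxMs) so that instances do not all hit the store at once.
function electionDelayMs(maxMs: number, random: () => number = Math.random): number {
  return Math.floor(random() * maxMs);
}

const LEADER_KEY = 'leader'; // assumed key name, for illustration only
const TTL_MS = 25000; // 25s TTL, as stated in the PR description

const store = new InMemoryStore();
const instances = ['instance-a', 'instance-b', 'instance-c'];

// Each instance waits a random delay, then attempts the claim at that time.
const attempts = instances
  .map((id) => ({ id, at: electionDelayMs(1000) }))
  .sort((x, y) => x.at - y.at);

// Only the earliest attempt finds the key absent; later ones are rejected.
const winners = attempts.filter(({ id, at }) => store.setIfAbsent(LEADER_KEY, id, TTL_MS, at));

console.log(`leader: ${winners[0].id}, successful claims: ${winners.length}`);
```

Whichever instance draws the shortest delay wins, and every later attempt fails because the key already exists. The same property holds in Redis because `SET` with `NX` is executed atomically on the server.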

Refactoring

  • Cache configuration improvements
    • Moved GLOBAL_PREFIX_SEPARATOR to cacheConfig for centralized configuration
    • Updated Redis client initialization to use consistent key prefixes

Testing & CI

  • Comprehensive integration tests
    • Leader shutdown and re-election scenarios
    • Leader crash simulation (TTL expiration)
    • Race condition handling (simultaneous elections)
    • Proper async cleanup to prevent test hanging
    • Added test:cache-integration:cluster npm script
  • Consolidated CI workflows
    • Combined cache and cluster integration tests into single workflow
    • Tests trigger on changes to cache/ or cluster/ directories
    • Both single Redis and cluster setups validated

Test File Naming

  • Renamed integration test files to use .cache_integration.spec.ts suffix for consistency

Technical Details

How it works:

  1. On startup, each server instance calls isLeader() to check/claim leadership
  2. First instance to set the Redis key becomes the leader
  3. Leader maintains the key by refreshing its TTL every 10 seconds
  4. If the leader crashes, its key expires within 25 seconds of the last refresh, allowing re-election
  5. On graceful shutdown, leader deletes its key for immediate re-election
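
The five steps above can be traced on a simulated clock. This is a minimal sketch under stated assumptions, not the PR's `LeaderElection` class: `LeaseStore` and its methods are hypothetical, time is passed in explicitly instead of using timers, and the sketch assumes that refresh and resignation check ownership first, which the description does not state.

```typescript
// Hypothetical sketch, not the PR's LeaderElection class.
// A single lease with an expiry stands in for the Redis leader key.
type Lease = { owner: string; expiresAt: number };

class LeaseStore {
  private lease: Lease | null = null;

  private current(now: number): Lease | null {
    return this.lease !== null && this.lease.expiresAt > now ? this.lease : null;
  }

  // Models SET NX with a TTL: succeeds only when no live lease exists.
  claim(owner: string, ttlMs: number, now: number): boolean {
    if (this.current(now) !== null) return false;
    this.lease = { owner, expiresAt: now + ttlMs };
    return true;
  }

  // Extends the TTL only if `owner` still holds the lease (assumed behavior).
  refresh(owner: string, ttlMs: number, now: number): boolean {
    const lease = this.current(now);
    if (lease === null || lease.owner !== owner) return false;
    lease.expiresAt = now + ttlMs;
    return true;
  }

  // Deletes the lease only if `owner` holds it (graceful resignation).
  release(owner: string, now: number): boolean {
    const lease = this.current(now);
    if (lease === null || lease.owner !== owner) return false;
    this.lease = null;
    return true;
  }

  leader(now: number): string | null {
    return this.current(now)?.owner ?? null;
  }
}

const LEASE_TTL_MS = 25000; // 25s TTL from the PR description
const REFRESH_MS = 10000; // 10s refresh interval from the PR description

const lease = new LeaseStore();
const timeline: string[] = [];
const record = (now: number, event: string) => timeline.push(`t=${now / 1000}s ${event}`);

// Steps 1-2: both instances try to claim at startup; the first one wins.
record(0, `A claims: ${lease.claim('A', LEASE_TTL_MS, 0)}`);
record(0, `B claims: ${lease.claim('B', LEASE_TTL_MS, 0)}`);

// Step 3: A refreshes every 10s, so the key never reaches its 25s expiry.
record(REFRESH_MS, `A refreshes: ${lease.refresh('A', LEASE_TTL_MS, REFRESH_MS)}`);
record(2 * REFRESH_MS, `A refreshes: ${lease.refresh('A', LEASE_TTL_MS, 2 * REFRESH_MS)}`);

// Step 4: A crashes after its refresh at t=20s, so the key expires at t=45s.
record(44000, `B claims: ${lease.claim('B', LEASE_TTL_MS, 44000)}`);
record(45000, `B claims: ${lease.claim('B', LEASE_TTL_MS, 45000)}`);

// Step 5: B resigns gracefully, so a restarted A can claim immediately.
record(50000, `B resigns: ${lease.release('B', 50000)}`);
record(50000, `A claims: ${lease.claim('A', LEASE_TTL_MS, 50000)}`);

console.log(timeline.join('\n'));
```

With a 25s TTL and a 10s refresh interval, a healthy leader gets two refresh opportunities before the key would expire, so one missed refresh does not cost it leadership. After a crash, followers wait at most 25 seconds from the last refresh; after a graceful resignation they do not wait at all.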

Use cases:

  • Scheduled background jobs (only leader executes)
  • Singleton tasks in multi-instance deployments
  • Resource-intensive operations that should run once per cluster
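
A caller might gate such work on leadership roughly as follows. This is a sketch only: the description states that instances call `isLeader()`, but the `LeaderCheck` interface, the `Promise<boolean>` return type, and the `runIfLeader` helper are assumptions made for illustration.

```typescript
// Hypothetical interface: the PR description only states that instances
// call isLeader(); the return type here is an assumption.
interface LeaderCheck {
  isLeader(): Promise<boolean>;
}

// Runs `job` only when this instance currently holds leadership.
// Resolves to true if the job ran, false if it was skipped.
async function runIfLeader(election: LeaderCheck, job: () => Promise<void>): Promise<boolean> {
  if (!(await election.isLeader())) {
    return false;
  }
  await job();
  return true;
}

// Stub elections standing in for two instances of the same cluster.
const leaderNode: LeaderCheck = { isLeader: async () => true };
const followerNode: LeaderCheck = { isLeader: async () => false };

let runs = 0;
const cleanupJob = async (): Promise<void> => {
  runs += 1; // placeholder for a singleton task
};

// Both instances reach the scheduled job; only the leader executes it.
const done = Promise.all([
  runIfLeader(leaderNode, cleanupJob),
  runIfLeader(followerNode, cleanupJob),
]).then((results) => {
  console.log(`job ran on ${results.filter(Boolean).length} of ${results.length} instances`);
  return results;
});
```

Checking leadership immediately before each run, rather than once at startup, matters because leadership can move between instances after a crash or a resignation.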

Testing

npm run test:cache-integration:cluster

All tests pass with proper cleanup and no hanging processes.

Testing

New tests have been added to CI.

Checklist


  • My code adheres to this project's style guidelines
  • I have performed a self-review of my own code
  • I have commented in any complex areas of my code
  • I have made pertinent documentation changes
  • My changes do not introduce new warnings
  • I have written tests demonstrating that my changes are effective or that my feature works
  • Local unit tests pass with my changes
  • Any changes dependent on mine have been merged and published in downstream modules.

@nhtruong nhtruong force-pushed the leader-election branch 4 times, most recently from 988e7eb to a6d86bc Compare October 21, 2025 21:46
@nhtruong nhtruong marked this pull request as ready for review October 21, 2025 21:57
@nhtruong
Contributor Author

nhtruong commented Oct 24, 2025

@danny-avila would you mind taking a gander at this feature? With this we can not only move forward with Dynamic MCP Servers, but it will also allow us to run cron jobs that only the leader node is supposed to handle. This feature works with both Redis and non-Redis configurations. It's also cloud-deployment agnostic (whether it's rack-based, AWS, GCP...) and can scale to any number of nodes.

@nhtruong nhtruong force-pushed the leader-election branch 2 times, most recently from 32e727b to 2705a5a Compare October 28, 2025 22:01
@danny-avila
Owner

Will review by the end of the week, thanks

@danny-avila
Owner

reviewing this today, thanks for your contribution and patience!

@danny-avila danny-avila changed the base branch from main to dev October 30, 2025 21:06
@danny-avila danny-avila changed the title 👑 feat: Implement distributed leader election with Redis for multi-instance coordination 👑 feat: Distributed Leader Election with Redis for Multi-instance Coordination Oct 30, 2025
@danny-avila danny-avila merged commit 8f4705f into danny-avila:dev Oct 30, 2025
2 checks passed
@nhtruong nhtruong deleted the leader-election branch October 30, 2025 21:36
JustinBeaudry pushed a commit to Actual-Reality/LibreChat that referenced this pull request Nov 12, 2025
…rdination (danny-avila#10189)

* 🔧 refactor: Move GLOBAL_PREFIX_SEPARATOR to cacheConfig for consistency

* 👑 feat: Implement distributed leader election using Redis
Guiraud pushed a commit to Guiraud/LibreChat that referenced this pull request Nov 21, 2025
…rdination (danny-avila#10189)

* 🔧 refactor: Move GLOBAL_PREFIX_SEPARATOR to cacheConfig for consistency

* 👑 feat: Implement distributed leader election using Redis
patricksn3ll pushed a commit to patricksn3ll/LibreChat that referenced this pull request Dec 11, 2025
…rdination (danny-avila#10189)

* 🔧 refactor: Move GLOBAL_PREFIX_SEPARATOR to cacheConfig for consistency

* 👑 feat: Implement distributed leader election using Redis