Description
As part of the ongoing conversation we've been having on Slack (and across several nodejs/build issues), we have long-term goals for improving the reliability of our Distribution Assets (aka /home/dist) on the DigitalOcean server.
The main goal is to maintain a reliable way to serve our assets (binaries, docs, metrics, etc.) to the public.
What's the issue?
As mentioned in issues such as #3424 and #3410, in our March 17th incident report (https://nodejs.org/ko/blog/announcements/node-js-march-17-incident), and in many other places, our DigitalOcean server is unable to serve all the traffic it gets.
Of course, a few things are in place, such as Cloudflare caching, so that, in theory, all requests are served by Cloudflare after the initial load.
This has proven ineffective due to numerous factors, for example cache purges. Even if cache purges were tailored to only the affected paths, it is still a risky approach that creates a gigantic load on our DigitalOcean server. The same server stores all Node.js binaries and numerous other vital assets to date.
Not to mention that, even for a short period, the server cannot withstand the immense traffic that goes through it.
Meaning: in the best scenario, this server should never enter a stressful situation.
What's the Plan?
Champions
Proof of Concept Repository
After numerous discussions with the Build Team, the OpenJS Foundation, and Cloudflare, we've concluded that, ideally, the DigitalOcean server should not serve these files at all.
Enter the solution: Cloudflare R2
The idea is that all requests that do not go to Vercel (aka selected paths such as /download, /dist, /docs, /api) will go through a Cloudflare Worker.
This worker is responsible for:
- Serving content from an R2 Bucket
  - The R2 Bucket has the contents of /home/dist from the DO server synced
    - Meaning that the root (/) of the R2 bucket is the contents of /home/dist
  - The sync of files is done through a script/daemon/or something else sitting inside the DO server.
    - It should be reactive and only do additions/removals/updates based on fs changes (so it is incremental)
- Mapping Requests
  - Since the contents are the same as on the DO server, we should respect the same paths originally created in the NGINX nodejs.org config file.
  - For example, requests to the following places are mapped:
    - /dist goes to /nodejs/release (originally on DO /home/dist/nodejs/release)
    - /download goes to /nodejs (originally on DO /home/dist/nodejs)
    - /docs goes to /nodejs/docs (originally on DO /home/dist/nodejs/docs)
    - /api goes to /nodejs/docs/latest/api (originally on DO /home/dist/nodejs/docs/latest/api)
    - /metrics goes to /metrics (originally on DO /home/dist/metrics)
  - These mappings ensure that requests are handled the way originally intended on NGINX
- Serving Directory Listings
- Caching Access (setting cache headers)
- Serving 404's
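
The request mapping listed above could be sketched roughly as follows. This is only an illustration, not the Worker's actual code: the names PREFIX_MAP and toBucketKey are hypothetical, and it assumes R2 object keys are stored without a leading slash, so that the bucket root mirrors /home/dist. Only the five mappings named above are included.

```typescript
// Hypothetical prefix table: request path prefix -> key prefix inside the R2 bucket.
// Assumption: the bucket root mirrors /home/dist, and keys carry no leading slash.
const PREFIX_MAP: ReadonlyArray<readonly [string, string]> = [
  ["/dist", "nodejs/release"],
  ["/download", "nodejs"],
  ["/docs", "nodejs/docs"],
  ["/api", "nodejs/docs/latest/api"],
  ["/metrics", "metrics"],
];

// Returns the bucket key for a request path, or null when no mapping applies
// (in which case the Worker would presumably answer with a 404).
function toBucketKey(pathname: string): string | null {
  for (const [prefix, target] of PREFIX_MAP) {
    // Match only on a path-segment boundary, so "/distant" does not match "/dist".
    if (pathname === prefix || pathname.indexOf(prefix + "/") === 0) {
      return target + pathname.slice(prefix.length);
    }
  }
  return null;
}
```

For example, toBucketKey("/api/fs.html") yields "nodejs/docs/latest/api/fs.html", while a path outside the table yields null. Matching on segment boundaries is a design choice here; the real NGINX config may behave differently in edge cases.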
The Worker will never make a single request to the DigitalOcean origin, because the R2 bucket should always be in sync with whatever is in DigitalOcean.
This also means that:
- We can disable our Load Balancer configuration on Cloudflare
- In case of emergencies, we can disable the Worker and re-enable the Load Balancer
- Cloudflare will never make requests to DigitalOcean itself
- DigitalOcean Server is still the source of truth
- Load on the Server will be close to 0
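
Keeping DigitalOcean as the source of truth depends on the incremental sync described earlier. A minimal sketch of the diffing step is shown below, assuming the sync script can produce a listing of paths with some content fingerprint for both /home/dist and the bucket. The names Listing, SyncPlan and planSync are hypothetical, and the choice of fingerprint (a hash, or size plus mtime) is an assumption, not something decided in this proposal.

```typescript
// Path (relative to /home/dist, or the bucket key) -> content fingerprint.
// Assumption: the fingerprint is any stable string, e.g. a hash or "size:mtime".
type Listing = Record<string, string>;

interface SyncPlan {
  upload: string[]; // new or changed files to write to the bucket
  remove: string[]; // bucket keys whose files no longer exist on the server
}

// Compares the server listing with the bucket listing and returns only the
// additions/updates and removals, so unchanged files are never transferred.
function planSync(local: Listing, remote: Listing): SyncPlan {
  const upload = Object.keys(local).filter((path) => remote[path] !== local[path]);
  const remove = Object.keys(remote).filter((key) => !(key in local));
  return { upload: upload.sort(), remove: remove.sort() };
}
```

A reactive daemon could run this against only the paths touched by an fs change event instead of the full listings; the full comparison is mainly useful as a periodic consistency check.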
The Long Term
This solution should also hold up in the long term, but in the future we could, for example, create ways for Jenkins to upload directly to R2, or other shenanigans.