Description
tl;dr: Introduce a simple mechanism for limiting parallelism automatically in Cargo, to avoid consuming all system resources during compilation.
Problem
Cargo by default uses all cores (`std::thread::available_parallelism`) and spawns rustc or build scripts onto each core. This is not an issue when compiling on a decent machine. When working on low-end machines or large-scale codebases, developers often encounter issues like extremely high CPU load or out-of-memory errors.
To solve these problems, developers can set `--jobs` on the command line, or `build.jobs` in `.cargo/config.toml`, to control the maximum parallelism Cargo can use. This is not ideal because:
- `build.jobs` is bound to the number of cores in use. It is not directly correlated with memory usage. Parallel builds might run out of memory before any CPU throttling happens, especially when several linker invocations happen at once.
- `build.jobs` assigns cores fairly to each unit of work (i.e. a crate build). However, some crate builds consume more computing resources than others. If those crate builds are bottlenecks of the entire compilation, we might want to throw more resources at them to unblock other crate builds.
- Developers need to set `build.jobs` explicitly to control the parallelism. However, it often takes a long trial-and-error process to figure out a proper value, and the value varies across environments. Not really user friendly.
- Developers might want full control over every dependency build. `build.jobs` is too coarse-grained.
An "ideal" approach (but not now)
There are a couple of existing proposals trying to improve the situation. Some of them want to define a weight for a certain job, or tag jobs into groups. With weights and tags, the job scheduler understands whether it should schedule a job. This is pretty much the ideal solution, as it maximizes developers' control over parallelism, and the system could be extended to job scheduling optimization.
However, such a system requires developers to fully understand the entire compilation of their projects. For now, the data is either missing or hard to get from Cargo. To incrementally build the system, there are prerequisites:
- Cargo can monitor the resource usage of the system and each unit of work during a build.
- Cargo can persist the resource usage of each unit of work for each build.
Start small
We should start small: focus on monitoring resource usage, and additionally limit parallelism when usage exceeds a threshold.
Some options we could pursue:
- Assign the maximum amount of resources that Cargo can use. This is how `build.jobs` works now. We might need an equivalent for memory usage. Something like `[build.limit] local-memory = "3GiB"` (or `"95%"`, or `"100% - 200MiB"`).
- Set a system-wide threshold. Cargo won't allocate any new job, and will wait for overall system usage to go down, even when Cargo's own usage is still under the assigned maximum. Something like `[build.limit] system = "3GiB"` (or `"95%"`, or `"100% - 200MiB"`) plus `cpu = "100%"`.
To minimize the impact of bad data points, these metrics would be sampled and averaged over a period of time.
Instead of "usage", we could also leverage the concept of "load average" from Unix-like systems, which might make more sense for managing computing resource loads.
I honestly don't know which one we want, whether we want both, or none.
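To make the idea concrete, here is a rough sketch of how a load-average-based gate could look (Linux-only, reading `/proc/loadavg`; the `should_spawn_new_job` hook and the thresholds are hypothetical, not anything Cargo implements today):

```rust
use std::fs;
use std::thread::sleep;
use std::time::Duration;

/// Read the 1-minute load average from /proc/loadavg (Linux-specific).
fn one_minute_load() -> Option<f64> {
    let contents = fs::read_to_string("/proc/loadavg").ok()?;
    contents.split_whitespace().next()?.parse().ok()
}

/// Sample the load a few times and average the samples, so a single spike
/// does not immediately block new jobs.
fn averaged_load(samples: u32, interval: Duration) -> Option<f64> {
    let mut total = 0.0;
    for _ in 0..samples {
        total += one_minute_load()?;
        sleep(interval);
    }
    Some(total / samples as f64)
}

/// Hypothetical gate the job queue could consult before spawning a new rustc.
fn should_spawn_new_job(max_load: f64) -> bool {
    match averaged_load(3, Duration::from_millis(200)) {
        Some(load) => load < max_load,
        // If the metric can't be read (e.g. non-Linux), fall back to the
        // current behavior and don't throttle.
        None => true,
    }
}

fn main() {
    let cores = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    // Example threshold: stop starting new jobs once load exceeds the core count.
    println!("spawn new job: {}", should_spawn_new_job(cores as f64));
}
```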
Library to use
- `procfs`: used by the wider Rust web-dev community, via prometheus and other metrics crates.
- `sysinfo`: another popular crate for inspecting system info.

Both of them introduce an excessive amount of code Cargo doesn't need at this moment.
Alternatively, we can use syscalls or system files directly to get this info.
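For example, on Linux the `MemAvailable` field of `/proc/meminfo` can be read with nothing but the standard library; this is only a sketch of the direct-read approach, not a proposed implementation:

```rust
use std::fs;

/// Parse the `MemAvailable` field from /proc/meminfo (Linux-specific).
/// The value is reported in kibibytes, e.g. `MemAvailable:    8123456 kB`.
fn available_memory_bytes() -> Option<u64> {
    let meminfo = fs::read_to_string("/proc/meminfo").ok()?;
    for line in meminfo.lines() {
        if let Some(rest) = line.strip_prefix("MemAvailable:") {
            let kib: u64 = rest.trim().trim_end_matches("kB").trim().parse().ok()?;
            return Some(kib * 1024);
        }
    }
    None
}

fn main() {
    match available_memory_bytes() {
        Some(bytes) => println!("available memory: {} MiB", bytes / (1024 * 1024)),
        None => println!("could not read MemAvailable from /proc/meminfo"),
    }
}
```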
Prior art
- Bazel
  - `--jobs`
  - `--local_{ram,cpu}_resources` to assign resources a build can use
- Buck
  - `--jobs`
  - `link_weight` to configure how many jobs a link job consumes
- Cabal
  - `--jobs`
  - Has the same linker invocation issue: Add option to limit number of concurrent calls to linker when building with -j haskell/cabal#1529
- CMake
  - `-j` to set the max number of concurrent processes
- GitHub Actions
  - Has `concurrency.group`
- Go
  - `go build -p` limits the number of programs, such as build commands or test binaries, that can be run in parallel
  - `GOMAXPROCS` to limit the number of OS threads that can execute user-level Go code simultaneously
- Gradle
  - `--max-workers`, like `--jobs`
  - Has a `SharedResourceLeaseRegistry` for registering a resource with its maximum number of leases. Like a semaphore.
  - Parallelism can be configured per-project on demand
- make
  - `-j` to set the max number of concurrent jobs
  - `--max-load` to limit the start of a new job if the load average goes above the value
  - Read Parallel for more
- Ninja
  - Has a pool concept: users can assign some stages of a build to a pool with more restricted parallelism rules
- Nix
- sbt
  - Tasks are tagged, and each tag gets a default weight for resource restriction
Related issues
There are more issues regarding scheduling, but I don't want to link to them here. These are issues of people trying to tell Cargo not to be that greedy.
- Support cargo build --load-average / -l like GNU make #7480
- Hint mechanism to require more "slots" to build a crate #8405
- Avoid cargo throttling system with too many tasks on slower CPUs #8556
- Allow restricting the number of parallel linker invocations #9157
- Introduce 'nice' value under cargo.toml -> [build] #9250
- Cargo hits OOM when building many examples #11707
- cargo test renders device unresponsive due to MSVC linker RAM usage #12916
- Memory leak/spike during doc-tests #14190
And sorry I opened a new issue instead. Feel free to close and move to any existing one.
Activity
epage commented on Nov 3, 2023
#9250 is an interesting alternative for the CPU load aspect. I've not done enough with `nice` to know how cross-platform the concept is or if there are restrictions that might get in the way.
In general, with all of the security and docker-like technologies out there these days, I wonder if there is more we can delegate to the operating system for this, which would likely reduce complexity and overhead within cargo.
epage commented on Nov 3, 2023
On the surface, percentages seem nice because you don't have to worry about the exact configuration of the local system. However, 90% of 64GB is a lot more usable of a system than 90% of 8GB. I feel like what will be most useful is "all except". We covered this with `--jobs` by allowing negative numbers. We could do similar here, where `-3GB` means "all but 3GB".
epage commented on Nov 3, 2023
The meaning of such an average dramatically changes whenever a job finishes and a new one starts, especially if there are jobs or categories of jobs (e.g. linking) with dramatically different characteristics.
epage commented on Nov 3, 2023
With the parallel frontend rearing its head again, we should probably consider how that affects this.
the8472 commented on Nov 4, 2023
On Linux specifically it might be better to monitor pressure rather than utilization. The downside is that that's even less portable.
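For illustration, a minimal sketch of reading Linux's memory PSI (pressure stall information, available since 4.20); the `memory_pressure_some_avg10` helper is hypothetical, not existing Cargo code:

```rust
use std::fs;

/// Read the `some` 10-second average from /proc/pressure/memory.
/// The file looks like:
///   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
///   full avg10=0.00 avg60=0.00 avg300=0.00 total=0
/// The returned value is the percentage of time in the last 10s that at
/// least one task was stalled waiting on memory.
fn memory_pressure_some_avg10() -> Option<f64> {
    let psi = fs::read_to_string("/proc/pressure/memory").ok()?;
    let line = psi.lines().find(|l| l.starts_with("some"))?;
    let field = line.split_whitespace().find(|f| f.starts_with("avg10="))?;
    field.trim_start_matches("avg10=").parse().ok()
}

fn main() {
    match memory_pressure_some_avg10() {
        // A sustained non-trivial value means tasks are already stalling on
        // memory, which is a much earlier signal than an OOM kill.
        Some(p) => println!("memory pressure (some, avg10): {p}%"),
        None => println!("PSI not available on this system"),
    }
}
```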
31 remaining items
emlautarom1 commented on Aug 26, 2025
I'm currently facing a similar issue with a 7950X3D and 32 GB of RAM + 16 GB of swap: building a large project forces me to use `-j4`, otherwise the system becomes extremely unstable to the point where I have to manually unplug the PC.
the8472 commented on Aug 26, 2025
They kill the process. Few things do that, so if we see the child terminated by a SIGKILL then OOM or user action are a good guess. It can have other causes, but it might be good enough for cargo to print a hint.
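For illustration, a minimal sketch of that hint (Unix-only, using the standard library's `ExitStatusExt`; the hint wording is made up here):

```rust
use std::process::Command;

fn main() -> std::io::Result<()> {
    // Stand-in for a rustc invocation spawned by the job queue.
    let status = Command::new("rustc").arg("--version").status()?;

    #[cfg(unix)]
    {
        use std::os::unix::process::ExitStatusExt;
        // SIGKILL is signal 9; the Linux OOM killer delivers exactly this,
        // so it's a reasonable (if imperfect) heuristic for an OOM kill.
        if status.signal() == Some(9) {
            eprintln!(
                "note: the compiler was killed by SIGKILL, which often means \
                 the system ran out of memory; consider lowering --jobs"
            );
        }
    }
    Ok(())
}
```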
ell1e commented on Aug 27, 2025
I think earlyoom uses SIGTERM first. However, you might not want to wait for earlyoom, it might stop something else the user cares about. Or like me, the user might stop the build earlier.
Sieabah commented on Sep 1, 2025
Has anyone considered the ugly solution: if you query the top processes and they're mostly PIDs you own (as part of the build), and you read 100% CPU, can't you start terminating processes? Effectively shedding the load? You don't necessarily need any of this complicated (at 100% CPU) IPC; you just need to prevent the system from killing itself.
If not a reactive strategy, then something that ramps up the build: attempt to build a batch of targets and measure the effective "load" as it compiles. You can increase the number of batches until the desired "load" is met. My angle is that attempting to measure build complexity is a non-starter due to how many "um actually" edge cases will be discovered.
The goal is to create a throttled build so lower-end systems can operate after running `cargo build`. I personally don't care, and don't think anyone else should care, if the solution kills part of the build and I waste cycles recompiling a batch. I end up wasting the entire build if my system becomes unresponsive. In fact, I lose a lot more when the entire system dies, which makes any solution here better than nothing.
Probably talking into a void, but figured I'd give my thoughts since this affects my CI environment more than anything. I shouldn't have to spin up an over-provisioned dedicated metal server exclusively for cargo so that I can compile something. I should be able to compile on any system and it's just a matter of time before it's done.
dpc commented on Sep 2, 2025
Many downsides compared to my existing proposal.
Querying top processes, and even more so CPU usage, is harder, platform dependent, etc. In certain scenarios it might require calls that are not permitted (e.g. in some security sandbox). Retroactively killing processes might happen too late, and it also requires polling. Lastly, 100% CPU usage is normal and desired. It's only running out of memory that is the issue.
ell1e commented on Sep 2, 2025
So would projects need to manually set the required memory based on actual tests? For what it's worth, 8GB still seems like a fairly large amount.
dpc commented on Sep 2, 2025
I guess cargo could use a default value. Realistically we can expect a small to medium sized project to require a certain amount of memory per job. I don't know ... 1 or 2 GB? So then only larger projects would need to override it.
Though I don't find it as big of an issue, because typically the number of CPUs already limits the memory usage. I don't think configurations with, e.g., 8 CPU cores but only 4GB of memory are all that common.
ell1e commented on Sep 2, 2025
My rate of being able to build Rust projects without crashing is lower than 50% so far.
Sieabah commented on Sep 10, 2025
@dpc You make a fair point about the complexities of process monitoring. My concern, however, isn't about finding a perfect solution, but a practical one for resource-constrained environments like CI runners or smaller cloud instances (e.g., 1 CPU, 2-4GB RAM). I shouldn't have to provision a dedicated metal instance in order to compile a Rust project. Coming from NodeJS/NPM land, this will only get worse as more people pull in the simple dependencies from crates.io and "trivial" projects need to compile some 300-400 crates. The Zed editor pulls in 1900 crates.
I disagree that targeting 100% CPU usage is always desirable. On a low-spec machine, saturating the CPU can starve the OS scheduler and I/O, leading to resource contention that actually slows down the build and can make the entire system unresponsive. I'm not sure the problem is only avoiding OOM errors; it's preventing system-wide lockups due to CPU pressure long before memory becomes an issue.
This aggressive default behavior creates a poor user experience. I've had my IDEs freeze my entire computer (16 cores, 4GHz & 128GB of memory) when a background build kicks off, sometimes to the point of needing a reboot. Any default build strategy that can cause scheduler thrashing or force the OS to kill processes for stability is fundamentally problematic. We're careful about always setting a threshold for channel buffer sizes rather than leaving them unbounded; why should the way Cargo treats CPU/memory be different in this scenario?
That's why I proposed a more adaptive, heuristic-based approach. Cargo could start with a small workload, measure the system's response, and then incrementally increase parallelism to find a sustainable throughput, always leaving a small buffer for the OS. A scrappy solution that ensures a build completes reliably is far better than a theoretically faster one that risks taking the whole system down with it. I know the platform-dependent solution isn't a great start, but anything is better here. Even if it only supports Linux (and Debian/CentOS/Ubuntu/RHEL/etc.).
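A toy sketch of that ramp-up idea (the `JobThrottle` type and the 0.8 threshold are hypothetical; a real implementation would measure pressure or free memory rather than take a pre-computed load number):

```rust
/// Toy additive-increase / multiplicative-decrease controller for the number
/// of concurrently running build jobs.
struct JobThrottle {
    current: usize,
    max: usize,
}

impl JobThrottle {
    fn new(max: usize) -> Self {
        // Start with a deliberately small batch and grow from there.
        Self { current: 1, max }
    }

    /// Called after each batch completes, with some measure of how stressed
    /// the system was while the batch ran (0.0 = idle, 1.0 = saturated).
    fn adjust(&mut self, observed_load: f64) {
        if observed_load < 0.8 {
            // Headroom left: ramp up by one job.
            self.current = (self.current + 1).min(self.max);
        } else {
            // System under stress: back off sharply and keep a buffer for the OS.
            self.current = (self.current / 2).max(1);
        }
    }
}

fn main() {
    let mut throttle = JobThrottle::new(16);
    // Pretend the first two batches were cheap and the third saturated the box.
    for load in [0.3, 0.5, 0.95] {
        throttle.adjust(load);
        println!("next batch size: {}", throttle.current);
    }
}
```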
ell1e commented on Sep 10, 2025
I have never seen the same level of unresponsiveness, even on my Allwinner embedded CPU device, when some processes go to 100% CPU, compared to running out of memory. So while neither is optimal on a slow machine, I think there's a difference (or is it just me?).
dpc commented on Sep 10, 2025
This is simply not true. CPUs are meant to work 100% of the time and nothing bad is happening because of it.
What you would nowadays consider "low-spec" is 10x-100x faster than what devs were using 15 or 30 years ago, and back then a 50MHz Pentium CPU was also running compilers at full throttle and everything worked just fine. On multi-core systems, build systems and other tools have been defaulting to running a number of CPU-intensive jobs equal to the number of CPUs forever.
The "CPU usage" is a generally useless metric. The issues you're experiencing are typically caused for one of two reasons:
1. CPU time is oversubscribed, which you will see as the "load average" increasing over the number of CPUs in the system. Basically, way too many jobs are trying to get hold of the CPU. This also causes the CPUs to switch between jobs, which thrashes caches (dcache, icache, TLB). This condition often degrades quite gracefully though, so mouse and keyboard might get some lag, but the system usually tends to stay somewhat usable.
2. If your system has swap and your build runs out of memory, everything grinds to a halt because the OS has to constantly swap memory in and out of storage, so the CPU's memory bandwidth gets capped to the bandwidth of the disk IO. (BTW, this will also typically cause condition 1 to happen as well.) Unlike the first case, this one is typically terrible, and the system oftentimes becomes near unresponsive. I generally don't enable swap other than zswap on my machines, because I'd rather have things OOM and crash quickly than look at my machines swapping while I'm unable to ssh in or type some kill command to make it fail. However, enabling swap on low-spec (low-memory) machines is often necessary, just to get past some higher-memory-usage scenarios or even browse modern websites. So low-memory systems get double-whammied.
In principle you want to avoid both conditions, but `cargo` already defaults to running a number of jobs equal to the number of CPUs. So unless something else heavy is already running, 1 is not an issue. The dreadful 2 is currently not being taken care of, and my previous proposal is precisely meant to avoid that in a relatively simple manner.
dpc commented on Sep 10, 2025
BTW. Not starting new jobs if loadavg is above a certain (CPU-core-count adjusted) threshold is often used in all sorts of tooling, for example `--load` for GNU Parallel. I've used it plenty of times, but it only really works well in long-running scenarios with a lot of jobs with a somewhat uniform distribution - for me it's typically CI jobs running tons of tests, etc.
It would not work well in Rust because the problem is typically running out of memory, and that tends to happen suddenly at the end of the build when heavy top-level compilation and linking starts. Running out of memory happens suddenly, while loadavg tracks historical (1m, 5m, 15m) stats of the usage. For this reason GNU Parallel also has `--memfree`, but even that would not work, because it's also looking backwards. Just because the machine has 8GB of free memory right now doesn't mean that cargo will not start 2 linking jobs at the same time, each needing 5GB of memory, and the machine will grind to a halt. Because of that, the memory ideally needs to be accounted for in advance: cargo needs to know that "we probably need 10GB of memory to run these jobs in parallel, or we need to run them serially". And the only reliable way to get that that I can think of is to have project developers write it out somewhere in some file after they measure it in practice.
ell1e commented on Sep 10, 2025
I guess cargo could display a warning if a measure wasn't specified or the value was somehow determined to be possibly outdated; that might get people to do it. Unprompted, I have my doubts, given that developers don't even seem to think about how the number of parallel jobs leads to out-of-memory problems, which is a lower-hanging fruit than measuring things.
dpc commented on Sep 10, 2025
It took us 3 years, 100k LoC and a handful of chunky deps to get to the point we need around 8G per job. Some default value (3G per job?) would likely eliminate the problem altogether for all projects by default.
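For what it's worth, the per-job budget idea reduces to a very small calculation; here is a sketch (the 3 GiB default, the stubbed available-memory figure, and the `max_jobs` helper are placeholders, not agreed-upon values or existing Cargo code):

```rust
/// Cap the number of parallel jobs by a per-job memory budget, in addition to
/// the usual CPU-count cap.
fn max_jobs(available_memory_bytes: u64, per_job_bytes: u64, cpu_count: usize) -> usize {
    let by_memory = (available_memory_bytes / per_job_bytes.max(1)) as usize;
    // Never go below one job, and never above the CPU count.
    by_memory.clamp(1, cpu_count.max(1))
}

fn main() {
    let cpus = std::thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    // Placeholders: assume a 3 GiB default per job and 8 GiB currently available.
    let per_job = 3u64 * 1024 * 1024 * 1024;
    let available = 8u64 * 1024 * 1024 * 1024;
    println!("jobs = {}", max_jobs(available, per_job, cpus));
}
```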
ell1e commented on Sep 18, 2025
100K lines of code doesn't seem that rare for a Rust project, so I feel like 3GB per job might be a little low. But I suppose it's easier to talk about defaults once the setting exists, is in practical use, and the impact can be seen hands-on.