The default value of prep-blocks-threads, defined as "lower of 4 or max threads", causes a significant slowdown in non-checkpointed sync.
note: this default has remained unchanged since it was implemented in 2015.
I measured consistent speedups of 45% and 70% across 2-minute runs with --prep-blocks-threads=$(nproc) on an 8-thread aarch64 system and a 256-thread x86_64 system, respectively.
suggestion: use max threads by default
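
For illustration, here is a minimal shell sketch of how the current rule and the suggested one differ on a given machine. The function names `current_default` and `proposed_default` are hypothetical and this is not the project's actual code; it only mirrors the "lower of 4 or max threads" wording quoted above and assumes `nproc` is available.

```shell
#!/bin/sh
# Hypothetical helper: the current rule as described, "lower of 4 or max threads".
# $1 = maximum number of threads on the machine.
current_default() {
    if [ "$1" -lt 4 ]; then
        echo "$1"
    else
        echo 4
    fi
}

# Hypothetical helper: the suggested rule, use all available threads.
proposed_default() {
    echo "$1"
}

# nproc reports the number of processing units available to this process.
max_threads=$(nproc)
echo "max threads:      $max_threads"
echo "current default:  $(current_default "$max_threads")"
echo "proposed default: $(proposed_default "$max_threads")"
```

Under the current rule, any machine with more than 4 threads is capped at 4, so the 8-thread and 256-thread systems above would both default to the same value unless the option is set explicitly.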