-
Notifications
You must be signed in to change notification settings - Fork 13.6k
Description
As the issue title indicates, I have discovered that, in some cases, sorting a slice via slice::sort
erroneously raises SIGILL (illegal instruction). Below is a minimal test case for the observed behavior.
use std::fmt;
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct S(bool, [u8; 999_999]);
impl fmt::Debug for S {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
fmt::Debug::fmt(&self.0, f)
}
}
fn main() {
let n = 101;
let mut objs = Vec::with_capacity(n);
let mut on = false;
for _ in 0..n {
objs.push(S(on, [0; 999_999]));
on = !on;
}
println!("Done adding elements.");
println!("Objects: {:?}", objs);
objs.sort();
println!("Objects: {:?}", objs);
}
If you run this code in a debugger, you will see that the SIGILL comes from the call to core::intrinsics::abort
here. That is, something is causing scratch.len()
to be less than len
when it shouldn't be.
I think I have traced the cause of the problem to how min_good_run_len
is calculated here. If len <= MIN_SQRT_RUN_LEN * MIN_SQRT_RUN_LEN
holds true and len - len / 2
(which is really just ceil(len / 2)
) happens to be <= MIN_SQRT_RUN_LEN
, then min_good_run_len
is given the value len - len / 2
(which, again, is just ceil(len / 2)
). When the function create_run
in the same file cannot find an ascending or descending run of at least min_good_run_len
elements, it extracts an unsorted run that extends for min_good_run_len
elements or until the end of the slice/array (whichever is earlier). Later, the function logical_merge
in the same file will sort an unsorted run if it and the run with which it is to be merged do not together fit in scratch
.
scratch.len()
is at least floor(len / 2)
, and floor(len / 2)
is less than ceil(len / 2)
when len
is an odd integer, so a problem arises when scratch.len()
is equal to floor(len / 2)
, min_good_run_len
is equal to ceil(len / 2)
, and len
is odd, because, then, logical_merge
might try to sort an unsorted run that cannot fit in scratch
. If my math is correct, this is exactly the situation occurring in my minimal test case.
Replacing len - len / 2
with len / 2
in the calculation of min_good_run_len
would therefore fix the issue, if I am not mistaken. I do not know why the author(s) of the driftsort chose len - len / 2
instead of len / 2
. Neither the commit (in the original driftsort GitHub repository) where it first appeared (Voultapher/driftsort@dbe1d24) nor the associated pull request (Voultapher/driftsort#39) appears to give an explanation.
Meta
rustc --version --verbose
:
rustc 1.86.0-nightly (f7cc13af8 2025-01-25)
binary: rustc
commit-hash: f7cc13af822fe68c64fec0b05aa9dd1412451f7c
commit-date: 2025-01-25
host: x86_64-unknown-linux-gnu
release: 1.86.0-nightly
LLVM version: 19.1.7
Activity
orlp commentedon Jan 26, 2025
I believe the @jonathan-gruber-jg's analysis to be accurate as to the cause of the bug. Because of the (abort-based) assert it hits I don't believe there's a security issue, only a potential denial-of-service issue.
The reason this only triggers with a large type is because up until 8MB we always allocate a scratch space for
n
elements rather than then / 2
we do for very large arrays. However we don't have a hard switch at 8MB fromn
ton/2
, the formula is more complicated (paraphrased):The only way to trigger this bug is if the
len / 2
term dominates, so we must havelen / 2 >= 8_000_000 / size_of::<T>()
and thus we see thatsize_of::<T>() >= 8_000_000 * 2 / len
(modulo rounding off-by-ones). However we must also have thatlen - len / 2 <= MIN_SQRT_RUN_LEN == 64
, thus we concludelen == 127
is the largest array possible that can trigger this bug, putting a minimum bound onsize_of::<T>()
of 125000 bytes. Empirically I see the threshold is at125000
bytes exactly (with125001
being the smallest type that fails). This is thus the minimal reproducer:If you try to make
n
any bigger in an attempt to reducesize_of::<T>() < 125001
you will no longer hit the bug. Thus while I agree this is an issue that should be fixed I don't think there is any realistic denial-of-service attack surface here as anyone sorting objects which are 125KB+ large (as insize_of::<T>()
) is already denying their own service (and/or already getting dangerously close to stack overflows).The explanation was done in private communication at the time. I proposed to switch to
len - len / 2
:I still believe this justification to be sound so I don't believe the
min_good_run_len
computation should be changed, rather, we should ensure the scratch always has length at leastlen - len / 2
instead of onlylen / 2
.cuviper commentedon Jan 27, 2025
For the record, this did go through a
security@rust-lang.org
report, and we agreed that the conditions for denial-of-service were unlikely enough that we can treat this as an open issue, rather than a privately-coordinated security issue.uellenberg commentedon Jan 27, 2025
@rustbot claim
Fix off-by-one error causing driftsort to crash
Fix off-by-one error causing driftsort to crash
Fix off-by-one error causing driftsort to crash
Fix off-by-one error causing driftsort to crash
Rollup merge of rust-lang#136163 - uellenberg:driftsort-off-by-one, r…
Rollup merge of rust-lang#136163 - uellenberg:driftsort-off-by-one, r…
21 remaining items