-
Notifications
You must be signed in to change notification settings - Fork 13.6k
Description
I tried this code, with absurd radices:
'0'.to_digit(0);
'0'.to_digit(1);
I expected to see this happen: panic
Instead, this happened: None Some(0)
Solution
This function was quite convoluted, which lead to unnecessary operations. Radix bounds checking was forgotten at the lower end.
OTOH radix bounds checking is often not necessary to repeat for every single digit, so I propose 2 variants, the consequence of which, see at the bottom:
pub const fn to_digit(self, radix: u32) -> Option<u32> {
assert!(radix >= 2, "to_digit: radix is too low (minimum 2)");
assert!(radix <= 36, "to_digit: radix is too high (maximum 36)");
self.to_digit_unchecked(radix)
}
/// ### Soundness
///
/// Callers of this function are responsible that this precondition is satisfied:
///
/// - The radix must be between 2 and 36 inclusive.
///
/// Failing that, the returned value may be weird and is not guaranteed to stay the same in future.
#[inline]
// semi-branchless variant
pub const fn to_digit_unchecked(selfie: char, radix: u32) -> Option<u32> {
let is_digit = (selfie <= '9') as u32;
let digit =
is_digit * // branchless if
// If not a digit, a number greater than radix will be created.
(selfie as u32).wrapping_sub('0' as u32)
+ (1 - is_digit) * // branchless else
// Force the 6th bit to be set to ensure ascii is lower case.
(selfie as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10);
// FIXME: once then_some is const fn, use it here
if digit < radix { Some(digit) } else { None }
}
// conventional variant
pub const fn to_digit_unchecked(self, radix: u32) -> Option<u32> {
let digit =
match self {
// If not a digit, a number greater than radix will be created.
..='9' => (self as u32).wrapping_sub('0' as u32),
// Force the 6th bit to be set to ensure ascii is lower case.
_ => (self as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
};
/* Or if you prefer this style
if self <= '9' {
// If not a digit, a number greater than radix will be created.
(self as u32).wrapping_sub('0' as u32)
} else {
// Force the 6th bit to be set to ensure ascii is lower case.
(self as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
};
*/
// FIXME: once then_some is const fn, use it here
if digit < radix { Some(digit) } else { None }
}
I have checked all callers of to_digit
, to see where it is already safe, or can be made safe, to switch to to_digit_unchecked
. Maybe each time a comment should be added to the place that guarantees a valid radix:
bounds already checked outside of loop, so can switch:
- library/core/src/num/mod.rs 2x
literal base in a variable, only valid values, so can switch:
- compiler/rustc_session/src/errors.rs 1x
- rustc_parse/src/lexer/mod.rs 1x
in a loop, should add bounds check and switch:
- library/core/src/net/parser.rs 2x
literal radices, where the compiler can hopefully eliminate the assert
s, so no need, but might do it to save compile time:
- compiler/rustc_lexer/src/unescape.rs 4x
- librustdoc/html/render/print_item.rs 6x
- rustc_parse_format/src/lib.rs 1x
- many test files
Activity
workingjubilee commentedon Nov 1, 2024
...PRs welcome? it's usually easiest, when you have a fix on-hand, to just have a fix to review.
workingjubilee commentedon Nov 1, 2024
That said, the documentation here:
suggests the correct results should probably be either Option A:
or Option B:
daniel-pfeiffer commentedon Nov 1, 2024
It conformed to panicking, when it said it would. However the radix bounds check is there to prevent wrong radices. Unary is a totally different scheme, which is not implemented by this function. All numbers Rust deals with in any base are in positional notation. There
0
is always the smallest digit. And incrementing the biggest digit gives10
. The smallest base for which this is possible is two.If Rust wants to special case tiny radices, it must actively do so and explain what’s going on! Otherwise they must be forbidden just like too big ones are. The current state of things is not consistent.
Night’s sleep has inspired me to offer a branchless variant. Not benchmarked, hoping that compiler specialists will know whether this is even more optimised.
My understanding was that a PR can’t just introduce a new function to the public API. And whether to switch all those callers to it, surely needs somebody’s prior approval.
I have never compiled rustc, and possibly don’t have enough disk-space. So if I were to issue a PR it would only be syntax checked.
scottmcm commentedon Nov 1, 2024
If you wish to propose
to_digit_unchecked
, the process is to open an ACP https://std-dev-guide.rust-lang.org/development/feature-lifecycle.html#suitability-for-the-standard-librarylolbinarycat commentedon Nov 1, 2024
It's worth noting that
'1'.to_digit(1)
returnsNone
as expected. Personally, I think the current behavior matches what the documentation specifies, even if it isn't very useful. Inserting a new panic condition here could possibly cause subtle bugs in code that needs to not panic and takes the radix as a user input.also, your "Soundness" header should instead be called "Correctness". soundness in rust means whether some unsafe code allows safe code to cause UB (which can happen if a function or trait is not marked
unsafe
) when it needs to be. this function does not cause UB when its preconditions are violated, thus it is a correctness precondition, not a saftey precondition. also, it is a safe function, so it cannot have saftey preconditions.workingjubilee commentedon Nov 1, 2024
It's the opposite of a special case.
Panicking would be a special case.
workingjubilee commentedon Nov 1, 2024
According to the definition you have provided, then:
As we can see, when only 1 symbol is allowed, we can still have 0.
So you have successfully persuaded me there is not a problem.
Ask a silly question, get a silly answer. Rust is memory-safe, not foolproof.
to_digit
rust-lang/libs-team#475daniel-pfeiffer commentedon Nov 4, 2024
Unary works differently from binary and above. It only has the digit 1. A unary with digit 0 can’t work: it couldn’t count anything. Nor could a nullary system with what? No digits at all? Also a rubbish system.
There’s a good reason
from_str_radix
does both these checks. Andto_digit
is inconsistent with it.jhpratt commentedon Nov 4, 2024
As there's an ACP open, let's keep discussion there.