Optimise and bug-fix to_digit #132428

@daniel-pfeiffer

Description

I tried this code, with absurd radices:

'0'.to_digit(0);
'0'.to_digit(1);

I expected to see this happen: panic

Instead, this happened: the calls returned None and Some(0), respectively.

Solution

This function was quite convoluted, which led to unnecessary operations. The radix bounds check was forgotten at the lower end.

On the other hand, the radix bounds check often does not need to be repeated for every single digit, so I propose two variants. The consequences for existing callers are listed at the bottom:

pub const fn to_digit(self, radix: u32) -> Option<u32> {
    assert!(radix >= 2, "to_digit: radix is too low (minimum 2)");
    assert!(radix <= 36, "to_digit: radix is too high (maximum 36)");
    self.to_digit_unchecked(radix)
}

/// ### Soundness
///
/// Callers of this function are responsible that this precondition is satisfied:
///
/// - The radix must be between 2 and 36 inclusive.
///
/// Failing that, the returned value may be weird and is not guaranteed to stay the same in future.
#[inline]
// semi-branchless variant
pub const fn to_digit_unchecked(selfie: char, radix: u32) -> Option<u32> {
    let is_digit = (selfie <= '9') as u32;
    let digit =
        is_digit * // branchless if
            // If not a digit, a number greater than radix will be created.
            (selfie as u32).wrapping_sub('0' as u32)
        + (1 - is_digit) * // branchless else
            // Force the 6th bit to be set to ensure ascii is lower case.
            (selfie as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10);
    // FIXME: once then_some is const fn, use it here
    if digit < radix { Some(digit) } else { None }
}
// conventional variant
pub const fn to_digit_unchecked(self, radix: u32) -> Option<u32> {
    let digit =
        match self {
            // If not a digit, a number greater than radix will be created.
            ..='9' => (self as u32).wrapping_sub('0' as u32),
            // Force the 6th bit to be set to ensure ascii is lower case.
            _ => (self as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
        };
	/* Or if you prefer this style
        if self <= '9' {
            // If not a digit, a number greater than radix will be created.
            (self as u32).wrapping_sub('0' as u32)
        } else {
            // Force the 6th bit to be set to ensure ascii is lower case.
            (self as u32 | 0b10_0000).wrapping_sub('a' as u32).saturating_add(10)
        };
	*/
    // FIXME: once then_some is const fn, use it here
    if digit < radix { Some(digit) } else { None }
}
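
One way to gain some confidence in the conventional variant without building rustc is to copy it into a free function and compare it against the existing `char::to_digit` for every valid radix. The following is only a sketch under those assumptions: `to_digit_unchecked` is written here as a free function (it is not a std API), and `first_mismatch` is a helper name invented for this illustration.

```rust
/// The conventional variant from above, rewritten as a free function so it
/// compiles outside of `core`. Hypothetical; not a real std API.
fn to_digit_unchecked(c: char, radix: u32) -> Option<u32> {
    let digit = if c <= '9' {
        // Anything below '0' wraps around to a huge value and is rejected below.
        (c as u32).wrapping_sub('0' as u32)
    } else {
        // Setting bit 5 (0b10_0000) folds ASCII upper case onto lower case.
        (c as u32 | 0b10_0000)
            .wrapping_sub('a' as u32)
            .saturating_add(10)
    };
    if digit < radix { Some(digit) } else { None }
}

/// Hypothetical helper: returns the first `(char, radix)` pair on which the
/// sketch disagrees with `char::to_digit`, checking every radix in 2..=36.
fn first_mismatch(chars: impl Iterator<Item = char> + Clone) -> Option<(char, u32)> {
    for radix in 2..=36 {
        for c in chars.clone() {
            if to_digit_unchecked(c, radix) != c.to_digit(radix) {
                return Some((c, radix));
            }
        }
    }
    None
}
```

Calling `first_mismatch('\0'..=char::MAX)` walks every Unicode scalar value; a result of `None` means the two functions agreed on all inputs for the valid radices. Radices outside 2..=36 are deliberately left out, since their behaviour is exactly what this issue is about.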

I have checked all callers of to_digit to see where it is already safe, or can be made safe, to switch to to_digit_unchecked. It may be worth adding a comment each time at the place that guarantees a valid radix:

bounds already checked outside of loop, so can switch:

  • library/core/src/num/mod.rs 2x

literal base in a variable, only valid values, so can switch:

  • compiler/rustc_session/src/errors.rs 1x
  • rustc_parse/src/lexer/mod.rs 1x

in a loop, should add bounds check and switch:

  • library/core/src/net/parser.rs 2x

literal radices, where the compiler can hopefully eliminate the asserts, so no need, but might do it to save compile time:

  • compiler/rustc_lexer/src/unescape.rs 4x
  • librustdoc/html/render/print_item.rs 6x
  • rustc_parse_format/src/lib.rs 1x
  • many test files
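
The "bounds check outside of the loop" pattern from the list above could look roughly like this. It is a minimal sketch, not code from any of the listed files: `parse_u32` is a made-up parser, and `to_digit_unchecked` is a free-function stand-in for the proposed method, which does not exist in std.

```rust
/// Stand-in for the proposed method (hypothetical, not in std). The caller
/// must guarantee that 2 <= radix <= 36.
fn to_digit_unchecked(c: char, radix: u32) -> Option<u32> {
    let digit = if c <= '9' {
        (c as u32).wrapping_sub('0' as u32)
    } else {
        (c as u32 | 0b10_0000)
            .wrapping_sub('a' as u32)
            .saturating_add(10)
    };
    if digit < radix { Some(digit) } else { None }
}

/// Hypothetical parser: validates the radix once, then converts each digit
/// without repeating the bounds check.
fn parse_u32(s: &str, radix: u32) -> Option<u32> {
    // This single assert is what guarantees a valid radix for the loop below.
    assert!((2..=36).contains(&radix), "radix must be in 2..=36");
    if s.is_empty() {
        return None;
    }
    let mut value: u32 = 0;
    for c in s.chars() {
        let digit = to_digit_unchecked(c, radix)?;
        // Reject overflow instead of wrapping.
        value = value.checked_mul(radix)?.checked_add(digit)?;
    }
    Some(value)
}
```

With this shape the radix is checked once per string rather than once per character, which is the saving the list is aiming for.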

Activity

added needs-triage (This issue may need triage. Remove it if it has been sufficiently triaged.) on Oct 31, 2024

workingjubilee (Member) commented on Nov 1, 2024

...PRs welcome? it's usually easiest, when you have a fix on-hand, to just have a fix to review.

workingjubilee (Member) commented on Nov 1, 2024

That said, the documentation here:

# Errors

Returns None if the char does not refer to a digit in the given radix.

# Panics

Panics if given a radix larger than 36.

suggests the correct results should probably be either Option A:

assert_eq!('0'.to_digit(0), None);
assert_eq!('1'.to_digit(1), Some(1)); // https://en.wikipedia.org/wiki/Unary_numeral_system

or Option B:

assert_eq!('0'.to_digit(0), None);
assert_eq!('0'.to_digit(1), None);

daniel-pfeiffer (Author) commented on Nov 1, 2024

It conformed to panicking, when it said it would. However the radix bounds check is there to prevent wrong radices. Unary is a totally different scheme, which is not implemented by this function. All numbers Rust deals with in any base are in positional notation. There 0 is always the smallest digit. And incrementing the biggest digit gives 10. The smallest base for which this is possible is two.

If Rust wants to special case tiny radices, it must actively do so and explain what’s going on! Otherwise they must be forbidden just like too big ones are. The current state of things is not consistent.

Night’s sleep has inspired me to offer a branchless variant. Not benchmarked, hoping that compiler specialists will know whether this is even more optimised.

My understanding was that a PR can’t just introduce a new function to the public API. And whether to switch all those callers to it, surely needs somebody’s prior approval.

I have never compiled rustc, and possibly don’t have enough disk-space. So if I were to issue a PR it would only be syntax checked.

scottmcm (Member) commented on Nov 1, 2024

If you wish to propose to_digit_unchecked, the process is to open an ACP https://std-dev-guide.rust-lang.org/development/feature-lifecycle.html#suitability-for-the-standard-library

added T-libs (Relevant to the library team, which will review and decide on the PR/issue.) on Nov 1, 2024

lolbinarycat (Contributor) commented on Nov 1, 2024

It's worth noting that '1'.to_digit(1) returns None as expected. Personally, I think the current behavior matches what the documentation specifies, even if it isn't very useful. Inserting a new panic condition here could possibly cause subtle bugs in code that needs to not panic and takes the radix as a user input.

Also, your "Soundness" header should instead be called "Correctness". Soundness in Rust means whether some unsafe code allows safe code to cause UB (which can happen if a function or trait is not marked unsafe when it needs to be). This function does not cause UB when its preconditions are violated, so it has a correctness precondition, not a safety precondition. Also, it is a safe function, so it cannot have safety preconditions.
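
For code that takes the radix from user input and must not panic, one option is to validate the radix before calling `to_digit` at all, so the result does not depend on how out-of-range radices are handled. A minimal sketch, where `try_to_digit` is a hypothetical helper name and not part of std:

```rust
/// Hypothetical helper (not in std): never panics, whatever radix is passed.
/// Radices outside 2..=36 are treated as "no digit exists" and yield `None`.
fn try_to_digit(c: char, radix: u32) -> Option<u32> {
    if (2..=36).contains(&radix) {
        // Within this range `char::to_digit` is documented not to panic.
        c.to_digit(radix)
    } else {
        None
    }
}
```

Whether `None` is the right answer for a radix of 0 or 1 is a design choice of this sketch; a caller could equally return an error type that distinguishes "bad radix" from "not a digit".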

workingjubilee (Member) commented on Nov 1, 2024

> If Rust wants to special case tiny radices,

It's the opposite of a special case.

Panicking would be a special case.

workingjubilee (Member) commented on Nov 1, 2024

> All numbers Rust deals with in any base are in positional notation. There 0 is always the smallest digit. And incrementing the biggest digit gives 10.

According to the definition you have provided, then:

  • when the radix given is 1, the provided answer for 0 is 0.
  • incrementing the biggest digit would give 10, but that is not allowed, so it returns None.

> The smallest base for which this is possible is two.

As we can see, when only 1 symbol is allowed, we can still have 0.

So you have successfully persuaded me there is not a problem.

Ask a silly question, get a silly answer. Rust is memory-safe, not foolproof.

added C-feature-request (Category: A feature request, i.e: not implemented / a PR.) and removed C-bug (Category: This is a bug.) and needs-triage (This issue may need triage. Remove it if it has been sufficiently triaged.) on Nov 2, 2024

daniel-pfeiffer (Author) commented on Nov 4, 2024

> As we can see, when only 1 symbol is allowed, we can still have 0.

Unary works differently from binary and above. It only has the digit 1. A unary with digit 0 can’t work: it couldn’t count anything. Nor could a nullary system with what? No digits at all? Also a rubbish system.

There’s a good reason from_str_radix does both these checks. And to_digit is inconsistent with it.
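
The checks in from_str_radix can be observed directly: its documentation states that it panics if the radix is not in the range from 2 to 36. The sketch below probes this with `std::panic::catch_unwind`; the helper name `from_str_radix_panics` is made up for this illustration, and the approach assumes the default unwinding panic strategy.

```rust
use std::panic;

/// Hypothetical helper: reports whether `u32::from_str_radix` panics for the
/// given radix when parsing the string "0". Assumes panic = "unwind".
fn from_str_radix_panics(radix: u32) -> bool {
    // A caught panic shows up as `Err`; a normal return (even a parse error)
    // shows up as `Ok`.
    panic::catch_unwind(|| u32::from_str_radix("0", radix)).is_err()
}
```

Each caught panic still prints its message to stderr through the default panic hook, so the output is noisy but the helper itself returns normally.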

jhpratt (Member) commented on Nov 4, 2024

As there's an ACP open, let's keep discussion there.

locked as resolved and limited conversation to collaborators on Nov 4, 2024

          Optimise and bug-fix `to_digit` · Issue #132428 · rust-lang/rust