
Sometimes 'Σ' is not handled correctly in str.to_lowercase() #124714

Closed
@Marcondiro

Description

@Marcondiro
Contributor

Hello,
Due to the optimization introduced in #97046, to_lowercase returns an incorrect result for some strings containing the 'Σ' char.

Simple example:

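fn main() {
    // Short input: the word-final 'Σ' is lowercased to 'ς', as expected.
    println!("{}", "aΣ".to_lowercase());
    // 16 ASCII chars before 'Σ': the final sigma comes out as 'σ' instead.
    println!("{}", "abcdefghijklmnopΣ".to_lowercase());
}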

output:

aς
abcdefghijklmnopσ

The first conversion is correct; the second is not and should be abcdefghijklmnopς, since a word-final 'Σ' lowercases to 'ς'.

More generally, this happens when 'Σ' follows 2 * USIZE_SIZE * K chars, where USIZE_SIZE is size_of::<usize>() and K is a positive integer:

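use std::mem;

// Chunk width from the rule above: 2 * size_of::<usize>() (16 on 64-bit targets).
const USIZE_SIZE_BY_2: usize = 2 * mem::size_of::<usize>();
// Number of chunks placed before 'Σ'.
const K: usize = 1;

fn main() {
    // ASCII prefix of exactly USIZE_SIZE_BY_2 * K chars, followed by 'Σ'.
    let bug_string = "a".repeat(USIZE_SIZE_BY_2 * K) + "Σ";

    // Should end in 'ς'; affected versions print 'σ' instead.
    println!("{}", bug_string.to_lowercase());
}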
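
To see which prefix lengths are affected on a given toolchain, the check above can be turned into a small scan. This is only a sketch: the helper names (input_with_prefix, expected_lowercase, mismatching_prefix_lengths) and the CHUNK constant are made up for illustration and are not part of std. It assumes an all-'a' prefix, as in the example above.

```rust
use std::mem;

/// Chunk width from the report: 2 * size_of::<usize>() (hypothetical name).
const CHUNK: usize = 2 * mem::size_of::<usize>();

/// Builds `n` ASCII 'a's followed by a capital sigma.
fn input_with_prefix(n: usize) -> String {
    "a".repeat(n) + "Σ"
}

/// The lowercase form we expect: for n >= 1 the 'Σ' is word-final and
/// preceded by a cased letter, so it should become 'ς'.
fn expected_lowercase(n: usize) -> String {
    "a".repeat(n) + "ς"
}

/// Returns every prefix length in 1..=max for which `str::to_lowercase`
/// disagrees with the expected output.
fn mismatching_prefix_lengths(max: usize) -> Vec<usize> {
    (1..=max)
        .filter(|&n| input_with_prefix(n).to_lowercase() != expected_lowercase(n))
        .collect()
}

fn main() {
    let max = 4 * CHUNK;
    let bad = mismatching_prefix_lengths(max);
    if bad.is_empty() {
        println!("no mismatches for prefixes of 1..={} chars", max);
    } else {
        println!("mismatching prefix lengths (chunk width {}): {:?}", CHUNK, bad);
    }
}
```

If the rule above holds, an affected toolchain should list only multiples of the chunk width, and a toolchain with the fix should report no mismatches.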

Activity

added
needs-triage: This issue may need triage. Remove it if it has been sufficiently triaged.
on May 4, 2024
Marcondiro

Marcondiro commented on May 4, 2024

@Marcondiro
Contributor, Author

@rustbot claim

added
T-libs: Relevant to the library team, which will review and decide on the PR/issue.
S-has-mcve: Status: A Minimal Complete and Verifiable Example has been found for this issue
and removed
needs-triage: This issue may need triage. Remove it if it has been sufficiently triaged.
on May 4, 2024
jhorstmann

jhorstmann commented on May 5, 2024

@jhorstmann
Contributor

@Marcondiro I tried optimizing to_lowercase in #123778. In that PR the whole ASCII prefix is converted by an optimized loop, which probably makes the issue easier to trigger and maybe also easier to fix. Let me know if you'd like me to look into this.

added a commit that references this issue on May 5, 2024

fix rust-lang#124714 str.to_lowercase sigma handling

50d7b9d
added 3 commits that reference this issue on May 6, 2024

fix rust-lang#124714 str.to_lowercase sigma handling

6db331b

fix rust-lang#124714 str.to_lowercase sigma handling

a2c62a6

fix rust-lang#124714 str.to_lowercase sigma handling

bbdf972


Metadata

Labels

A-Unicode: Area: Unicode
C-bug: Category: This is a bug.
S-has-mcve: Status: A Minimal Complete and Verifiable Example has been found for this issue
T-libs: Relevant to the library team, which will review and decide on the PR/issue.


Participants

@jhorstmann, @fmease, @jieyouxu, @Marcondiro, @rustbot
