Panics at match arm that starts with non-ASCII char #4868

Closed
@toslunar

enum NonAscii {
    Abcd,
    Éfgh,
}

use NonAscii::*;

fn f(x: NonAscii) -> bool {
    match x {
        Éfgh => true,
        _ => false,
    }
}

fn main() {
    dbg!(f(Abcd));
}

Rustfmt on Playground (1.4.37-nightly (2021-06-12 24bdc6d)) fails to format the above code:

thread 'main' panicked at 'byte index 109 is not a char boundary; it is inside 'É' (bytes 108..110) of `enum NonAscii {
    Abcd,
    Éfgh,
}

use NonAscii::*;

fn f(x: NonAscii) -> bool {
    match x {
        Éfgh => true,
        _ => false,
    }
}

fn main() {
    dbg!(f(Abcd));
}
`', src/tools/rustfmt/src/visitor.rs:47:15
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
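The panic can be reproduced outside rustfmt. In the source above, `É` occupies bytes 108..110, so a one-byte-wide range starting at 108 ends at 109, in the middle of the character. The sketch below is not rustfmt code: `one_byte_snippet` is a hypothetical helper that takes the same kind of one-byte slice, but uses `str::get` so the failure shows up as `None` rather than a panic.

```rust
/// Hypothetical helper, not rustfmt code: slices `src` over the byte range
/// `lo..lo + 1`, the same shape as a span that covers exactly one byte.
/// Returns `None` instead of panicking when the range is out of bounds or
/// when either end falls inside a multi-byte UTF-8 sequence.
fn one_byte_snippet(src: &str, lo: usize) -> Option<&str> {
    let hi = lo.checked_add(1)?;
    // `str::get` applies the same char-boundary check as `&src[lo..hi]`.
    src.get(lo..hi)
}
```

For an arm that starts with `É` (written `"\u{c9}fgh"` in a string literal), `one_byte_snippet(arm, 0)` is `None`, whereas indexing with `&arm[0..1]` would panic with the "not a char boundary" message shown above. For an arm that starts with an ASCII character the slice succeeds.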

Activity

NichtsHsu commented on Jun 18, 2021

@NichtsHsu added the a-matches (match arms, patterns, blocks, etc) and bug (Panic, non-idempotency, invalid code, etc.) labels on Jun 21, 2021

calebcartwright (Member) commented on Jun 21, 2021

Haven't had a chance to check, but I assume there are no issues with the pattern span, and that the most likely source is where we try to figure out whether the pattern has a leading pipe and the corresponding byte position:

rustfmt/src/matches.rs

Lines 162 to 174 in 6495024

/// Collect a byte position of the beginning `|` for each arm, if available.
fn collect_beginning_verts(
    context: &RewriteContext<'_>,
    arms: &[ast::Arm],
) -> Vec<Option<BytePos>> {
    arms.iter()
        .map(|a| {
            context
                .snippet_provider
                .opt_span_before(mk_sp_lo_plus_one(a.pat.span.lo()), "|")
        })
        .collect()
}

crlf0710 (Member) commented on Jun 21, 2021

Basically the mk_sp_lo_plus_one function is incorrect, since utf-8 is a multibyte encoding format.
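One way to read this point: a range that is meant to cover the first character has to be as wide as that character's UTF-8 encoding, which is not always one byte. The sketch below illustrates that idea only. `end_of_first_char` is a hypothetical helper working on a plain `&str`; it is not rustfmt's `mk_sp_lo_plus_one` and not a proposed patch.

```rust
/// Hypothetical helper, not rustfmt's `mk_sp_lo_plus_one`: returns the byte
/// offset just past the character that starts at `lo`, or `None` if `lo` is
/// out of range, not on a char boundary, or at the end of `src`.
fn end_of_first_char(src: &str, lo: usize) -> Option<usize> {
    // `get(lo..)` rejects offsets that are out of range or mid-character.
    let first = src.get(lo..)?.chars().next()?;
    // `len_utf8` is 1 for ASCII and 2 to 4 for other characters.
    Some(lo + first.len_utf8())
}
```

With this helper, an arm starting with `É` yields an end offset of `lo + 2`, while an arm starting with `A` or `|` yields `lo + 1`.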

calebcartwright (Member) commented on Jun 22, 2021

> Basically the mk_sp_lo_plus_one function is incorrect, since utf-8 is a multibyte encoding format.

I'm not sure about that. That function is correctly generating the expected/desired span, but we should avoid using that span-based approach here to see if the pattern starts with a pipe given this case. If generating span lo/hi's with bytepos offsets is incorrect then we have much bigger problems 😄

I realize this is likely an annoying issue for folks, so I will take care of fixing this one myself.
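A minimal sketch of what a non-span-based check could look like, assuming the pattern's source text is available as a string and that the pattern's span begins at the leading `|` when one is present. `ArmPattern` and `leading_pipe_positions` are hypothetical names for illustration; this is not the fix that was applied in rustfmt.

```rust
/// Hypothetical stand-in for a match arm: the byte offset where its pattern
/// starts and the source text of that pattern.
struct ArmPattern<'a> {
    lo: usize,
    snippet: &'a str,
}

/// Returns the byte position of the leading `|` for each arm, if present.
/// Comparing the text with `starts_with` never slices inside a character,
/// so a pattern that begins with a multi-byte character is handled safely.
fn leading_pipe_positions(arms: &[ArmPattern<'_>]) -> Vec<Option<usize>> {
    arms.iter()
        .map(|arm| arm.snippet.starts_with('|').then_some(arm.lo))
        .collect()
}
```

The design choice here is to ask a question about the text ("does it start with `|`?") rather than to build a one-byte range and slice the source with it, which is the step that fails when the first character is wider than one byte.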

Issue #4868 · rust-lang/rustfmt