Lexer: check in `advance_token` to avoid regard spare `##` as `GardedStrPrefix` #141028

xizheyin · 2025-05-15T11:00:38Z

Fixes #140618

I added a check in `advance_token` so that the two spare `##` after `r#‘xxx’####` are no longer treated as `GuardedStrPrefix`.

`GuardedStrPrefix` was introduced in 321a5db.

r? compiler

xizheyin · 2025-05-15T11:01:12Z

@rustbot author

rustbot · 2025-05-15T11:01:17Z

Reminder, once the PR becomes ready for a review, use @rustbot ready.

xizheyin · 2025-05-15T13:05:21Z

@rustbot ready

xizheyin · 2025-05-31T04:21:50Z

r? compiler

wesleywiser · 2025-06-06T16:05:56Z

r? compiler

jdonszelmann · 2025-06-06T21:50:14Z

Heya, you now unconditionally look at the previous character at every step of tokenization. I'm admittedly somewhat new to review rotation and don't have a good intuition for how much lexing matters in perf. My expectation is 0, but I'm interested in the outcome of a perf run because of that. Will be reviewing more, but the initial pass looks reasonable.

@bors try @rust-timer queue

bors · 2025-06-06T21:51:29Z

⌛ Trying commit 3fe5df4 with merge 606582d...


bors · 2025-06-07T00:24:08Z

☀️ Try build successful - checks-actions
Build commit: 606582d (606582da13e1b4b4ec1c1e80afd6ce9cb2755789)

rust-timer · 2025-06-07T03:17:07Z

Finished benchmarking commit (606582d): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results (primary 5.4%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	5.4%	[5.4%, 5.4%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	5.4%	[5.4%, 5.4%]	1

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results (secondary 0.0%)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	0.0%	[0.0%, 0.0%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Bootstrap: 752.053s -> 750.495s (-0.21%)
Artifact size: 372.47 MiB -> 372.53 MiB (0.02%)


xizheyin · 2025-06-07T06:02:17Z

This was based on a very old base, so I rebased it to the new master branch.

I don't think this will have much of an impact on performance, but I'm not sure why max-rss increased by 5%. Is it being compared against the latest master branch? Maybe we should rerun @rust-timer queue, but I don't have sufficient permissions...

xizheyin · 2025-06-07T06:12:46Z

I'm curious how these permissions are obtained. A couple of previous reviewers said r=me after something was resolved, yet I don't have permission to do that either.

jdonszelmann · 2025-06-07T07:54:43Z

Both reviews and perf runs can only be done by team members. Sometimes a reviewer says r=me, which effectively means the review is done and any team member can merge it. For you, it means that this is the last problem with your PR; once it is resolved, someone (maybe them) will merge. Sometimes a reviewer can delegate these powers to non-team members, but you can't do this yourself.

jdonszelmann · 2025-06-07T08:01:19Z

compiler/rustc_lexer/src/lib.rs

@@ -371,6 +371,7 @@ pub fn is_ident(string: &str) -> bool {
 impl Cursor<'_> {
    /// Parses a token from the input string.
    pub fn advance_token(&mut self) -> Token {
+        let pre_char = self.prev();


self.prev() is only for debug builds (read its documentation)
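
To illustrate the point about debug builds, here is a minimal, self-contained sketch of a cursor whose previous-character tracking exists only when debug assertions are enabled. `ToyCursor`, `record_prev`, and the `'\0'` placeholder are hypothetical names chosen for this illustration; this is not the actual `rustc_lexer` code, only a model of the behavior the comment describes.

```rust
use std::str::Chars;

/// Placeholder returned when no previous character is tracked (assumption).
const EOF_CHAR: char = '\0';

/// Toy cursor: the `prev` field only exists when debug assertions are on.
pub struct ToyCursor<'a> {
    chars: Chars<'a>,
    #[cfg(debug_assertions)]
    prev: char,
}

impl<'a> ToyCursor<'a> {
    pub fn new(input: &'a str) -> Self {
        ToyCursor {
            chars: input.chars(),
            #[cfg(debug_assertions)]
            prev: EOF_CHAR,
        }
    }

    /// Debug builds: the last consumed character.
    #[cfg(debug_assertions)]
    pub fn prev(&self) -> char {
        self.prev
    }

    /// Release builds: nothing is tracked, so only a placeholder comes back.
    #[cfg(not(debug_assertions))]
    pub fn prev(&self) -> char {
        EOF_CHAR
    }

    #[cfg(debug_assertions)]
    fn record_prev(&mut self, c: char) {
        self.prev = c;
    }

    #[cfg(not(debug_assertions))]
    fn record_prev(&mut self, _c: char) {}

    /// Consumes one character, remembering it only in debug builds.
    pub fn bump(&mut self) -> Option<char> {
        let c = self.chars.next()?;
        self.record_prev(c);
        Some(c)
    }
}
```

In a model like this, any tokenization decision that depends on `prev()` would behave differently between debug and release builds, which is why it is only suitable for assertions.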

jdonszelmann · 2025-06-07T08:05:30Z

In general, it feels to me like these checks should be part of raw_or_double_quoted_string. i.e., once a string has been parsed, and we see more hashtags, error on that
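
A rough sketch of that idea, assuming a simplified model of raw strings: once the closing delimiter has been matched, the string-lexing routine itself counts any further `#` characters, so no look-behind is needed later. `lex_raw_str` and `RawStr` are hypothetical names for this illustration and do not correspond to the real `rustc_lexer` functions; what to do with the excess count (for example, report an error) is left to the caller.

```rust
/// Outcome of lexing one raw string at the start of the input (toy model).
#[derive(Debug, PartialEq)]
pub struct RawStr {
    /// Number of `#` in the opening delimiter.
    pub n_hashes: usize,
    /// Extra `#` found directly after the closing delimiter.
    pub excess_hashes: usize,
    /// Total bytes consumed, including the excess hashes.
    pub len: usize,
}

/// Lexes `r`, zero or more `#`, `"`, a body, `"`, and the same number of `#`.
/// Returns `None` if the input is not a terminated raw string.
pub fn lex_raw_str(input: &str) -> Option<RawStr> {
    let rest = input.strip_prefix('r')?;
    let n_hashes = rest.bytes().take_while(|&b| b == b'#').count();
    let body = rest[n_hashes..].strip_prefix('"')?;

    // The closing delimiter is a quote followed by `n_hashes` hashes.
    let closer = format!("\"{}", "#".repeat(n_hashes));
    let end = body.find(&closer)?;

    // Count spare hashes here, while we still know a string just ended.
    let after = &body[end + closer.len()..];
    let excess_hashes = after.bytes().take_while(|&b| b == b'#').count();

    let len = 1 + n_hashes + 1 + end + closer.len() + excess_hashes;
    Some(RawStr { n_hashes, excess_hashes, len })
}
```

With this model, the input `r#"xxx"####` yields one opening hash and three excess hashes, all consumed as part of the same token, so a later step never sees a stray `##` to misclassify.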

xizheyin · 2025-06-07T09:21:57Z

It will parse the last # of #"foo"### as GuardedStrPrefix, which is an error.

I think this recognition should be corrected in advance_token. But I also don't know why prev() is only used in debug builds. Do we need to remove that debug attribute, or find another way?

jdonszelmann · 2025-06-07T10:49:52Z

No, it's there for a reason: if you look, prev is only used in debug assertions. We don't really want to look behind, so you likely want to find another way to do this.

xizheyin · 2025-06-07T11:07:24Z

Okay, I'll re-investigate when I get some free time.

rustbot assigned BoxyUwU May 15, 2025

rustbot added labels S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.) May 15, 2025

rustbot added label S-waiting-on-author (Status: This is awaiting some action (such as code changes or more information) from the author.) and removed label S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) May 15, 2025


xizheyin force-pushed the issue-140618 branch 2 times, most recently from 1ab58fc to 3fe5df4 Compare May 15, 2025 13:02

xizheyin changed the title ~~Lexer: check ### to avoid regard ## as GardedStrPrefix~~ Lexer: check in advance_token to avoid regard spare ## as GardedStrPrefix May 15, 2025

rustbot added label S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and removed label S-waiting-on-author (Status: This is awaiting some action (such as code changes or more information) from the author.) May 15, 2025

rustbot assigned wesleywiser and unassigned BoxyUwU May 31, 2025

rustbot assigned jdonszelmann and unassigned wesleywiser Jun 6, 2025


rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 6, 2025


rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 7, 2025

Add ui test reverved-guarded-string-too-many-terminators-issue-140618

45cb7c6

Signed-off-by: xizheyin <[email protected]>

Lexer: check in advance_token to avoid regard ## as `GardedStrPre…

a09b825

…fix` Signed-off-by: xizheyin <[email protected]>

xizheyin force-pushed the issue-140618 branch from 3fe5df4 to a09b825 Compare June 7, 2025 05:58


jdonszelmann reviewed Jun 7, 2025

