-
Notifications
You must be signed in to change notification settings - Fork 28
parser: use built-in whitespace trimming #238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parser: use built-in whitespace trimming #238
Conversation
rinja_parser/src/lib.rs
Fixed
@@ -274,18 +274,51 @@ | |||
} | |||
} | |||
|
|||
fn is_ws(c: char) -> bool { | |||
matches!(c, ' ' | '\t' | '\r' | '\n') | |||
/// FIXME: replace with `s.trim_ascii_start()` when we raise our MSRV to 1.80 |
Check notice
Code scanning / devskim
A "TODO" or similar was left in source code, possibly indicating incomplete functionality Note
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to not raise our MSRV to 1.80 directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, that's a good call. We already have merged breaking changes anyway, and 1.80 brings a ton of great additions, which might come in handy when we implement or optimize things further down the line.
Our `is_ws()` implementation was the same since the earliest days of askama: <askama-rs/askama-old@4805acc#diff-a9b78e3979c46af7f12ceef379a5a7a657fbe8d77900aa6b4e88b7519b415ca3R31-R34>. It considers spaces ` `, horizontal tabs `\t`, carriage return `\r`, and end of line `\n` characters as white spaces characters. Rust's `u8::is_ascii_whitespace()` adds formfeed `\f` to the list. This PR replaces `is_ws()` with the built-in `char::is_ascii_whitespace()`. The parser is a tiny bit faster, and we are a bit more standard conforming: <details> <summary>benchmark results</summary> ```text Running benches/from_str.rs (/home/kijewski/git/rust/rinja/target/release/deps/from_str-fae58673a015eed7) librustdoc/all time: [322.12 µs 323.71 µs 325.71 µs] thrpt: [43.355 MiB/s 43.623 MiB/s 43.837 MiB/s] change: time: [-6.3192% -5.9056% -5.4469%] (p = 0.00 < 0.05) thrpt: [+5.7607% +6.2763% +6.7454%] Performance has improved. librustdoc/item_info time: [5.6035 µs 5.6117 µs 5.6220 µs] thrpt: [27.989 MiB/s 28.041 MiB/s 28.082 MiB/s] change: time: [-5.7565% -5.2861% -4.7532%] (p = 0.00 < 0.05) thrpt: [+4.9904% +5.5811% +6.1081%] Performance has improved. librustdoc/item_union time: [32.750 µs 32.863 µs 32.984 µs] thrpt: [29.926 MiB/s 30.036 MiB/s 30.139 MiB/s] change: time: [-8.0835% -7.2746% -6.3867%] (p = 0.00 < 0.05) thrpt: [+6.8224% +7.8453% +8.7944%] Performance has improved. librustdoc/page time: [149.16 µs 149.45 µs 149.84 µs] thrpt: [41.327 MiB/s 41.433 MiB/s 41.515 MiB/s] change: time: [-7.1007% -6.8131% -6.5353%] (p = 0.00 < 0.05) thrpt: [+6.9922% +7.3113% +7.6435%] Performance has improved. librustdoc/print_item time: [18.142 µs 18.197 µs 18.253 µs] thrpt: [51.726 MiB/s 51.883 MiB/s 52.040 MiB/s] change: time: [-5.4557% -5.0690% -4.6229%] (p = 0.00 < 0.05) thrpt: [+4.8470% +5.3396% +5.7706%] Performance has improved. librustdoc/short_item_info time: [17.533 µs 17.552 µs 17.576 µs] thrpt: [51.547 MiB/s 51.618 MiB/s 51.674 MiB/s] change: time: [-6.0722% -5.6344% -5.2641%] (p = 0.00 < 0.05) thrpt: [+5.5566% +5.9708% +6.4647%] Performance has improved. librustdoc/sidebar time: [35.947 µs 36.056 µs 36.176 µs] thrpt: [34.113 MiB/s 34.226 MiB/s 34.330 MiB/s] change: time: [-7.6516% -7.0295% -6.4676%] (p = 0.00 < 0.05) thrpt: [+6.9148% +7.5611% +8.2856%] Performance has improved. librustdoc/source time: [13.274 µs 13.318 µs 13.364 µs] thrpt: [55.161 MiB/s 55.353 MiB/s 55.537 MiB/s] change: time: [-6.2263% -5.9307% -5.6552%] (p = 0.00 < 0.05) thrpt: [+5.9941% +6.3047% +6.6397%] Performance has improved. librustdoc/type_layout_size time: [8.1719 µs 8.2034 µs 8.2364 µs] thrpt: [32.884 MiB/s 33.016 MiB/s 33.143 MiB/s] change: time: [-5.6332% -5.3531% -5.0853%] (p = 0.00 < 0.05) thrpt: [+5.3578% +5.6559% +5.9695%] Performance has improved. librustdoc/type_layout time: [29.034 µs 29.101 µs 29.175 µs] thrpt: [92.278 MiB/s 92.513 MiB/s 92.726 MiB/s] change: time: [-8.3192% -7.9370% -7.4312%] (p = 0.00 < 0.05) thrpt: [+8.0278% +8.6212% +9.0741%] Performance has improved. ``` </details> I doubt that anyone is relying on the fact that the invisible character `\f` won't be stripped by whitespace control. If you absolutely need this character to be preserved, then you can always use `{{ '\u{b}' }}`. Right now, `trim_ascii_start()` is manually implemented, because it is only stable since rust 1.80.
404ad76
to
d3b9b6e
Compare
Thanks! |
Our
, horizontal tabs
is_ws()
implementation was the same since the earliest days of askama: askama-rs/askama-old@4805acc#diff-a9b78e3979c46af7f12ceef379a5a7a657fbe8d77900aa6b4e88b7519b415ca3R31-R34. It considers spaces\t
, carriage return\r
, and end of line\n
characters as white spaces characters. Rust'su8::is_ascii_whitespace()
adds formfeed\f
to the list.This PR replaces
is_ws()
with the built-inchar::is_ascii_whitespace()
. The parser is a tiny bit faster, and we are a bit more standard conforming:benchmark results
I doubt that anyone is relying on the fact that the invisible character
\f
won't be stripped by whitespace control. If you absolutely need this character to be preserved, then you can always use{{ '\u{b}' }}
.Right now,trim_ascii_start()
is manually implemented, because it is only stable since rust 1.80.