Skip to content

parser: use built-in whitespace trimming #238

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 14, 2024

Conversation

Kijewski
Copy link
Member

@Kijewski Kijewski commented Nov 13, 2024

Our is_ws() implementation was the same since the earliest days of askama: askama-rs/askama-old@4805acc#diff-a9b78e3979c46af7f12ceef379a5a7a657fbe8d77900aa6b4e88b7519b415ca3R31-R34. It considers spaces , horizontal tabs \t, carriage return \r, and end of line \n characters as white spaces characters. Rust's u8::is_ascii_whitespace() adds formfeed \f to the list.

This PR replaces is_ws() with the built-in char::is_ascii_whitespace(). The parser is a tiny bit faster, and we are a bit more standard conforming:

benchmark results
     Running benches/from_str.rs (/home/kijewski/git/rust/rinja/target/release/deps/from_str-fae58673a015eed7)
librustdoc/all          time:   [322.12 µs 323.71 µs 325.71 µs]
                        thrpt:  [43.355 MiB/s 43.623 MiB/s 43.837 MiB/s]
                 change:
                        time:   [-6.3192% -5.9056% -5.4469%] (p = 0.00 < 0.05)
                        thrpt:  [+5.7607% +6.2763% +6.7454%]
                        Performance has improved.
librustdoc/item_info    time:   [5.6035 µs 5.6117 µs 5.6220 µs]
                        thrpt:  [27.989 MiB/s 28.041 MiB/s 28.082 MiB/s]
                 change:
                        time:   [-5.7565% -5.2861% -4.7532%] (p = 0.00 < 0.05)
                        thrpt:  [+4.9904% +5.5811% +6.1081%]
                        Performance has improved.
librustdoc/item_union   time:   [32.750 µs 32.863 µs 32.984 µs]
                        thrpt:  [29.926 MiB/s 30.036 MiB/s 30.139 MiB/s]
                 change:
                        time:   [-8.0835% -7.2746% -6.3867%] (p = 0.00 < 0.05)
                        thrpt:  [+6.8224% +7.8453% +8.7944%]
                        Performance has improved.
librustdoc/page         time:   [149.16 µs 149.45 µs 149.84 µs]
                        thrpt:  [41.327 MiB/s 41.433 MiB/s 41.515 MiB/s]
                 change:
                        time:   [-7.1007% -6.8131% -6.5353%] (p = 0.00 < 0.05)
                        thrpt:  [+6.9922% +7.3113% +7.6435%]
                        Performance has improved.
librustdoc/print_item   time:   [18.142 µs 18.197 µs 18.253 µs]
                        thrpt:  [51.726 MiB/s 51.883 MiB/s 52.040 MiB/s]
                 change:
                        time:   [-5.4557% -5.0690% -4.6229%] (p = 0.00 < 0.05)
                        thrpt:  [+4.8470% +5.3396% +5.7706%]
                        Performance has improved.
librustdoc/short_item_info
                        time:   [17.533 µs 17.552 µs 17.576 µs]
                        thrpt:  [51.547 MiB/s 51.618 MiB/s 51.674 MiB/s]
                 change:
                        time:   [-6.0722% -5.6344% -5.2641%] (p = 0.00 < 0.05)
                        thrpt:  [+5.5566% +5.9708% +6.4647%]
                        Performance has improved.
librustdoc/sidebar      time:   [35.947 µs 36.056 µs 36.176 µs]
                        thrpt:  [34.113 MiB/s 34.226 MiB/s 34.330 MiB/s]
                 change:
                        time:   [-7.6516% -7.0295% -6.4676%] (p = 0.00 < 0.05)
                        thrpt:  [+6.9148% +7.5611% +8.2856%]
                        Performance has improved.
librustdoc/source       time:   [13.274 µs 13.318 µs 13.364 µs]
                        thrpt:  [55.161 MiB/s 55.353 MiB/s 55.537 MiB/s]
                 change:
                        time:   [-6.2263% -5.9307% -5.6552%] (p = 0.00 < 0.05)
                        thrpt:  [+5.9941% +6.3047% +6.6397%]
                        Performance has improved.
librustdoc/type_layout_size
                        time:   [8.1719 µs 8.2034 µs 8.2364 µs]
                        thrpt:  [32.884 MiB/s 33.016 MiB/s 33.143 MiB/s]
                 change:
                        time:   [-5.6332% -5.3531% -5.0853%] (p = 0.00 < 0.05)
                        thrpt:  [+5.3578% +5.6559% +5.9695%]
                        Performance has improved.
librustdoc/type_layout  time:   [29.034 µs 29.101 µs 29.175 µs]
                        thrpt:  [92.278 MiB/s 92.513 MiB/s 92.726 MiB/s]
                 change:
                        time:   [-8.3192% -7.9370% -7.4312%] (p = 0.00 < 0.05)
                        thrpt:  [+8.0278% +8.6212% +9.0741%]
                        Performance has improved.

I doubt that anyone is relying on the fact that the invisible character \f won't be stripped by whitespace control. If you absolutely need this character to be preserved, then you can always use {{ '\u{b}' }}.

Right now, trim_ascii_start() is manually implemented, because it is only stable since rust 1.80.

@@ -274,18 +274,51 @@
}
}

fn is_ws(c: char) -> bool {
matches!(c, ' ' | '\t' | '\r' | '\n')
/// FIXME: replace with `s.trim_ascii_start()` when we raise our MSRV to 1.80

Check notice

Code scanning / devskim

A "TODO" or similar was left in source code, possibly indicating incomplete functionality Note

Suspicious comment
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to not raise our MSRV to 1.80 directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, that's a good call. We already have merged breaking changes anyway, and 1.80 brings a ton of great additions, which might come in handy when we implement or optimize things further down the line.

Our `is_ws()` implementation was the same since the earliest days of askama:
<askama-rs/askama-old@4805acc#diff-a9b78e3979c46af7f12ceef379a5a7a657fbe8d77900aa6b4e88b7519b415ca3R31-R34>.
It considers spaces ` `, horizontal tabs `\t`, carriage return `\r`, and
end of line `\n` characters as white spaces characters. Rust's
`u8::is_ascii_whitespace()` adds formfeed `\f` to the list.

This PR replaces `is_ws()` with the built-in
`char::is_ascii_whitespace()`. The parser is a tiny bit faster, and we
are a bit more standard conforming:

<details>
<summary>benchmark results</summary>

```text
     Running benches/from_str.rs (/home/kijewski/git/rust/rinja/target/release/deps/from_str-fae58673a015eed7)
librustdoc/all          time:   [322.12 µs 323.71 µs 325.71 µs]
                        thrpt:  [43.355 MiB/s 43.623 MiB/s 43.837 MiB/s]
                 change:
                        time:   [-6.3192% -5.9056% -5.4469%] (p = 0.00 < 0.05)
                        thrpt:  [+5.7607% +6.2763% +6.7454%]
                        Performance has improved.
librustdoc/item_info    time:   [5.6035 µs 5.6117 µs 5.6220 µs]
                        thrpt:  [27.989 MiB/s 28.041 MiB/s 28.082 MiB/s]
                 change:
                        time:   [-5.7565% -5.2861% -4.7532%] (p = 0.00 < 0.05)
                        thrpt:  [+4.9904% +5.5811% +6.1081%]
                        Performance has improved.
librustdoc/item_union   time:   [32.750 µs 32.863 µs 32.984 µs]
                        thrpt:  [29.926 MiB/s 30.036 MiB/s 30.139 MiB/s]
                 change:
                        time:   [-8.0835% -7.2746% -6.3867%] (p = 0.00 < 0.05)
                        thrpt:  [+6.8224% +7.8453% +8.7944%]
                        Performance has improved.
librustdoc/page         time:   [149.16 µs 149.45 µs 149.84 µs]
                        thrpt:  [41.327 MiB/s 41.433 MiB/s 41.515 MiB/s]
                 change:
                        time:   [-7.1007% -6.8131% -6.5353%] (p = 0.00 < 0.05)
                        thrpt:  [+6.9922% +7.3113% +7.6435%]
                        Performance has improved.
librustdoc/print_item   time:   [18.142 µs 18.197 µs 18.253 µs]
                        thrpt:  [51.726 MiB/s 51.883 MiB/s 52.040 MiB/s]
                 change:
                        time:   [-5.4557% -5.0690% -4.6229%] (p = 0.00 < 0.05)
                        thrpt:  [+4.8470% +5.3396% +5.7706%]
                        Performance has improved.
librustdoc/short_item_info
                        time:   [17.533 µs 17.552 µs 17.576 µs]
                        thrpt:  [51.547 MiB/s 51.618 MiB/s 51.674 MiB/s]
                 change:
                        time:   [-6.0722% -5.6344% -5.2641%] (p = 0.00 < 0.05)
                        thrpt:  [+5.5566% +5.9708% +6.4647%]
                        Performance has improved.
librustdoc/sidebar      time:   [35.947 µs 36.056 µs 36.176 µs]
                        thrpt:  [34.113 MiB/s 34.226 MiB/s 34.330 MiB/s]
                 change:
                        time:   [-7.6516% -7.0295% -6.4676%] (p = 0.00 < 0.05)
                        thrpt:  [+6.9148% +7.5611% +8.2856%]
                        Performance has improved.
librustdoc/source       time:   [13.274 µs 13.318 µs 13.364 µs]
                        thrpt:  [55.161 MiB/s 55.353 MiB/s 55.537 MiB/s]
                 change:
                        time:   [-6.2263% -5.9307% -5.6552%] (p = 0.00 < 0.05)
                        thrpt:  [+5.9941% +6.3047% +6.6397%]
                        Performance has improved.
librustdoc/type_layout_size
                        time:   [8.1719 µs 8.2034 µs 8.2364 µs]
                        thrpt:  [32.884 MiB/s 33.016 MiB/s 33.143 MiB/s]
                 change:
                        time:   [-5.6332% -5.3531% -5.0853%] (p = 0.00 < 0.05)
                        thrpt:  [+5.3578% +5.6559% +5.9695%]
                        Performance has improved.
librustdoc/type_layout  time:   [29.034 µs 29.101 µs 29.175 µs]
                        thrpt:  [92.278 MiB/s 92.513 MiB/s 92.726 MiB/s]
                 change:
                        time:   [-8.3192% -7.9370% -7.4312%] (p = 0.00 < 0.05)
                        thrpt:  [+8.0278% +8.6212% +9.0741%]
                        Performance has improved.
```
</details>

I doubt that anyone is relying on the fact that the invisible character
`\f` won't be stripped by whitespace control. If you absolutely need
this character to be preserved, then you can always use `{{ '\u{b}' }}`.

Right now, `trim_ascii_start()` is manually implemented, because it is
only stable since rust 1.80.
@Kijewski Kijewski force-pushed the pr-is_ascii_whitespace branch from 404ad76 to d3b9b6e Compare November 13, 2024 22:33
@GuillaumeGomez
Copy link
Collaborator

Thanks!

@GuillaumeGomez GuillaumeGomez merged commit 8dd3fc9 into askama-rs:master Nov 14, 2024
36 checks passed
@Kijewski Kijewski deleted the pr-is_ascii_whitespace branch November 14, 2024 14:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants