Description
Summary
I noticed that when clamping and casting from i32 to u8, using clamp(0, 255) as u8 produces unnecessary instructions compared to .max(0).min(255) as u8. If a loop is auto-vectorized, the branches in clamp result in slower code than manual clamping.
I couldn't find a label for this, but it would be akin to I-suggestion-causes-perf-regression.
Currently, the lint is set to warn, but following the suggestion inhibits optimization. I don't believe it should fire on the "branchless" patterns, which are semantically different from clamp:
// 1
input.max(min).min(max)
// 2
let mut x = input;
if x < min { x = min; }
if x > max { x = max; }
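To make the "semantically different" point concrete, here is a minimal sketch comparing the two patterns with i32::clamp when the bounds are inverted (min > max). The helper names clamp_max_min, clamp_two_ifs, and std_clamp_panics are made up for illustration. The only standard library behavior relied on is that clamp is documented to panic if min > max.

```rust
use std::panic;

/// Pattern 1: chained max/min. Never panics, even if `min > max`.
pub fn clamp_max_min(input: i32, min: i32, max: i32) -> i32 {
    input.max(min).min(max)
}

/// Pattern 2: two independent comparisons. Never panics either.
pub fn clamp_two_ifs(input: i32, min: i32, max: i32) -> i32 {
    let mut x = input;
    if x < min { x = min; }
    if x > max { x = max; }
    x
}

/// Returns true if `i32::clamp` panics for the given arguments.
/// (Hypothetical helper; assumes the default `panic = "unwind"` setting.)
pub fn std_clamp_panics(input: i32, min: i32, max: i32) -> bool {
    panic::catch_unwind(move || input.clamp(min, max)).is_err()
}
```

With inverted bounds such as (5, 10, 0), both manual patterns return a value while clamp panics, so rewriting one into the other is not behavior-preserving for every input.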
Lint Name
manual_clamp
Lint Description
I also had a small issue with the wording in the current description.
Why is this bad?
clamp is much shorter, easier to read, and doesn’t use any control flow.
https://rust-lang.github.io/rust-clippy/master/index.html#/manual_clamp
I slightly disagree with the reasoning here.
I understand that the user doesn't have to write any control flow, but the control flow within the clamp implementation is different enough to affect performance in some cases. It is not strictly a "better" clamping method than manual clamping, especially for primitive integers.
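As a rough sketch of the control flow I mean, clamp on an i32 behaves approximately like clamp_like below. This is my paraphrase rather than the actual standard library source, and clamp_like and max_min are hypothetical names used only for comparison.

```rust
/// Approximation of what `i32::clamp` does (not the real std source):
/// a bounds assertion followed by an if / else-if chain.
pub fn clamp_like(value: i32, min: i32, max: i32) -> i32 {
    assert!(min <= max); // extra check that the manual patterns do not have
    if value < min {
        min
    } else if value > max {
        max
    } else {
        value
    }
}

/// The manual form from this report: two independent min/max operations.
pub fn max_min(value: i32, min: i32, max: i32) -> i32 {
    value.max(min).min(max)
}
```

For valid bounds both functions return the same value. The difference is only in the shape of the code handed to the optimizer.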
Reproducer
#[inline(never)]
pub fn clamp(input: &[i32], output: &mut [u8]) {
for (&i, o) in input.iter().zip(output.iter_mut()) {
*o = i.clamp(0, 255) as u8;
}
}
#[inline(never)]
pub fn manual_clamp(input: &[i32], output: &mut [u8]) {
for (&i, o) in input.iter().zip(output.iter_mut()) {
*o = i.max(0).min(255) as u8;
}
}
Assembly output - https://rust.godbolt.org/z/rdoh97d3v (1.78, but same output on nightly)
The main difference is in the label .LBB0_4, where extra work is being done by the clamp code.
Version
rustc 1.80.0-nightly (d84b90375 2024-05-19)
binary: rustc
commit-hash: d84b9037541f45dc2c52a41d723265af211c0497
commit-date: 2024-05-19
host: x86_64-pc-windows-msvc
release: 1.80.0-nightly
LLVM version: 18.1.4
Additional Labels
No response