Description
Moved from #85263 as the PR has been merged.
Summary of safety of pattern matching against Rust union
s
Written by @Smittyvb, copied from #85263 (comment)
The unsafety checker is being to rewritten operate on the THIR instead of the MIR. As a part of that, I was implementing the unsafety rules for union
s, and encountered some weird edge cases with the way the MIR unsafety checker handles union
s. In general, writing to a union is safe but reading is unsafe
. There are some cases where writing to a union is unsafe (such as when that might cause a Drop
call) that are not being implemented in this PR.
Behavior specified by RFC 1444
Unsafe code may pattern match on union fields, using the same syntax as a
struct, without the requirement to mention every field of the union in a match
or use..
:fn f(u: MyUnion) { unsafe { match u { MyUnion { f1: 10 } => { println!("ten"); } MyUnion { f2 } => { println!("{}", f2); } } } }Matching a specific value from a union field makes a refutable pattern; naming
a union field without matching a specific value makes an irrefutable pattern.
Both require unsafe code.Pattern matching may match a union as a field of a larger structure. In
particular, when using a Rust union to implement a C tagged union via FFI, this
allows matching on the tag and the corresponding field simultaneously:#[repr(u32)] enum Tag { I, F } #[repr(C)] union U { i: i32, f: f32, } #[repr(C)] struct Value { tag: Tag, u: U, } fn is_zero(v: Value) -> bool { unsafe { match v { Value { tag: I, u: U { i: 0 } } => true, Value { tag: F, u: U { f: 0.0 } } => true, _ => false, } } }Note that a pattern match on a union field that has a smaller size than the
entire union must not make any assumptions about the value of the union's
memory outside that field. For example, if a union contains au8
and a
u32
, matching on theu8
may not perform au32
-sized comparison over the
entire union.
Actual behavior
The MIR unsafety checker doesn't implement that behavior exactly. Due to the way that the MIR is constructed and optimized, some destructuring patterns against unions that that the RFC specifies to be unsafe are allowed.
The behavior of the MIR is mostly "irrefutable pattern matching against unions without any bindings after desugaring or-patterns is safe", with some extra weird behavior when niches are involved.
Here are some examples of what the MIR considers safe and unsafe. Here is the prelude for all of these examples:
union Foo { bar: i8, zst: (), pizza: Pizza, oneval: OneVal, twoval: TwoVal, khar: char }
#[derive(Copy, Clone)]
struct Pizza { topping: Option<PizzaTopping> }
#[derive(Copy, Clone)]
enum PizzaTopping { Cheese, Pineapple }
#[derive(Copy, Clone)]
#[repr(u8)]
enum OneVal { One = 1 }
#[derive(Copy, Clone)]
#[repr(u8)]
pub enum TwoVal {
One = 1,
Two = 2,
}
let mut foo = Foo { bar: 5 };
Patterns considered safe
match (Foo { bar: 42 }) {
Foo { oneval: OneVal::One } => {
// always run
},
}
match foo {
Foo { bar: _ | _ } => {},
}
match u {
Foo { pizza: Pizza { .. } } => {},
};
match u {
Foo { pizza: Pizza { topping: _ } } => {},
};
match foo {
Foo { zst: () } => {},
}
let Foo { bar: _ } = foo;
match Some(foo) {
Some(Foo { bar: _ }) => 3,
None => 4,
};
Patterns considered unsafe
All of these require an unsafe
block to compile.
match (Foo { bar: 42 }) {
Foo { twoval: TwoVal::One | TwoVal::Two } => {
// always run
},
}
match foo {
Foo {
pizza: Pizza {
topping: Some(PizzaTopping::Cheese) | Some(PizzaTopping::Pineapple) | None
}
} => {},
}
match foo {
Foo { bar: _a } => {},
}
let Foo { bar: inner } = foo;
let (Foo { bar } | Foo { bar }) = foo;
match foo.khar {
'\0'..='\u{D7FF}' | '\u{E000}'..='\u{10FFFF}' => ()
};
match x.b {
'\0'..='\u{10FFFF}' => 1,
};
Activity
petrochenkov commentedon Jul 27, 2021
In the initial HIR-based implementation any field read was considered unsafe as described in the RFC, whether it's done through a dot access or pattern destructuring.
So the regression happened together with migration to the MIR unsafety checker.
petrochenkov commentedon Jul 27, 2021
The
Foo { oneval: OneVal::One }
example is strikingly similar to UB-based optimizations a la "we know that overflow is impossible, so we'll delete your code".Here by accessing
oneval
we assert that it's indeed containing a valid value (and this assertion is used by the compiler to generate// always run
), this is an unsafe action so the compiler shouldn't perform it on our behalf.RalfJung commentedon Jul 28, 2021
OneVal
has size 0. So comparing something againstOneVal::One
involves loading 0 bytes and comparing them; this will always succeed.So, indeed it always contains a valid value, and that is trivially correct -- no assumption/assertion needed.
So, I don't see any parallel with UB-based optimizations here.
syvb commentedon Jul 28, 2021
@RalfJung
OneVal
has a#[repr(u8)]
, which forcesOneVal::One
to be represented in memory as a single byte with a value of 1. Transmuting any other value to aOneVal
would be UB. Without that attribute it would be a ZST.RalfJung commentedon Jul 28, 2021
Oh, I missed the
repr
. (Again...)So in a sense this is assuming that the union field satisfies the validity invariant without even looking at it... which does seem very dubious, yes.
joshtriplett commentedon Aug 3, 2021
We discussed this in today's @rust-lang/lang meeting.
Our initial reaction is that several of the examples currently considered safe should not be. (Some of them might potentially be in the future based on safe-transmute infrastructure, but they shouldn't be until we have that infrastructure. And some of them shouldn't ever be.)
It seems OK to match a ZST without an unsafe block. But matching a non-ZST implies that the code inside the match arm will only run if the value matches, and if the compiler doesn't enforce that semantic then that seems inordinately confusing and error-prone. For a non-union, it's UB to put an invalid value into an enum, but for a union field, it's perfectly valid to write a value to one field that would be invalid for the type of another, as long as you never read a field that doesn't have a valid value for its type.
I've started a Zulip thread to discuss the desired semantics further.
-Zthir-unsafeck
perf #87710nikomatsakis commentedon Oct 19, 2021
Removing nomination as we are not doing much here at the moment. We do need to return to this topic!
unsafe
#110974pnkfelix commentedon Oct 30, 2024
cc rust-lang/unsafe-code-guidelines#540
cc rust-lang/reference#1665