Enum Layout Guarantees #177

Closed
Description

@jswrenn
Member

The motivation of this issue is two-fold:

What follows reflects my understanding of what rust already guarantees, based on (1) the reference (which is sparse, sometimes contradictory, and sometimes slightly wrong) and (2) the current observable behavior of rust in safe code.

I'm hoping for official confirmation that these guarantees are real.

Terminology

C-like Enumerations

A C-like enumeration is one in which every variant is unit-like; for example:

enum CLike {
    VariantA,
    VariantB,
    VariantC,
}

The only enumerations that may specify explicit discriminants without specifying a non-default repr are C-like enums.

Fieldless Enumerations

All C-like enumerations are field-less, because they cannot include tuple-like or struct-like variants. However, not all field-less enums are C-like. For instance:

enum Fieldless {
    Unit,
    Tuple(),
    Struct{},
}

The only enumerations for which instances may be as-casted to their discriminant values are field-less enums.[as-casting]

Representation

The memory layout of an enumeration is well-specified iff[well-specified]:

  • the enumeration is field-less
  • the enumeration has a primitive repr

For such enumerations, the leading byte(s) reflect the discriminant value.[leading-bytes] The bytes thereafter are either padding, or correspond to the layout of the variant's fields.

Discriminant Representation

For such enumerations:

Discriminant Size

Default Representation

Under the default representation, discriminants are interpreted logically as isize. Accordingly, this is invalid:

enum Enum {
    Variant = 0i128,
    // ERROR  ^^^^^ expected isize, found i128
}
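For comparison, here is a minimal sketch of how a primitive repr changes the logical type of the discriminant, so that a value outside the range of `isize` becomes acceptable. It assumes a stable toolchain, and the enum `Wide` is hypothetical:

```rust
// Under `repr(u64)` the discriminant is logically a `u64`, so `u64::MAX`
// is accepted even though it does not fit in `isize`.
#[repr(u64)]
enum Wide {
    Max = u64::MAX,
    // Must be explicit: an implicit value would be `u64::MAX + 1` and overflow.
    Zero = 0,
}

fn main() {
    assert_eq!(Wide::Max as u64, u64::MAX);
    assert_eq!(Wide::Zero as u64, 0);
    println!("Wide::Max as u64 = {}", Wide::Max as u64);
}
```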

At most size_of::<isize>() bytes will be used to encode the discriminant value. If all discriminant values logically 'fit' into a smaller numeric type T, the compiler may use that smaller type T in the actual layout to encode the discriminant. Exactly size_of::<T>() bytes will be used to encode the discriminant value.[disr-size]
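A small sketch of the upper bound described above, using a hypothetical enum `Small`. It checks only the claimed bound. The exact size chosen under the default representation is unspecified, so the code prints it instead of asserting a particular value:

```rust
use std::mem::size_of;

// Default (`repr(Rust)`) field-less enum: the discriminant is logically an
// `isize`, but the compiler is allowed to store it in a smaller integer.
#[allow(dead_code)]
enum Small {
    A,
    B,
    C = 200,
}

fn main() {
    let size = size_of::<Small>();
    // Only the upper bound is checked; the exact value is unspecified.
    assert!(size <= size_of::<isize>());
    // The logical discriminant is unaffected by whatever size is chosen.
    assert_eq!(Small::C as isize, 200);
    println!("size_of::<Small>() = {size}");
}
```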

Primitive Representation

Under a primitive representation T, exactly size_of::<T>() bytes will be used to encode the discriminant value.[disr-size]
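Here is a sketch that checks the primitive-representation sizes for a few hypothetical field-less enums (`ByteTag`, `ShortTag`, `IntTag`):

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[repr(u8)]
enum ByteTag { A, B }

#[allow(dead_code)]
#[repr(u16)]
enum ShortTag { A, B }

#[allow(dead_code)]
#[repr(i32)]
enum IntTag { A = -1, B = 1 }

fn main() {
    // For a field-less enum with a primitive repr T, the enum is exactly as
    // large as T.
    assert_eq!(size_of::<ByteTag>(), size_of::<u8>());
    assert_eq!(size_of::<ShortTag>(), size_of::<u16>());
    assert_eq!(size_of::<IntTag>(), size_of::<i32>());
}
```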

C Representation

Under a C representation, the discriminant is encoded using the default enum size of the platform's C ABI (typically the size of a C int).[disr-size]

Discriminant Value

For such an enumeration, the discriminant value of a variant is[disr-value]:

  • the value given explicitly in the variant declaration (if any)
  • one greater than the discriminant value of the preceding variant in the enum declaration
  • zero, if the variant is the first one in the declaration and no explicit discriminant value is provided
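The rules above can be illustrated with a hypothetical enum `Mixed` that interleaves implicit and explicit discriminants. The comments give the value each rule assigns, and the `as` casts observe those values:

```rust
enum Mixed {
    First,       // first variant, no explicit value: 0
    Second = 10, // explicit: 10
    Third,       // preceding + 1: 11
    Fourth = 3,  // explicit: 3
    Fifth,       // preceding + 1: 4
}

fn main() {
    assert_eq!(Mixed::First as isize, 0);
    assert_eq!(Mixed::Second as isize, 10);
    assert_eq!(Mixed::Third as isize, 11);
    assert_eq!(Mixed::Fourth as isize, 3);
    assert_eq!(Mixed::Fifth as isize, 4);
}
```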

Regardless of repr, the logical value of the discriminant will always match the actual in-memory encoding of the discriminant.[leading-bytes] That is, rust won't secretly use the byte corresponding to 3 if the logical discriminant value is 2.

(However, to reiterate: under a default representation, it may be encoded using a smaller numeric type, if the value fits.)
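For a primitive representation, you can check directly that the logical discriminant matches its in-memory encoding. Here is a minimal sketch using a hypothetical `#[repr(u8)]` field-less enum `Tag`. It compares the `as` cast with a `transmute` of the same value. It does not cover the default representation, where the encoding may be narrower:

```rust
use std::mem::transmute;

#[repr(u8)]
#[derive(Clone, Copy)]
enum Tag {
    A = 2,
    B = 3,
}

fn main() {
    for tag in [Tag::A, Tag::B] {
        let logical = tag as u8;
        // SAFETY: `Tag` is a field-less `repr(u8)` enum, so it has the same
        // size as `u8` and every `Tag` value is a valid `u8`.
        let in_memory: u8 = unsafe { transmute::<Tag, u8>(tag) };
        assert_eq!(logical, in_memory);
    }
}
```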


[as-casting]

The reference implies that it is only C-like enums that may be as-casted. This is misleading: all field-less enums may be as-casted, e.g.:

enum Fieldless {
    Unit,
    Tuple(),
    Struct{},
}

fn main() {
    assert_eq!(Fieldless::Struct{} as isize, 2);
}

[well-specified]

These two conditions aren't explicitly stated anywhere in this form. The reference and safe code guarantee (or imply) the layout rules in the following sub-sections, and these just happen to be the two conditions for which those rules apparently apply.

[leading-bytes]

It doesn't seem to be stated anywhere that the leading bytes correspond to the discriminant, or that the byte encoding of the discriminant matches its logical value.

[disr-size]

Per the reference:

Under the default representation, the specified discriminant is interpreted as an isize value although the compiler is allowed to use a smaller type in the actual memory layout. The size and thus acceptable values can be changed by using a primitive representation or the C representation.

[disr-value]

The algorithm with which discriminant values are assigned is documented in the reference, and is observable via as-casting in the playground.

I'm extrapolating that these rules should also apply to enums that are not field-less, since I would be surprised (for enums under a primitive repr) if the presence of fields in a variant impacted its discriminant value. (Currently, it doesn't, but there's no safe way to extract the discriminant value of such enums, so this behavior is arguably unspecified at the moment.)

Activity

gnzlbg commented on Jul 25, 2019

Contributor

and (2) the current observable behavior of rust in safe code.

The current observable behavior of rust in safe code can be a result of unspecified behavior. One cannot argue that, because unspecified behavior behaves in some particular way, then it is specified to always behave in that way.

The only thing that Rust guarantees is what the reference, the API docs, or the merged RFCs or FCPs say it guarantees. If you can't find it there, then it is not guaranteed.

jswrenn commented on Jul 25, 2019

Member, Author

The current observable behavior of rust in safe code can be a result of unspecified behavior.

Huh. I'm really surprised by this. Notwithstanding soundness issues and bugs (which are exempt from stability guarantees) and internal API details (like the capacity of a BufReader), wouldn't it generally be a violation of rust's stability guarantees to change the observable behavior of completely safe rust programs?

For instance, the reference does not make clear that any field-less enum (not just C-like ones) can be as-casted. This is not a soundness bug, and it behaves consistently with as-casting C-like enums. It is nonetheless undocumented. Could rustc therefore behave differently for as-casting non-C-like field-less enums in the future?


Plot twist: There's consensus (rust-lang/rust#46348) that the term "C-like" shouldn't be used, and that "field-less" should be used instead as a synonym, but the transition process has been slow, and the relevant sections of the reference haven't been consistently updated. When this was decided, nobody noticed that the terms aren't actually synonymous. If the reference had been fully updated with this change, the behavior I describe above would have become well-defined, but the description of which enums can have explicit discriminants would have become buggy (with respect to rust's behavior) instead.

The reference is a (self-described) best-effort document and has many contradictions (for instance, some parts specify details about layouts under the default repr, other parts say the layout under default repr is totally unspecified), and inconsistencies with rust's actual behavior (as above). I don't know how to fix these issues without looking at how rust actually behaves.

Lokathor commented on Jul 25, 2019

Contributor

Yeah, given the stability guarantee, and the fact that all the documents you listed aren't even perfectly aligned all the time, the Stable compiler has to be considered a source of truth on the list.

RalfJung commented on Jul 25, 2019

Member

wouldn't it generally be a violation of rust's stability guarantees to change the observable behavior of completely safe rust programs?

No. For example, safe Rust can observe the order in which the fields of a struct are laid out. And yet this order is not covered by the stability guarantee (and it has changed in the past).
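To illustrate the point, here is a sketch that observes field offsets in safe code by comparing addresses. The structs `RustLayout` and `CLayout` and the helper `offset` are hypothetical. Only the `repr(C)` offsets are guaranteed, and they assume the common case of a 4-byte-aligned `u32`. The `repr(Rust)` offsets are printed, not asserted:

```rust
// Same fields, two representations.
struct RustLayout { a: u8, b: u32, c: u8 }

#[repr(C)]
struct CLayout { a: u8, b: u32, c: u8 }

// Byte offset of `field` within `base`, computed in safe code from addresses.
fn offset<T, F>(base: &T, field: &F) -> usize {
    field as *const F as usize - base as *const T as usize
}

fn main() {
    let r = RustLayout { a: 0, b: 0, c: 0 };
    let c = CLayout { a: 0, b: 0, c: 0 };

    // Observable without unsafe code, but unspecified: may change between compilers.
    println!(
        "repr(Rust): a={} b={} c={}",
        offset(&r, &r.a),
        offset(&r, &r.b),
        offset(&r, &r.c)
    );

    // Guaranteed by repr(C): declaration order, with C padding rules.
    assert_eq!(offset(&c, &c.a), 0);
    assert_eq!(offset(&c, &c.b), 4);
    assert_eq!(offset(&c, &c.c), 8);
}
```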

jswrenn commented on Jul 25, 2019

Member, Author

Gotcha. Likewise, the size of default-representation structs and enums is explicitly free to change, and it can be observed without unsafe code.

I'm convinced that my assertion about stability guarantees was wrong, though I'm not sure how to re-phrase it. What exactly is rust's stability guarantee? When can the behavior of the stable compiler be used to inform modifications to the reference?

Something like size_of producing a different value between compiler versions feels morally (?) different than, say, changing the behavior of as-casting field-less enums. The former is specified to be undefined, the latter is just unspecified. For the purposes of improving the reference by examining the stable compiler, is that an important distinction?

jswrenn commented on Jul 25, 2019

Member, Author

For starters, this statement in my original post is actually incorrect (or misleading):

At most size_of::<isize>() bytes will be used to encode the discriminant value. If all discriminant values logically 'fit' into a smaller numeric type T, the compiler may use that smaller type T in the actual layout to encode the discriminant. [...] Regardless of repr, the logical value of the discriminant will always match the actual in-memory encoding of the discriminant. That is, rust won't secretly use the byte corresponding to 3 if the logical discriminant value is 2.

In fact, the discriminant may be completely absent from the layout:

enum Enum {
    Variant = 42isize,
}

fn main() {
    // the logical discriminant is 42isize
    assert_eq!(42, Enum::Variant as isize);

    // but no bytes at all are used to encode it
    assert_eq!(0, core::mem::size_of::<Enum>());
}

gnzlbg commented on Jul 26, 2019

Contributor

What exactly is rust's stability guarantee?

That we won't break your code if it wasn't broken before, and that safe Rust does not have UB.

If you have a safe Rust program like this:

use std::mem::size_of;

struct SomeReprRustType { ... }
fn main() {
    assert_eq!(size_of::<SomeReprRustType>(), 42);
}

If the assert passes, and there is no guarantee written down anywhere that it will pass, the semantics of this program depend on unspecified behavior. We can change that behavior at will, and then your program will panic, but that won't introduce undefined behavior, so that's ok.
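Here is a concrete version of the same situation, using hypothetical structs with identical fields. The `repr(Rust)` size is observable in safe code but unspecified. The `repr(C)` size follows from the C layout rules, assuming a 4-byte-aligned `u32`:

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct Unspecified { a: u8, b: u32, c: u8 }

#[allow(dead_code)]
#[repr(C)]
struct Specified { a: u8, b: u32, c: u8 }

fn main() {
    // Unspecified: the compiler may reorder fields to reduce padding. An
    // assert on this value could start panicking (never UB) after an upgrade.
    println!("size_of::<Unspecified>() = {}", size_of::<Unspecified>());

    // Specified by repr(C): 1 + 3 (padding) + 4 + 1 + 3 (tail padding) = 12.
    assert_eq!(size_of::<Specified>(), 12);
}
```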

When can the behavior of the stable compiler be used to inform modifications to the reference?

Pretty much never I think. If you want to guarantee something that isn't guaranteed anywhere, you have to write an RFC proposing the guarantee, push it through the process, and the language team needs to accept that this is a guarantee that we want to provide.

If this was already approved somewhere, but the reference has a bug, then you can modify the reference, but the argument for the modification is not "stable Rust currently does X", but rather "this is already guaranteed somewhere else".

For the purposes of improving the reference by examining the stable compiler, is that an important distinction?

When some behavior is unspecified, you can just document it as unspecified in the reference. When we do this, we sometimes add a non-normative note explaining what the compiler does at the time of writing, making clear that this is not guaranteed and can change at any time. Recognizing that something is unspecified, and documenting that, is often the first step towards getting it specified.

jswrenn commented on Jul 26, 2019

Member, Author

I really appreciate all this clarification!

I'd like to get the arbitrary_enum_discriminant feature to a point where it's usable without relying on unspecified behavior. This is going to require making at least one guarantee for enums with primitive reprs that has not already been stated explicitly (but was implicitly assumed to be true in the RFC):

  • the leading size_of::<repr>() byte(s) correspond exactly to the logical discriminant value

What are my next steps? Does this require an RFC?

gnzlbg commented on Jul 26, 2019

Contributor

Why are you relying on undefined behavior?

If you need guarantees about where the discriminant is, why can't you use #[repr(C, Int)] instead of just #[repr(Int)]?

jswrenn commented on Jul 26, 2019

Member, Author

Why are you relying on undefined behavior?

The motivating example in the arbitrary_enum_discriminant RFC relies on a pointer cast (see AnimationValue::id) to extract the discriminant value. This cast assumes that the leading size_of::<repr>() byte(s) correspond exactly to the logical discriminant value.


If you need guarantees about where the discriminant is, why can't you use #[repr(C, Int)] instead of just #[repr(Int)]?

Per the reference, this renders the layout totally unspecified:

Likewise, combining the C representation with a primitive representation, the layout is unspecified.

gnzlbg commented on Jul 26, 2019

Contributor

@jswrenn have you seen: https://rust-lang.github.io/rfcs/2195-really-tagged-unions.html#guide-level-explanation ? IIUC the first guarantee requires the tag of a #[repr(Int)] enum to be at the beginning of the representation.
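Here is a minimal sketch of the guarantee in question, modeled on the kind of pointer cast the arbitrary_enum_discriminant RFC's motivating example relies on. The enum `Value` and its `discriminant` method are hypothetical. Explicit discriminants on variants with fields assume Rust 1.66 or later:

```rust
// `repr(u8)` enum with fields and explicit discriminants.
#[allow(dead_code)]
#[repr(u8)]
enum Value {
    Unit = 1,
    Tuple(bool) = 4,
    Struct { x: u32 } = 9,
}

impl Value {
    /// Reads the tag from the first byte of the value.
    fn discriminant(&self) -> u8 {
        // SAFETY: per RFC 2195, a `repr(u8)` enum is laid out as a `repr(C)`
        // union of `repr(C)` structs, each beginning with the `u8` tag, so
        // the first byte of any `Value` is its discriminant.
        unsafe { *(self as *const Self as *const u8) }
    }
}

fn main() {
    assert_eq!(Value::Unit.discriminant(), 1);
    assert_eq!(Value::Tuple(true).discriminant(), 4);
    assert_eq!(Value::Struct { x: 7 }.discriminant(), 9);
}
```

`Value` has variants with fields, so it cannot be `as`-cast to `u8`. This is why a guarantee about where the tag lives matters.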

jswrenn commented on Jul 26, 2019

Member, Author

I hadn't!!! This seems like exactly the sort of documentation I've been looking for!

added labels C-open-question (Category: An open question that we should revisit) and A-layout (Topic: Related to data structure layout (`#[repr]`)) on Aug 14, 2019

RalfJung commented on Aug 14, 2019

Member

So, can we close this issue then?
Seems fair to have an issue like #96 for RFC 2195, but that should likely be a new one.

gnzlbg commented on Aug 14, 2019

Contributor

I think so. @jswrenn please feel free to re-open if your question wasn't properly addressed.

          Enum Layout Guarantees · Issue #177 · rust-lang/unsafe-code-guidelines