Recommend proto3 #465

Closed
@MarcoPolo

There seems to have been a misunderstanding in the past around proto2 vs proto3. My attempt here is to clear up the confusion, recommend proto3 in general, and explain why proto3 should be preferred.

Our main confusion is about field presence: if a field is omitted from the serialized wire format, can the user of the decoded message tell whether the field was unset or set to its default value? This document has a lot of good information and is worth the read: https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md

Origins of the confusion

Proto2 always serializes an explicitly set field, even if it is set to the default. This means the decoding side can know whether the field was set or not. This is called Explicit Presence. For example, prost, a Rust protobuf implementation, wraps these fields in Option<T>: https://github.com/tokio-rs/prost#field-modifiers.

The confusing thing is that the language guide for proto2 states:

A well-formed message may or may not contain an optional element. When a message is parsed, if it does not contain an optional element, accessing the corresponding field in the parsed object returns the default value for that field.

The subtlety here is that this doesn't say anything about "hasField" accessors, which an implementation may provide to check whether the field was set or not. This is essentially what prost is doing with Option<T> types.

Another confusing thing is that this language guide doesn't mention "presence" a single time, even though presence is exactly what we're talking about here.

In proto3, by default, a field set to its default value is not serialized. This means the decoding side can't know whether the field was omitted because it was unset or because it was the default value. This is called No Presence.
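To make the difference concrete, here is a minimal sketch of the two serialization rules for a single varint field. It is not taken from any protobuf library: the function names are made up for illustration, and the field is assumed to be field number 1 with wire type 0 (varint), which gives the tag byte 0x08.

```python
from typing import Optional

# Assumption for this sketch: field number 1, wire type 0 (varint).
FIELD_TAG = 0x08


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_explicit(value: Optional[int]) -> bytes:
    """Explicit presence: any set value is written, even the default 0."""
    if value is None:
        return b""
    return bytes([FIELD_TAG]) + encode_varint(value)


def encode_no_presence(value: int) -> bytes:
    """No presence: the default value 0 is never written."""
    if value == 0:
        return b""
    return bytes([FIELD_TAG]) + encode_varint(value)


def decode_field(data: bytes) -> Optional[int]:
    """Return the value of the field if it is on the wire, else None."""
    if not data:
        return None
    if data[0] != FIELD_TAG:
        raise ValueError("unexpected tag")
    result, shift = 0, 0
    for byte in data[1:]:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return result
```

With these helpers, `decode_field(encode_explicit(0))` yields `0` while `decode_field(encode_explicit(None))` yields `None`, so the receiver can tell the two cases apart. Under the no-presence rule, `encode_no_presence(0)` produces no bytes at all, so the receiver sees the same empty input as for an unset field.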

Field Presence Proto2 vs Proto3

To clarify field presence in proto2 vs proto3:

From https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md#presence-in-proto2-apis

Proto2

| Field type | Explicit Presence |
| --- | --- |
| Singular numeric (integer or floating point) | ✔️ |
| Singular enum | ✔️ |
| Singular string or bytes | ✔️ |
| Singular message | ✔️ |
| Repeated | |
| Oneofs | ✔️ |
| Maps | |

Proto3

| Field type | optional | Explicit Presence |
| --- | --- | --- |
| Singular numeric (integer or floating point) | No | |
| Singular enum | No | |
| Singular string or bytes | No | |
| Singular numeric (integer or floating point) | Yes | ✔️ |
| Singular enum | Yes | ✔️ |
| Singular string or bytes | Yes | ✔️ |
| Singular message | Yes | ✔️ |
| Singular message | No | ✔️ |
| Repeated | N/A | |
| Oneofs | N/A | ✔️ |
| Maps | N/A | |
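The two tables can be summarized as a small lookup. This is only a sketch of the rules listed above; the function name and the `kind` strings are hypothetical labels chosen for this example, not part of any protobuf API.

```python
# Hypothetical labels for the field kinds listed in the tables above.
SCALAR_KINDS = {"numeric", "enum", "string_or_bytes"}
ALWAYS_PRESENCE = {"message", "oneof"}
NEVER_PRESENCE = {"repeated", "map"}


def has_explicit_presence(syntax: str, kind: str, optional: bool = False) -> bool:
    """Return True if a field of this kind tracks explicit presence."""
    if syntax not in ("proto2", "proto3"):
        raise ValueError(f"unknown syntax: {syntax}")
    if kind in NEVER_PRESENCE:
        return False
    if kind in ALWAYS_PRESENCE:
        return True
    if kind not in SCALAR_KINDS:
        raise ValueError(f"unknown field kind: {kind}")
    # Singular scalars: always in proto2, only with `optional` in proto3.
    return True if syntax == "proto2" else optional
```

For example, `has_explicit_presence("proto3", "numeric")` is false, while the same call with `optional=True` is true, which is the opt-in behavior the Proto3 table describes.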

Advantages in Proto3 compared to Proto2

  • No required modifier
    • required is generally considered an anti-pattern, since every future version of the message must keep the field. Users should generally prefer custom validation.
  • Opt-in explicit presence
    • It's good to get the space savings of no presence while still being able to opt in to explicit presence. Most of the time, passing an empty byte array is semantically the same as passing no byte array, so it's nice to avoid paying the byte cost for it.
    • But if we do want explicit presence, we can opt in to it. This is useful when we do care semantically whether nothing was set.
  • Simple feature set: "The reason for removing these features is to make API designs simpler, more stable, and more performant." From https://cloud.google.com/apis/design/proto3
  • Better ecosystem support. As libraries develop, it's likely they will support the latest protobuf spec rather than continue supporting proto2. This is already the case with protons, the compiler that JS-IP uses (see this bug).
  • User-defined default values for fields are no longer available. These were somewhat tricky to get right.
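As an illustration of the custom validation mentioned in the first bullet, here is a minimal sketch. The `Handshake` message and its fields are hypothetical and stand in for whatever a decoder with explicit presence would produce, with `None` meaning the field was not on the wire.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Handshake:
    """Hypothetical decoded message; None means the field was absent."""
    public_key: Optional[bytes] = None
    nonce: Optional[int] = None


def validate(msg: Handshake) -> None:
    """Application-level check that stands in for a schema-level `required`."""
    missing = [
        name for name in ("public_key", "nonce")
        if getattr(msg, name) is None
    ]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
```

Because the check lives in application code, a later version of the protocol can relax or drop it without changing the schema. Note that a nonce of 0 passes validation here, since explicit presence distinguishes "set to 0" from "not set".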

Next steps

  1. Come to mutual understanding around proto2 vs proto3.
  2. Come to consensus around recommending proto3.
  3. Make the change to README.md.
