Recommend proto3

There seems to have been some misunderstanding in the past about proto2 vs proto3. This issue aims to clear up the confusion and explain why proto3 should be preferred in general.

Our main confusion is about field presence: if a field is omitted from the serialized wire format, can the user of the decoded message tell whether the field was unset or set to its default value? The protobuf field presence document has a lot of good information and is worth reading: https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md

## Origins of the confusion

Proto2 always serializes an explicitly set field, even if it is set to the default value. This means the decoding side can tell whether the field was set or not. This is called _Explicit Presence_. For example, prost (a Rust protobuf implementation) wraps these fields in `Option<T>`: https://github.com/tokio-rs/prost#field-modifiers.

The confusing thing is that the language guide for [proto2](https://developers.google.com/protocol-buffers/docs/proto#optional) states:
> A well-formed message may or may not contain an optional element. When a message is parsed, if it does not contain an optional element, accessing the corresponding field in the parsed object returns the default value for that field.

The subtlety here is that this doesn't say anything about "hasField" accessors, which an implementation may provide to check whether the field was set or not. This is essentially what prost is doing with `Option<T>` types.

Another confusing thing is that this language guide never mentions "presence", which is exactly what we're talking about here.
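
As a rough illustration of that distinction, here is a minimal Python sketch of a message with one explicit-presence field. The `Peer` class, its `port` field, and the `has_port` accessor are hypothetical names chosen for this example; real generated code differs by language and implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Peer:
    """Hypothetical message with a single explicit-presence int32 field."""

    _port: Optional[int] = None  # None models "field was never set"

    @property
    def port(self) -> int:
        # Reading an unset field yields the type default (0), as the guide says.
        return 0 if self._port is None else self._port

    def has_port(self) -> bool:
        # The "hasField"-style accessor is what exposes presence.
        return self._port is not None


unset = Peer()
zero = Peer(_port=0)
```

Both `unset.port` and `zero.port` read as `0`; only `has_port()` tells the two apart.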

In proto3, a field without the `optional` modifier that is set to its default value is not serialized. This means the decoding side can't know whether the field was omitted because it was unset or because it held the default value. This is called _No Presence_.
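
The difference shows up directly on the wire. The following is a hand-rolled sketch, not a real protobuf library: it encodes a single varint field with field number 1 under each discipline. The function names are made up for this example, and `None` stands in for "unset".

```python
from typing import Optional

# Tag byte for field number 1 with wire type 0 (varint): (1 << 3) | 0 == 0x08.
TAG_FIELD_1_VARINT = bytes([(1 << 3) | 0])


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_explicit(value: Optional[int]) -> bytes:
    """Explicit presence: emit the field whenever it was set, even to 0."""
    if value is None:
        return b""
    return TAG_FIELD_1_VARINT + encode_varint(value)


def encode_implicit(value: int) -> bytes:
    """No presence: skip the field when it holds the default value (0)."""
    if value == 0:
        return b""
    return TAG_FIELD_1_VARINT + encode_varint(value)
```

With explicit presence, "unset" encodes to zero bytes and "set to 0" encodes to two bytes (tag plus value), so a decoder can distinguish them. With no presence, both cases encode to zero bytes and look identical to the decoder.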

## Field Presence Proto2 vs Proto3

To clarify, this is how field presence behaves in proto2 vs proto3. The tables are taken from https://github.com/protocolbuffers/protobuf/blob/main/docs/field_presence.md#presence-in-proto2-apis

### Proto2

Field type                                   | Explicit Presence
-------------------------------------------- | -----------------
Singular numeric (integer or floating point) | ✔️
Singular enum                                | ✔️
Singular string or bytes                     | ✔️
Singular message                             | ✔️
Repeated                                     |
Oneofs                                       | ✔️
Maps                                         |

### Proto3

Field type                                   | `optional` | Explicit Presence
-------------------------------------------- | ---------- | -----------------
Singular numeric (integer or floating point) | No         |
Singular enum                                | No         |
Singular string or bytes                     | No         |
Singular numeric (integer or floating point) | Yes        | ✔️
Singular enum                                | Yes        | ✔️
Singular string or bytes                     | Yes        | ✔️
Singular message                             | Yes        | ✔️
Singular message                             | No         | ✔️
Repeated                                     | N/A        |
Oneofs                                       | N/A        | ✔️
Maps                                         | N/A        |


## Advantages in Proto3 compared to Proto2

* No `required` modifier
  * `required` is generally considered an [anti-pattern](https://developers.google.com/protocol-buffers/docs/proto#specifying-rules), since every future version of the message must contain the field. Users should generally prefer custom validation.
* Opt-in explicit presence
  * It's good to get the space advantages of no presence while still being able to opt in to explicit presence. Most of the time, passing an empty byte array is semantically the same as passing no byte array, so it's nice to avoid paying the byte cost for it.
  * If we do want explicit presence, we can opt in to it. This is useful when we semantically care about knowing whether _nothing_ was set.
* Simpler feature set. From https://cloud.google.com/apis/design/proto3: "The reason for removing these features is to make API designs simpler, more stable, and more performant."
* Better ecosystem support. As libraries develop, it's likely they will support the latest protobuf spec rather than continue supporting proto2. This is already the case with [protons](https://github.com/ipfs/protons), the compiler that JS-IP uses (see this [bug](https://github.com/ipfs/protons/issues/34)).
* User-defined default values for fields are no longer available. These were somewhat tricky to get right.
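
The "prefer custom validation" advice from the first bullet could look roughly like the sketch below. It assumes a decoded message where `None` models an unset explicit-presence field; `Record`, `key`, `value`, and `validate` are hypothetical names, not part of any spec or library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical decoded message."""

    key: Optional[bytes] = None  # explicit presence: None means "unset"
    value: bytes = b""           # no presence: empty and unset look the same


def validate(record: Record) -> List[str]:
    """Application-level check that replaces a schema-level `required`."""
    errors = []
    if record.key is None:
        errors.append("key must be set")
    elif len(record.key) == 0:
        errors.append("key must not be empty")
    return errors
```

Because the rule lives in application code rather than in the schema, it can be relaxed or changed later without making older messages unparseable.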

## Next steps

1. Come to a mutual understanding around proto2 vs proto3.
1. Come to a consensus around recommending proto3.
1. Make the change to `README.md`.
