Skip to content

chore(deps)!: upgrade to hyper 1.x #3504

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 32 commits into from
Mar 21, 2025
Merged

chore(deps)!: upgrade to hyper 1.x #3504

merged 32 commits into from
Mar 21, 2025

Conversation

cratelyn
Copy link
Collaborator

@cratelyn cratelyn commented Jan 6, 2025

✨ chore(deps): upgrade to hyperium 1.x crates

this branch performs an exciting upgrade for our proxy.

this branch upgrades a number of our dependencies so that we use the 1.0
release family of the hyper http framework, and its ecosystem. see the
v1.0 announcement for more information.

this branch upgrades the following dependencies:

  • h2: 0.3 -> 0.4
  • http: 0.2 -> 1
  • http-body: 0.4 -> 1
  • hyper: 0.14.32 -> 1
  • prost: 0.12 -> 0.13
  • prost-build: 0.12 -> 0.13
  • prost-types: 0.12 -> 0.13
  • tonic: 0.10 -> 0.12
  • tonic-build: 0.10 -> 0.12

a hyper-util dependency is added, which provides among other things,
legacy-compatible interfaces such as hyper_util::client::legacy::Client, or
glue to use hyper with the tokio runtime.

see https://docs.rs/hyper-util/latest/hyper_util/ for more information.

a http-body-util dependency is added, which provides a BodyExt trait and a
channel-backed body for use in unit tests. the deprecated feature flag that
was active on our 0.14 hyper dependency has been removed, along with the
stream and runtime feature flags.

the linkerd2-proxy-api dependency is updated. see:
linkerd/linkerd2-proxy-api#421

📝 notes for review

bear particular attention to changes involving http_body::Body middleware.

the change from two separate poll_data() and poll_trailers() functions,
to a single poll_frame() method, induces some subtle changes to various
pieces of middleware.

also bear in mind that failing to set a timer, in our case
hyper_util::rt::TokioTimer, can cause http/2 clients, or http/1 and http/2
servers, to panic. make sure that any uses of
hyper::server::conn::http1::Builder, hyper::client::conn::http1::Builder,
or hyper::client::conn::http2::Builder install a timer.

❗ breaking change: l5d-proxy-error values

the l5d-proxy-error header can be examined to observe the cause of proxy
errors encountered when sending meshed traffic. by virtue of this using a newer
hyper client in the proxy, some error messages may in turn look different.
for example, an error like "connect timed out after 1s" may now appear as
"client error (Connect)".

📚 other notes

this work, by virtue of touching so many parts of the system, is carried out
in distinct commits. an initial commit upgrades the dependencies at th
workspace level. subsequent commits will not compile if the --workspace flag
is provided, but the intent of this branch is to update each crate
individually.

use commands like, e.g. cargo check --tests -p linkerd-proxy-http to build
particular crates at intermediate commits within this branch.

this commit is also only the final leaf in an extended line of work. this
has been done to mitigate the effort of reviewing this change, and the risk of
churn in the event of any unanticipated errors. see the top-level comment in
linkerd/linkerd2#8733 for an overview of all of the
work that brought us to this juncture.

@cratelyn cratelyn force-pushed the kate/hyper-1.x-upgrade branch from daf57fe to dd6f80f Compare January 7, 2025 19:04
cratelyn added a commit that referenced this pull request Jan 8, 2025
this commit makes a small, subtle change to the `PeekTrailersBody<B>`
http response body middleware.

to help facilitate upgrading to http-body 1.x, we remove some tricky logic
that involves `Body` interfaces that no longer exist after the 0.4
release.

currently, a `PeekTrailersBody<B>` is not fully consistent about the
conditions in which it will peek the trailers of a response body: the
inner body is allowed to yield _either_ (a) **zero** DATA frames, in which
case the body will be `.await`'ed and polled until the trailers are
obtained, or (b) **one** DATA frame, so long as the inner body
immediately yields a trailer.

meanwhile, the documentation comment for the type claims:

> An HTTP body that allows inspecting the body's trailers, if a
> `TRAILERS` frame was the first frame after the initial headers frame.

we won't have distinct `data()` and `trailers()` interfaces in the 1.0
release. we have a single
[`BodyExt::frame()`](https://docs.rs/http-body-util/latest/http_body_util/trait.BodyExt.html#method.frame)
method.

consequently, porting this middleware as-is would be somewhat difficult.
we might have to hold two frames, should we receive one frame,
`now_or_never()` the second frame, and discover that we've been provided
a second data frame rather than the trailers.

this all runs slightly against the invariants of `Body`, see this
comment originally added in 7f817b5:

```
// Peek to see if there's immediately a trailers frame, and grab
// it if so. Otherwise, bail.
// XXX(eliza): the documentation for the `http::Body` trait says
// that `poll_trailers` should only be called after `poll_data`
// returns `None`...but, in practice, I'm fairly sure that this just
// means that it *will not return `Ready`* until there are no data
// frames left, which is fine for us here, because we `now_or_never`
// it.
```

this isn't quite true, as `Trailers` is just a wrapper calling
`poll_trailers`:
<https://docs.rs/http-body/0.4.6/src/http_body/next.rs.html#28-30>

so, let's remove this. doing so will make the task of porting this
middleware to http-body 1.0 in the short term, and additionally prevents
any potential future misbehavior due to inner bodies not handling this
eager trailer polling gracefully.

see linkerd/linkerd2#8733.

see #3504.

Signed-off-by: katelyn martin <[email protected]>
cratelyn added a commit that referenced this pull request Jan 8, 2025
this commit makes a small, subtle change to the `PeekTrailersBody<B>`
http response body middleware.

to help facilitate upgrading to http-body 1.x, we remove some tricky logic
that involves `Body` interfaces that no longer exist after the 0.4
release.

currently, a `PeekTrailersBody<B>` is not fully consistent about the
conditions in which it will peek the trailers of a response body: the
inner body is allowed to yield _either_ (a) **zero** DATA frames, in which
case the body will be `.await`'ed and polled until the trailers are
obtained, or (b) **one** DATA frame, so long as the inner body
immediately yields a trailer.

meanwhile, the documentation comment for the type claims:

> An HTTP body that allows inspecting the body's trailers, if a
> `TRAILERS` frame was the first frame after the initial headers frame.

we won't have distinct `data()` and `trailers()` interfaces in the 1.0
release. we have a single
[`BodyExt::frame()`](https://docs.rs/http-body-util/latest/http_body_util/trait.BodyExt.html#method.frame)
method.

consequently, porting this middleware as-is would be somewhat difficult.
we might have to hold two frames, should we receive one frame,
`now_or_never()` the second frame, and discover that we've been provided
a second data frame rather than the trailers.

this all runs slightly against the invariants of `Body`, see this
comment originally added in 7f817b5:

```
// Peek to see if there's immediately a trailers frame, and grab
// it if so. Otherwise, bail.
// XXX(eliza): the documentation for the `http::Body` trait says
// that `poll_trailers` should only be called after `poll_data`
// returns `None`...but, in practice, I'm fairly sure that this just
// means that it *will not return `Ready`* until there are no data
// frames left, which is fine for us here, because we `now_or_never`
// it.
```

this isn't quite true, as `Trailers` is just a wrapper calling
`poll_trailers`:
<https://docs.rs/http-body/0.4.6/src/http_body/next.rs.html#28-30>

so, let's remove this. doing so will make the task of porting this
middleware to http-body 1.0 in the short term, and additionally prevents
any potential future misbehavior due to inner bodies not handling this
eager trailer polling gracefully.

see linkerd/linkerd2#8733.

see #3504.

Signed-off-by: katelyn martin <[email protected]>
@cratelyn cratelyn force-pushed the kate/hyper-1.x-upgrade branch 5 times, most recently from fa62779 to b41108e Compare January 14, 2025 18:30
@cratelyn cratelyn force-pushed the kate/hyper-1.x-upgrade branch 3 times, most recently from a5d21c8 to 5085d02 Compare January 22, 2025 16:09
@cratelyn
Copy link
Collaborator Author

rebased once more ♻️

cratelyn added a commit that referenced this pull request Jan 22, 2025
this vendors a copy of the `Frame<T>` type. we can use this to
preemptively model our peek body in terms of this type, and move to the
"real" version of it when we're upgrading in #3504.

Signed-off-by: katelyn martin <[email protected]>
cratelyn added a commit that referenced this pull request Jan 23, 2025
this commit introduces a `compat` submodule to `linkerd-http-retry`.

this helps us frontrun the task of replacing all of the finicky control
flow in `PeekTrailersBody<B>` using the antiquated `data()` and
`trailers()` future combinators. instead, we can perform our peeking
in terms of an approximation of `http_body_util::BodyExt::frame()`.

to accomplish this, this commit vendors a copy of the `Frame<T>` type.
we can use this to preemptively model our peek body in terms of this
type, and move to the "real" version of it when we're upgrading in
pr #3504.

additionally, this commit includes a type called
`ForwardCompatibleBody<B>`, and a variant of the `Frame<'a, T>`
combinator. these are a bit boilerplate-y, admittedly, but the pleasant
part of this is that we have, in effect, migrated the trickiest body
middleware in advance of #3504. once we upgrade to http-body 1.0, all of
these types can be removed.

https://docs.rs/http-body-util/latest/http_body_util/trait.BodyExt.html#method.frame
https://docs.rs/http-body-util/0.1.2/src/http_body_util/combinators/frame.rs.html#10

Signed-off-by: katelyn martin <[email protected]>
cratelyn added a commit that referenced this pull request Jan 23, 2025
this commit introduces a `compat` submodule to `linkerd-http-retry`.

this helps us frontrun the task of replacing all of the finicky control
flow in `PeekTrailersBody<B>` using the antiquated `data()` and
`trailers()` future combinators. instead, we can perform our peeking
in terms of an approximation of `http_body_util::BodyExt::frame()`.

to accomplish this, this commit vendors a copy of the `Frame<T>` type.
we can use this to preemptively model our peek body in terms of this
type, and move to the "real" version of it when we're upgrading in
pr #3504.

additionally, this commit includes a type called
`ForwardCompatibleBody<B>`, and a variant of the `Frame<'a, T>`
combinator. these are a bit boilerplate-y, admittedly, but the pleasant
part of this is that we have, in effect, migrated the trickiest body
middleware in advance of #3504. once we upgrade to http-body 1.0, all of
these types can be removed.

https://docs.rs/http-body-util/latest/http_body_util/trait.BodyExt.html#method.frame
https://docs.rs/http-body-util/0.1.2/src/http_body_util/combinators/frame.rs.html#10

Signed-off-by: katelyn martin <[email protected]>
cratelyn added a commit that referenced this pull request Jan 23, 2025
this commit reworks `PeekTrailersBody<B>`.

the most important goal here is replacing the control flow of
`read_body()`, which polls a body using `BodyExt` future combinators
`data()` and `frame()` for up to two frames, with varying levels of
persistence depending on outcomes.

to quote #3556:

> the intent of this type is to only yield the asynchronous task
> responsible for reading the body once. depending on what the inner
> body yields, and when, this can result in combinations of: no data
> frames and no trailers, no data frames with trailers, one data frame
> and no trailers, one data frame with trailers, or two data frames.
> moreover, depending on which of these are yielded, the body will call
> .await some scenarios, and only poll functions once in others.
>
> migrating this to the Frame<T> and poll_frame() style of the 1.0 Body
> interface, away from the 0.4 interface that provides distinct
> poll_data() and poll_trailers() methods, is fundamentally tricky.

this means that `PeekTrailersBody<B>` is notably difficult to port
across the http-body 0.4/1.0 upgrade boundary.

this body middleware must navigate a number of edge conditions, and once
it _has_ obtained a `Frame<T>`, make use of conversion methods to
ascertain whether it is a data or trailers frame, due to the fact that
its internal enum representation is not exposed publicly. one it has
done all of that, it must do the same thing once more to examine the
second frame.

this commit uses the compatibility facilities and backported `Frame<T>`
introduced in the previous commit, and rewrites this control flow using
a form of the `BodyExt::frame()` combinator.

this means that this middleware is forward-compatible with http-body
1.0, which will dramatically simplify the remaining migration work to be
done in #3504.

see linkerd/linkerd2#8733 for more information
and other links related to this ongoing migration work.

Signed-off-by: katelyn martin <[email protected]>
cratelyn added a commit that referenced this pull request Jan 24, 2025
this branch contains a sequence of commits that refactor `PeekTrailersBody<B>`.

this branch is specifically focused on making this body middleware
forward-compatible with the 1.0 interface(s) of `http_body::Body` and
`http_body_util::BodyExt`.

it does this in two main steps: (1) temporarily vendoring `http_body::Frame<T>`
and providing a compatibility shim that provides a `frame()` method for a body,
and (2) modeling `PeekTrailersBody<B>` and its peeking logic in terms of this
`Frame<'a, T>` future.

---

* feat(http/retry): add `Frame<T>` compatibility facilities

this commit introduces a `compat` submodule to `linkerd-http-retry`.

this helps us frontrun the task of replacing all of the finicky control
flow in `PeekTrailersBody<B>` using the antiquated `data()` and
`trailers()` future combinators. instead, we can perform our peeking
in terms of an approximation of `http_body_util::BodyExt::frame()`.

to accomplish this, this commit vendors a copy of the `Frame<T>` type.
we can use this to preemptively model our peek body in terms of this
type, and move to the "real" version of it when we're upgrading in
pr #3504.

additionally, this commit includes a type called
`ForwardCompatibleBody<B>`, and a variant of the `Frame<'a, T>`
combinator. these are a bit boilerplate-y, admittedly, but the pleasant
part of this is that we have, in effect, migrated the trickiest body
middleware in advance of #3504. once we upgrade to http-body 1.0, all of
these types can be removed.

https://docs.rs/http-body-util/latest/http_body_util/trait.BodyExt.html#method.frame
https://docs.rs/http-body-util/0.1.2/src/http_body_util/combinators/frame.rs.html#10

Signed-off-by: katelyn martin <[email protected]>

* refactor(http/retry): `PeekTrailersBody<B>` uses `BodyExt::frame()`

this commit reworks `PeekTrailersBody<B>`.

the most important goal here is replacing the control flow of
`read_body()`, which polls a body using `BodyExt` future combinators
`data()` and `frame()` for up to two frames, with varying levels of
persistence depending on outcomes.

to quote #3556:

> the intent of this type is to only yield the asynchronous task
> responsible for reading the body once. depending on what the inner
> body yields, and when, this can result in combinations of: no data
> frames and no trailers, no data frames with trailers, one data frame
> and no trailers, one data frame with trailers, or two data frames.
> moreover, depending on which of these are yielded, the body will call
> .await some scenarios, and only poll functions once in others.
>
> migrating this to the Frame<T> and poll_frame() style of the 1.0 Body
> interface, away from the 0.4 interface that provides distinct
> poll_data() and poll_trailers() methods, is fundamentally tricky.

this means that `PeekTrailersBody<B>` is notably difficult to port
across the http-body 0.4/1.0 upgrade boundary.

this body middleware must navigate a number of edge conditions, and once
it _has_ obtained a `Frame<T>`, make use of conversion methods to
ascertain whether it is a data or trailers frame, due to the fact that
its internal enum representation is not exposed publicly. one it has
done all of that, it must do the same thing once more to examine the
second frame.

this commit uses the compatibility facilities and backported `Frame<T>`
introduced in the previous commit, and rewrites this control flow using
a form of the `BodyExt::frame()` combinator.

this means that this middleware is forward-compatible with http-body
1.0, which will dramatically simplify the remaining migration work to be
done in #3504.

see linkerd/linkerd2#8733 for more information
and other links related to this ongoing migration work.

Signed-off-by: katelyn martin <[email protected]>

* refactor(http/retry): mock body enforces `poll_trailers()` contract

this commit addresses a `TODO` note, and tightens the enforcement of a
rule defined by the v0.4 signature of the `Body` trait.

this commit changes the mock body type, used in tests, so that it will
panic if the caller improperly polls for a trailers frame before the
final data frame has been yielded.

previously, a comment indicated that we were "fairly sure" this was
okay. while that may have been acceptable in practice, the changes in
the previous commit mean that we now properly respect these rules.

thus, a panic can be placed here, to enforce that "[is] only be called
once `poll_data()` returns `None`", per the documentation.

Signed-off-by: katelyn martin <[email protected]>

* refactor(http/retry): rename `PeekTrailersBody::Buffered`

<#3559 (comment)>

this is a nicer name than `Unknown` for this case. not to mention, we'll
want that name shortly to address the possibility of unknown frame
variants.

Signed-off-by: katelyn martin <[email protected]>

* refactor(http/retry): encapsulate `Inner<B>` enum variants

this commit makes the inner enum variants private.

#3559 (comment)

Signed-off-by: katelyn martin <[email protected]>

* refactor(http/retry): gracefully ignore unknown frames

#3559 (comment)

Signed-off-by: katelyn martin <[email protected]>

---------

Signed-off-by: katelyn martin <[email protected]>
cratelyn added a commit that referenced this pull request Jan 24, 2025
this is a refactor commit, outlined from #3504.

this commit is particularly interested in preparing the
`PeekTrailersBody<B>` middleware for our upgrade to hyper/http-body 1.0.
more specifically: this commit clears up the boundary concerning when it
will or will not become inert and delegate to its inner body `B`.

currently, a `PeekTrailersBody<B>` is not fully consistent about the
conditions in which it will peek the trailers of a response body: the
inner body is allowed to yield _either_ (a) **zero** DATA frames, in which
case the body will be `.await`'ed and polled until the trailers are
obtained, or (b) **one** DATA frame, so long as the inner body
immediately yields a trailer.

see linkerd/linkerd2#8733.

Signed-off-by: katelyn martin <[email protected]>
@cratelyn cratelyn force-pushed the kate/hyper-1.x-upgrade branch from 5085d02 to b58ab0b Compare January 24, 2025 20:56
@cratelyn
Copy link
Collaborator Author

and rebased again, now that #3556, #3558, and #3559 have landed.

cratelyn added a commit that referenced this pull request Jan 27, 2025
see linkerd/linkerd2#8733.

pr #3559 introduced some compatibility facilities to allow us to write
code in terms of `http_body_util::BodyExt::frame()`, front-running the
upgrade to be performed in #3504.

some `ReplayBody` tests use the defunct `data()` and `trailers()`
interfaces. this branch ports _two_ such unit tests. other tests are
saved for a fast follow-on, as the `chunk(..)` and `read_to_string(..)`
helpers will need some slightly more involved tweaks.

dd4fbcd

---

* refactor(http/retry): `replays_trailers()` uses `Frame<T>`

see linkerd/linkerd2#8733.

this commit upgrades a test that uses defunct `data()` and `trailers()`
futures.

Signed-off-by: katelyn martin <[email protected]>

* refactor(http/retry): `trailers_only()` uses `Frame<T>`

see linkerd/linkerd2#8733.

this commit upgrades a test that uses defunct `data()` and `trailers()`
futures.

Signed-off-by: katelyn martin <[email protected]>

---------

Signed-off-by: katelyn martin <[email protected]>
@cratelyn cratelyn force-pushed the kate/hyper-1.x-upgrade branch from b58ab0b to d1cafce Compare January 27, 2025 22:05
@cratelyn cratelyn force-pushed the kate/hyper-1.x-upgrade branch 3 times, most recently from cc70c37 to 2f0fd0e Compare February 10, 2025 23:56
@cratelyn cratelyn force-pushed the kate/hyper-1.x-upgrade branch from 2f0fd0e to 60ab0cf Compare February 14, 2025 20:32
cratelyn added 20 commits March 18, 2025 18:46
Signed-off-by: katelyn martin <[email protected]>
this commit updates the contents of `deny.toml`.

Signed-off-by: katelyn martin <[email protected]>
this addresses deprecation warnings, updating calls to a function that
has since been renamed.

Signed-off-by: katelyn martin <[email protected]>
this commit removes this crate, which we added to future proof code for
this upgrade, from its dependents.

Signed-off-by: katelyn martin <[email protected]>
@cratelyn cratelyn force-pushed the kate/hyper-1.x-upgrade branch from 77a657e to aecfc7b Compare March 18, 2025 22:46
@cratelyn
Copy link
Collaborator Author

this has been rebased atop #3779.

cratelyn added a commit that referenced this pull request Mar 18, 2025
this commit updates our tower dependency from 0.4 to 0.5.

note that this commit does not affect the `tower-service` and
`tower-layer` crates, reëxported by `tower` itself. the `Service<T>`
trait and the closely related `Layer<S>` trait have not been changed.

the `tower` crate's utilities have changed in various ways, some of
particular note for the linkerd2 proxy. see these items, excerpted from
the tower changelog:

- **retry**: **Breaking Change** `retry::Policy::retry` now accepts `&mut Req` and `&mut Res` instead of the previous mutable versions. This
  increases the flexibility of the retry policy. To update, update your method signature to include `mut` for both parameters. ([tower-rs/tower#584])
- **retry**: **Breaking Change** Change Policy to accept &mut self ([tower-rs/tower#681])
- **retry**: **Breaking Change** `Budget` is now a trait. This allows end-users to implement their own budget and bucket implementations. ([tower-rs/tower#703])
- **util**: **Breaking Change** `Either::A` and `Either::B` have been renamed `Either::Left` and `Either::Right`, respectively. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` now requires its two services to have the same error type. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` no longer implemenmts `Future`. ([tower-rs/tower#637])
- **buffer**: **Breaking Change** `Buffer<S, Request>` is now generic over `Buffer<Request, S::Future>.` ([tower-rs/tower#654])

see:

* <tower-rs/tower#584>
* <tower-rs/tower#681>
* <tower-rs/tower#703>
* <tower-rs/tower#637>
* <tower-rs/tower#654>

the `Either` trait bounds are particularly impactful for us. because
this runs counter to how we treat errors (skewing towards boxed errors,
in general), we temporarily vendor a version of `Either` from the 0.4
release, whose variants have been renamed to match the 0.5 interface.

updating to box the inner `A` and `B` services' errors, so we satiate
the new `A::Error = B::Error` bounds, can be addressed as a follow-on.
that's intentionally left as a separate change, due to the net size of
our patchset between this branch and #3504.

* <tower-rs/tower@v0.4.x...master>
* <https://github.com/tower-rs/tower/blob/master/tower/CHANGELOG.md>

this work is based upon #3504. for more information, see:

* linkerd/linkerd2#8733
* #3504

Signed-off-by: katelyn martin <[email protected]>
X-Ref: tower-rs/tower#815
X-Ref: tower-rs/tower#817
X-Ref: tower-rs/tower#818
X-Ref: tower-rs/tower#819
@cratelyn cratelyn merged commit 76d4181 into main Mar 21, 2025
29 checks passed
@cratelyn cratelyn deleted the kate/hyper-1.x-upgrade branch March 21, 2025 16:53
cratelyn added a commit that referenced this pull request Mar 21, 2025
this commit updates our tower dependency from 0.4 to 0.5.

note that this commit does not affect the `tower-service` and
`tower-layer` crates, reëxported by `tower` itself. the `Service<T>`
trait and the closely related `Layer<S>` trait have not been changed.

the `tower` crate's utilities have changed in various ways, some of
particular note for the linkerd2 proxy. see these items, excerpted from
the tower changelog:

- **retry**: **Breaking Change** `retry::Policy::retry` now accepts `&mut Req` and `&mut Res` instead of the previous mutable versions. This
  increases the flexibility of the retry policy. To update, update your method signature to include `mut` for both parameters. ([tower-rs/tower#584])
- **retry**: **Breaking Change** Change Policy to accept &mut self ([tower-rs/tower#681])
- **retry**: **Breaking Change** `Budget` is now a trait. This allows end-users to implement their own budget and bucket implementations. ([tower-rs/tower#703])
- **util**: **Breaking Change** `Either::A` and `Either::B` have been renamed `Either::Left` and `Either::Right`, respectively. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` now requires its two services to have the same error type. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` no longer implemenmts `Future`. ([tower-rs/tower#637])
- **buffer**: **Breaking Change** `Buffer<S, Request>` is now generic over `Buffer<Request, S::Future>.` ([tower-rs/tower#654])

see:

* <tower-rs/tower#584>
* <tower-rs/tower#681>
* <tower-rs/tower#703>
* <tower-rs/tower#637>
* <tower-rs/tower#654>

the `Either` trait bounds are particularly impactful for us. because
this runs counter to how we treat errors (skewing towards boxed errors,
in general), we temporarily vendor a version of `Either` from the 0.4
release, whose variants have been renamed to match the 0.5 interface.

updating to box the inner `A` and `B` services' errors, so we satiate
the new `A::Error = B::Error` bounds, can be addressed as a follow-on.
that's intentionally left as a separate change, due to the net size of
our patchset between this branch and #3504.

* <tower-rs/tower@v0.4.x...master>
* <https://github.com/tower-rs/tower/blob/master/tower/CHANGELOG.md>

this work is based upon #3504. for more information, see:

* linkerd/linkerd2#8733
* #3504

Signed-off-by: katelyn martin <[email protected]>
X-Ref: tower-rs/tower#815
X-Ref: tower-rs/tower#817
X-Ref: tower-rs/tower#818
X-Ref: tower-rs/tower#819
cratelyn added a commit that referenced this pull request Mar 21, 2025
* chore(deps)!: upgrade to tower 0.5

this commit updates our tower dependency from 0.4 to 0.5.

note that this commit does not affect the `tower-service` and
`tower-layer` crates, reëxported by `tower` itself. the `Service<T>`
trait and the closely related `Layer<S>` trait have not been changed.

the `tower` crate's utilities have changed in various ways, some of
particular note for the linkerd2 proxy. see these items, excerpted from
the tower changelog:

- **retry**: **Breaking Change** `retry::Policy::retry` now accepts `&mut Req` and `&mut Res` instead of the previous mutable versions. This
  increases the flexibility of the retry policy. To update, update your method signature to include `mut` for both parameters. ([tower-rs/tower#584])
- **retry**: **Breaking Change** Change Policy to accept &mut self ([tower-rs/tower#681])
- **retry**: **Breaking Change** `Budget` is now a trait. This allows end-users to implement their own budget and bucket implementations. ([tower-rs/tower#703])
- **util**: **Breaking Change** `Either::A` and `Either::B` have been renamed `Either::Left` and `Either::Right`, respectively. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` now requires its two services to have the same error type. ([tower-rs/tower#637])
- **util**: **Breaking Change** `Either` no longer implemenmts `Future`. ([tower-rs/tower#637])
- **buffer**: **Breaking Change** `Buffer<S, Request>` is now generic over `Buffer<Request, S::Future>.` ([tower-rs/tower#654])

see:

* <tower-rs/tower#584>
* <tower-rs/tower#681>
* <tower-rs/tower#703>
* <tower-rs/tower#637>
* <tower-rs/tower#654>

the `Either` trait bounds are particularly impactful for us. because
this runs counter to how we treat errors (skewing towards boxed errors,
in general), we temporarily vendor a version of `Either` from the 0.4
release, whose variants have been renamed to match the 0.5 interface.

updating to box the inner `A` and `B` services' errors, so we satiate
the new `A::Error = B::Error` bounds, can be addressed as a follow-on.
that's intentionally left as a separate change, due to the net size of
our patchset between this branch and #3504.

* <tower-rs/tower@v0.4.x...master>
* <https://github.com/tower-rs/tower/blob/master/tower/CHANGELOG.md>

this work is based upon #3504. for more information, see:

* linkerd/linkerd2#8733
* #3504

Signed-off-by: katelyn martin <[email protected]>
X-Ref: tower-rs/tower#815
X-Ref: tower-rs/tower#817
X-Ref: tower-rs/tower#818
X-Ref: tower-rs/tower#819

* fix(stack/loadshed): update test affected by tower-rs/tower#635

this commit updates a test that was affected by breaking changes in
tower's `Buffer` middleware. see this excerpt from the description of
that change:

> I had to change some of the integration tests slightly as part of this
> change. This is because the buffer implementation using semaphore
> permits is _very subtly_ different from one using a bounded channel. In
> the `Semaphore`-based implementation, a semaphore permit is stored in
> the `Message` struct sent over the channel. This is so that the capacity
> is used as long as the message is in flight. However, when the worker
> task is processing a message that's been recieved from the channel,
> the permit is still not dropped. Essentially, the one message actively
> held by the worker task _also_ occupies one "slot" of capacity, so the
> actual channel capacity is one less than the value passed to the
> constructor, _once the first request has been sent to the worker_. The
> bounded MPSC changed this behavior so that capacity is only occupied
> while a request is actually in the channel, which broke some tests
> that relied on the old (and technically wrong) behavior.

bear particular attention to this:

> The bounded MPSC changed this behavior so that capacity is only
> occupied while a request is actually in the channel, which broke some
> tests that relied on the old (and technically wrong) behavior.

that pr adds an additional message to the channel in tests exercising
the laod-shedding behavior, on account of the previous (incorrect)
behavior.

https://github.com/tower-rs/tower/pull/635/files#r797108274

this commit performs the same change for our corresponding test, adding
an additional `ready()` call before we hit the buffer's limit.

Signed-off-by: katelyn martin <[email protected]>

* review: use vendored `Either` for consistency

#3744 (comment)

Signed-off-by: katelyn martin <[email protected]>

---------

Signed-off-by: katelyn martin <[email protected]>
cratelyn added a commit that referenced this pull request Apr 9, 2025
this commit fixes a bug discovered by @alpeb, which was introduced in
proxy v2.288.0.

> The associated metric is `outbound_http_route_request_statuses_total`:
>
> ```
> $ linkerd dg proxy-metrics -n booksapp deploy/webapp|rg outbound_http_route_request_statuses_total.*authors
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="204",error=""} 5
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="201",error="UNKNOWN"} 5
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="200",error="UNKNOWN"} 10
> ```
>
> The problem was introduced in `edge-25.3.4`, with the proxy `v2.288.0`.
> Before that the metrics looked like:
>
> ```
> $ linkerd dg proxy-metrics -n booksapp deploy/webapp|rg outbound_http_route_request_statuses_total.*authors
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="200",error=""} 193
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="204",error=""} 96
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="201",error=""} 96
> ```
>
> So the difference is the non-empty value for `error=UNKNOWN` even
> when `https_status` is 2xx, which `linkerd viz stat-outbound`
> interprets as failed requests.

in #3086 we introduced a suite of route- and backend-level metrics. that
subsystem contains a body middleware that will report itself as having
reached the end-of-stream by delegating directly down to its inner
body's `is_end_stream()` hint.

this is roughly correct, but is slightly distinct from the actual
invariant: a `linkerd_http_prom::record_response::ResponseBody<B>` must
call its `end_stream` helper to classify the outcome and increment the
corresponding time series in the
`outbound_http_route_request_statuses_total` metric family.

in #3504 we upgraded our hyper dependency. while doing so, we neglected
to include a call to `end_stream` if a data frame is yielded and the
inner body reports itself as having reached the end-of-stream.

this meant that instrumented bodies would be polled until the end is
reached, but were being dropped before a `None` was encountered.

this commit fixes this issue in two ways, to be defensive:

* invoke `end_stream()` if a non-trailers frame is yielded, and the
  inner body now reports itself as having ended. this restores the
  behavior in place prior to #3504. see the relevant component of that
  diff, here:
  <https://github.com/linkerd/linkerd2-proxy/pull/3504/files#diff-45d0bc344f76c111551a8eaf5d3f0e0c22ee6e6836a626e46402a6ae3cbc0035L262-R274>

* rather than delegating to the inner `<B as Body>::is_end_stream()`
  method, report the end-of-stream being reached by inspecting whether
  or not the inner response state has been taken. this is the state that
  directly indicates whether or not the `ResponseBody<B>` middleware is
  finished.

X-ref: #3504
X-ref: #3086
X-ref: linkerd/linkerd2#8733
Signed-off-by: katelyn martin <[email protected]>
cratelyn added a commit that referenced this pull request Apr 9, 2025
* chore(app/outbound): `linkerd-mock-http-body` test dependency

this adds a development dependency, so we can use this mock body type in
the outbound proxy's unit tests.

Signed-off-by: katelyn martin <[email protected]>

* chore(app/outbound): additional http route metrics tests

Signed-off-by: katelyn martin <[email protected]>

* chore(app/outbound): additional grpc route metrics tests

Signed-off-by: katelyn martin <[email protected]>

* fix(http/prom): record bodies when eos reached

this commit fixes a bug discovered by @alpeb, which was introduced in
proxy v2.288.0.

> The associated metric is `outbound_http_route_request_statuses_total`:
>
> ```
> $ linkerd dg proxy-metrics -n booksapp deploy/webapp|rg outbound_http_route_request_statuses_total.*authors
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="204",error=""} 5
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="201",error="UNKNOWN"} 5
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="200",error="UNKNOWN"} 10
> ```
>
> The problem was introduced in `edge-25.3.4`, with the proxy `v2.288.0`.
> Before that the metrics looked like:
>
> ```
> $ linkerd dg proxy-metrics -n booksapp deploy/webapp|rg outbound_http_route_request_statuses_total.*authors
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="200",error=""} 193
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="204",error=""} 96
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="201",error=""} 96
> ```
>
> So the difference is the non-empty value for `error=UNKNOWN` even
> when `https_status` is 2xx, which `linkerd viz stat-outbound`
> interprets as failed requests.

in #3086 we introduced a suite of route- and backend-level metrics. that
subsystem contains a body middleware that will report itself as having
reached the end-of-stream by delegating directly down to its inner
body's `is_end_stream()` hint.

this is roughly correct, but is slightly distinct from the actual
invariant: a `linkerd_http_prom::record_response::ResponseBody<B>` must
call its `end_stream` helper to classify the outcome and increment the
corresponding time series in the
`outbound_http_route_request_statuses_total` metric family.

in #3504 we upgraded our hyper dependency. while doing so, we neglected
to include a call to `end_stream` if a data frame is yielded and the
inner body reports itself as having reached the end-of-stream.

this meant that instrumented bodies would be polled until the end is
reached, but were being dropped before a `None` was encountered.

this commit fixes this issue in two ways, to be defensive:

* invoke `end_stream()` if a non-trailers frame is yielded, and the
  inner body now reports itself as having ended. this restores the
  behavior in place prior to #3504. see the relevant component of that
  diff, here:
  <https://github.com/linkerd/linkerd2-proxy/pull/3504/files#diff-45d0bc344f76c111551a8eaf5d3f0e0c22ee6e6836a626e46402a6ae3cbc0035L262-R274>

* rather than delegating to the inner `<B as Body>::is_end_stream()`
  method, report the end-of-stream being reached by inspecting whether
  or not the inner response state has been taken. this is the state that
  directly indicates whether or not the `ResponseBody<B>` middleware is
  finished.

X-ref: #3504
X-ref: #3086
X-ref: linkerd/linkerd2#8733
Signed-off-by: katelyn martin <[email protected]>

---------

Signed-off-by: katelyn martin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants