You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
fix(http/prom): record bodies when eos reached (#3856)
* chore(app/outbound): `linkerd-mock-http-body` test dependency
this adds a development dependency, so we can use this mock body type in
the outbound proxy's unit tests.
Signed-off-by: katelyn martin <[email protected]>
* chore(app/outbound): additional http route metrics tests
Signed-off-by: katelyn martin <[email protected]>
* chore(app/outbound): additional grpc route metrics tests
Signed-off-by: katelyn martin <[email protected]>
* fix(http/prom): record bodies when eos reached
this commit fixes a bug discovered by @alpeb, which was introduced in
proxy v2.288.0.
> The associated metric is `outbound_http_route_request_statuses_total`:
>
> ```
> $ linkerd dg proxy-metrics -n booksapp deploy/webapp|rg outbound_http_route_request_statuses_total.*authors
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="204",error=""} 5
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="201",error="UNKNOWN"} 5
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="200",error="UNKNOWN"} 10
> ```
>
> The problem was introduced in `edge-25.3.4`, with the proxy `v2.288.0`.
> Before that the metrics looked like:
>
> ```
> $ linkerd dg proxy-metrics -n booksapp deploy/webapp|rg outbound_http_route_request_statuses_total.*authors
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="200",error=""} 193
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="204",error=""} 96
> outbound_http_route_request_statuses_total{parent_group="core",parent_kind="Service",parent_namespace="booksapp",parent_name="authors",parent_port="7001",parent_section_name="",route_group="",route_kind="default",route_namespace="",route_name="http",hostname="",http_status="201",error=""} 96
> ```
>
> So the difference is the non-empty value for `error=UNKNOWN` even
> when `https_status` is 2xx, which `linkerd viz stat-outbound`
> interprets as failed requests.
in #3086 we introduced a suite of route- and backend-level metrics. that
subsystem contains a body middleware that will report itself as having
reached the end-of-stream by delegating directly down to its inner
body's `is_end_stream()` hint.
this is roughly correct, but is slightly distinct from the actual
invariant: a `linkerd_http_prom::record_response::ResponseBody<B>` must
call its `end_stream` helper to classify the outcome and increment the
corresponding time series in the
`outbound_http_route_request_statuses_total` metric family.
in #3504 we upgraded our hyper dependency. while doing so, we neglected
to include a call to `end_stream` if a data frame is yielded and the
inner body reports itself as having reached the end-of-stream.
this meant that instrumented bodies would be polled until the end is
reached, but were being dropped before a `None` was encountered.
this commit fixes this issue in two ways, to be defensive:
* invoke `end_stream()` if a non-trailers frame is yielded, and the
inner body now reports itself as having ended. this restores the
behavior in place prior to #3504. see the relevant component of that
diff, here:
<https://github.com/linkerd/linkerd2-proxy/pull/3504/files#diff-45d0bc344f76c111551a8eaf5d3f0e0c22ee6e6836a626e46402a6ae3cbc0035L262-R274>
* rather than delegating to the inner `<B as Body>::is_end_stream()`
method, report the end-of-stream being reached by inspecting whether
or not the inner response state has been taken. this is the state that
directly indicates whether or not the `ResponseBody<B>` middleware is
finished.
X-ref: #3504
X-ref: #3086
X-ref: linkerd/linkerd2#8733
Signed-off-by: katelyn martin <[email protected]>
---------
Signed-off-by: katelyn martin <[email protected]>
0 commit comments