Skip to content

Conversation

@rittme
Copy link
Contributor

@rittme rittme commented Nov 10, 2021

Problem

Add two new TCP probes:

  • A probe to measure Tail Loss Probes (TLP) sent.
  • A probe to measure TCP Retransmit timeouts (RTO), reflecting how congested the network is.

Solution

Added two BPF kernel probes to count TLP and RTO events.

Result

Two new metrics introduced to TCP sampler:

  • tcp/ tlp
  • tcp/retransmit_timeout

Copy link
Contributor

@WUMUXIAN WUMUXIAN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the tlp one looks correct, but I think the RTO one needs to change to another kernel probe.

Copy link
Contributor

@WUMUXIAN WUMUXIAN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me now.

Copy link
Contributor

@brayniac brayniac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the metric name, this looks good. I think it might be better overall to keep grouping metrics that are on the TX path together.

@brayniac brayniac merged commit 9bdf951 into twitter:master Nov 16, 2021
WUMUXIAN pushed a commit to WUMUXIAN/rezolus that referenced this pull request Nov 17, 2021
* Add TCP Tail Loss Recovery Probes sampler via BPF
* Add TCP Retransmit timeout sampler via BPF
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants