Skip to content

feat(kad): configurable outbound_substreams timeout #6015

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
May 6, 2025

Conversation

maqi
Copy link
Contributor

@maqi maqi commented May 5, 2025

Description

When testing put_record_to with poor connectioin but large packet size (around 4MB), the put_record_to frequently got failed and the PutRecordToError::stats showing always failed at 10s.

"kad_event::PutRecordError::QuorumFailed", required quorum 2, stored on [], QueryStats { requests: 5, success: 0, failure: 5, start: Some(Instant { tv_sec: 121323, tv_nsec: 961187829 }), end: Some(Instant { tv_sec: 121334, tv_nsec: 188694869 }) } - ProgressStep { count: 1, last: true }

However, this timeout is not defined by request/response protocol timeout (which in our case increased to 30s already).

Further digging through the libp2p code, it turned out this timeout is actually affected by outbound_substreams timeout.
We tested with increase to 30s, do see much less put_record_to failure and QueryStats showing extended completion duration (larger than 10s, and in case of failed, to be 30s).

Hence raised this PR to provide a configurable approach of the timeout.

Notes & open questions

Ideally, the configurate shall via function call.
However, it seems it will be via the kad::ProtocolConfig , which could having issue of:

  • extra traffic bytes (if ProtocolConfig to be exchanged on every connection)
  • backward conpartibility, in case peer is in old version and can't parse the new ProtocolConfig

Would like a guide if there is other way to do this configure setup more user friendly.

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • A changelog entry has been made in the appropriate crates

@maqi maqi force-pushed the configurable_outbound_streams_timeout branch from 5fdd136 to 171b7af Compare May 5, 2025 14:03
@elenaf9
Copy link
Member

elenaf9 commented May 5, 2025

Thank you for the PR @maqi.

Ideally, the configurate shall via function call.
However, it seems it will be via the kad::ProtocolConfig , which could having issue of:

  • extra traffic bytes (if ProtocolConfig to be exchanged on every connection)
  • backward conpartibility, in case peer is in old version and can't partse the new ProtocolConfig

Would like a guide if there is other way to do this configure setup more user friendly.

kad::ProtocolConfig is just a local config, it is never sent to remote peers (only the protocol_names property is).
So we can actually just add outbound_substream_timeout to ProtocolConfig.

@maqi maqi force-pushed the configurable_outbound_streams_timeout branch from 171b7af to 91f3eff Compare May 6, 2025 09:56
@maqi maqi changed the title feat(kad): configurable OUTBOUND_SUBSTREAMS_TIMEOUT feat(kad): configurable outbound_substreams timeout May 6, 2025
@maqi
Copy link
Contributor Author

maqi commented May 6, 2025

So we can actually just add outbound_substream_timeout to ProtocolConfig.

Hi, @elenaf9,

Thx for the comment and clarification.
Got PR updated to add outbound_substream_timeout into ProtocolConfig, which allows it to be modified via function calls.

@maqi maqi force-pushed the configurable_outbound_streams_timeout branch from 91f3eff to 877b7b4 Compare May 6, 2025 09:59
@maqi maqi force-pushed the configurable_outbound_streams_timeout branch 2 times, most recently from ff0435c to 52273cd Compare May 6, 2025 10:58
Copy link
Member

@elenaf9 elenaf9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, one more comment that I missed in the last review. Rest LGTM, thanks!

@maqi maqi force-pushed the configurable_outbound_streams_timeout branch from 52273cd to 9735801 Compare May 6, 2025 12:16
@elenaf9 elenaf9 added the send-it label May 6, 2025
@mergify mergify bot merged commit a367e1b into libp2p:master May 6, 2025
71 checks passed
maqi added a commit to maqi/rust-libp2p that referenced this pull request May 8, 2025
Then testing `put_record_to` with poor connectioin but large packet size (around 4MB), the `put_record_to` frequently got failed and the PutRecordToError::stats showing always failed at 10s.
```
"kad_event::PutRecordError::QuorumFailed", required quorum 2, stored on [], QueryStats { requests: 5, success: 0, failure: 5, start: Some(Instant { tv_sec: 121323, tv_nsec: 961187829 }), end: Some(Instant { tv_sec: 121334, tv_nsec: 188694869 }) } - ProgressStep { count: 1, last: true }
```
However, this timeout is not defined by request/response protocol timeout (which in our case increased to 30s already).

Further digging through the libp2p code, it turned out this timeout is actually affected by outbound_substreams timeout.
We tested with increase to 30s, do see much less put_record_to failure and QueryStats showing extended completion duration (larger than 10s, and in case of failed, to be 30s).

Hence raised this PR to provide a configurable approach of the timeout.

Pull-Request: libp2p#6015.
mergify bot pushed a commit that referenced this pull request Jun 26, 2025
#6015 made the timeout for outbound substreams in kademlia configurable by introducing a `outbound_substreams_timeout` config option.
There is currently an open PR #6009 that plans to also apply that timeout to inbound substream, in which case we probably want toe rename the config option to `substreams_timeout` (without the prefix).
Because we plan to release soon and might not get #6009 in in-time, I propose that we already do the renaming now, to avoid breaking the API again with the next release.

Pull-Request: #6076.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants