Reduce impact of backendRequests on latency #2530

Merged
joe-elliott merged 7 commits into grafana:main from joe-elliott:jackie-channels
Jun 2, 2023

Conversation

Collaborator

@joe-elliott joe-elliott commented Jun 1, 2023

What this PR does:
Uses a channel to return jobs instead of returning them as a slice from backendRequests. This nicely improves performance for queries that create a huge number of jobs.
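As a rough illustration of the difference (not the code in this PR), the sketch below contrasts building every job up front with streaming jobs over a channel. The names `job`, `buildJobsSlice`, `streamJobs`, and `firstN` are hypothetical; the point is only that a consumer can start on the first job immediately and can stop the producer early through the context.

```go
package main

import (
	"context"
	"fmt"
)

// job is a hypothetical stand-in for a backend sub-request.
type job struct{ id int }

// buildJobsSlice creates every job before the caller sees any of them.
func buildJobsSlice(n int) []job {
	jobs := make([]job, 0, n)
	for i := 0; i < n; i++ {
		jobs = append(jobs, job{id: i})
	}
	return jobs
}

// streamJobs sends jobs as they are created and stops when ctx is cancelled.
func streamJobs(ctx context.Context, n int) <-chan job {
	ch := make(chan job, 1)
	go func() {
		defer close(ch)
		for i := 0; i < n; i++ {
			select {
			case ch <- job{id: i}:
			case <-ctx.Done():
				return // consumer no longer wants jobs
			}
		}
	}()
	return ch
}

// firstN consumes jobs until it has limit of them, then cancels the producer.
func firstN(n, limit int) []int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	var ids []int
	for j := range streamJobs(ctx, n) {
		ids = append(ids, j.id)
		if len(ids) >= limit {
			break
		}
	}
	return ids
}

func main() {
	fmt.Println(len(buildJobsSlice(5)))
	fmt.Println(firstN(1_000_000, 3))
}
```

In the slice version the cost of creating all jobs is paid before any work is dispatched; in the channel version only the jobs actually consumed need to be created.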

Other Changes

  • Fixes a bug in searchProgress.internalShouldQuit() where we needed 1 more than the limit to quit
  • Switches totalBlockBytes to be a uint64 throughout
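For readers unfamiliar with this kind of bug, here is a minimal sketch of an off-by-one in a quit check. It assumes the problem was a strict comparison against the limit; the `progress` type and its methods are hypothetical and are not the actual searchProgress code.

```go
package main

import "fmt"

// progress is a hypothetical stand-in for a search progress tracker.
type progress struct {
	limit   int // maximum number of results requested
	results int // results collected so far
}

// shouldQuitOffByOne only quits once results exceed the limit,
// so it needs limit+1 results before it returns true.
func (p *progress) shouldQuitOffByOne() bool {
	return p.results > p.limit
}

// shouldQuit quits as soon as the limit is reached.
func (p *progress) shouldQuit() bool {
	return p.results >= p.limit
}

func main() {
	p := &progress{limit: 20, results: 20}
	fmt.Println(p.shouldQuitOffByOne(), p.shouldQuit())
}
```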

Benchmarks:

name                           old time/op    new time/op    delta
SearchSharderRoundTrip5-8         181ms ± 1%       1ms ± 9%  -99.54%  (p=0.008 n=5+5)
SearchSharderRoundTrip500-8       184ms ± 1%      12ms ± 1%  -93.27%  (p=0.008 n=5+5)
SearchSharderRoundTrip50000-8     318ms ± 4%     472ms ± 2%  +48.65%  (p=0.008 n=5+5)

name                           old alloc/op   new alloc/op   delta
SearchSharderRoundTrip5-8         118MB ± 0%       0MB ± 0%  -99.62%  (p=0.008 n=5+5)
SearchSharderRoundTrip500-8       120MB ± 0%       5MB ± 0%  -95.99%  (p=0.008 n=5+5)
SearchSharderRoundTrip50000-8     176MB ± 0%     176MB ± 0%   -0.10%  (p=0.008 n=5+5)

name                           old allocs/op  new allocs/op  delta
SearchSharderRoundTrip5-8         1.15M ± 0%     0.00M ± 2%  -99.88%  (p=0.008 n=5+5)
SearchSharderRoundTrip500-8       1.17M ± 0%     0.06M ± 0%  -95.12%  (p=0.008 n=5+5)
SearchSharderRoundTrip50000-8     2.24M ± 0%     2.26M ± 0%   +0.89%  (p=0.008 n=5+5)

Impact on exhaustive search with 100k jobs:
[image]

Which issue(s) this PR fixes:
Fixes #2469

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Joe Elliott <number101010@gmail.com>
reqs = append(reqs, subR)

select {
case reqCh <- &backendReqMsg{req: subR}:
Contributor

SearchSharderRoundTrip50000-8     318ms ± 4%     472ms ± 2%  +48.65% 

One thought on this is we could reduce the channel overhead by sending batched requests instead of individually. Looking at the code the easiest split is probably all jobs for a block in one channel send here.
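A minimal sketch of what per-block batching could look like, assuming each block yields a fixed number of jobs. The `job` type and the `streamBatches` and `countSends` helpers are hypothetical and only illustrate the idea of one channel send per block rather than one per job.

```go
package main

import "fmt"

// job is a hypothetical stand-in for a backend sub-request.
type job struct{ block, page int }

// streamBatches sends one slice of jobs per block, so the number of
// channel sends equals the number of blocks rather than the number of jobs.
func streamBatches(blocks, jobsPerBlock int) <-chan []job {
	ch := make(chan []job)
	go func() {
		defer close(ch)
		for b := 0; b < blocks; b++ {
			batch := make([]job, 0, jobsPerBlock)
			for p := 0; p < jobsPerBlock; p++ {
				batch = append(batch, job{block: b, page: p})
			}
			ch <- batch
		}
	}()
	return ch
}

// countSends drains the channel and reports how many sends and jobs it saw.
func countSends(blocks, jobsPerBlock int) (sends, jobs int) {
	for batch := range streamBatches(blocks, jobsPerBlock) {
		sends++
		jobs += len(batch)
	}
	return sends, jobs
}

func main() {
	sends, jobs := countSends(4, 250)
	fmt.Println(sends, jobs)
}
```

The trade-off is that each batch needs its own slice allocation, and the first job is only available once its whole batch has been built.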

Collaborator Author

@joe-elliott joe-elliott Jun 1, 2023

A naive attempt was worse. old = this PR, new = this PR with batching as suggested:

name                           old time/op    new time/op    delta
SearchSharderRoundTrip5-8         659µs ± 1%    1108µs ±83%   +68.15%  (p=0.016 n=4+5)
SearchSharderRoundTrip500-8      12.2ms ± 4%    12.9ms ± 6%      ~     (p=0.056 n=5+5)
SearchSharderRoundTrip50000-8     474ms ±11%     540ms ± 8%   +13.91%  (p=0.032 n=5+5)

name                           old alloc/op   new alloc/op   delta
SearchSharderRoundTrip5-8         451kB ± 0%     588kB ±54%   +30.19%  (p=0.008 n=5+5)
SearchSharderRoundTrip500-8      4.80MB ± 0%    4.96MB ± 0%    +3.43%  (p=0.008 n=5+5)
SearchSharderRoundTrip50000-8     176MB ± 0%     181MB ± 0%    +3.03%  (p=0.016 n=5+4)

name                           old allocs/op  new allocs/op  delta
SearchSharderRoundTrip5-8         1.33k ± 2%    2.98k ±133%  +123.52%  (p=0.008 n=5+5)
SearchSharderRoundTrip500-8       57.2k ± 0%     58.0k ± 0%    +1.24%  (p=0.008 n=5+5)
SearchSharderRoundTrip50000-8     2.26M ± 0%     2.28M ± 0%    +0.88%  (p=0.008 n=5+5)

I think the additional memory management offsets any savings from fewer channel sends. Personally, I'm not concerned about that +48% on SearchSharderRoundTrip50000. Even in that case the overall performance is going to be significantly better b/c we're getting jobs to queriers faster.

The first benchmark, SearchSharderRoundTrip5, is the most interesting b/c it roughly represents "time to first job", which is the real improvement here.

Contributor

@zalegrala zalegrala left a comment

This looks good to me. Nice changes. I think @mdisibio has an interesting idea.

Development

Successfully merging this pull request may close these issues.

[Search Perf] Improve throughput of backendRequests

3 participants