
Frontend batching#2677

Merged
joe-elliott merged 22 commits into grafana:main from joe-elliott:frontend-batching on Jul 20, 2023

Conversation

Collaborator

@joe-elliott joe-elliott commented Jul 19, 2023

What this PR does:
Batches jobs in the requests from the query-frontend queue to the queriers. Previously, the frontend sent each job one at a time in an individual HTTP request. This PR adds a configurable parameter that allows the frontend to send more than one job per request.

Other changes:

  • Docs of course! Including an update to the search performance tuning doc with some more current information.
  • Adds a new histogram metric, tempo_query_frontend_actual_batch_size, to track the actual size of the batches farmed out to the queriers.
  • Better testing of the queues and frontend worker.
  • Added the ability for the querier to signal to the frontend the features it supports for seamless rollouts.

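For reference, the new setting would be enabled with something like the following query-frontend configuration. The key name `max_batch_size` and the value shown here are assumptions for illustration; check the docs added in this PR for the exact option name and default:

```yaml
query_frontend:
  # Hypothetical key name: the maximum number of jobs the frontend packs
  # into a single request to a querier. A value of 1 preserves the old
  # one-job-per-request behavior.
  max_batch_size: 5
```
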
Performance testing
The goal of this setup was to create a cluster that could execute the 36k jobs created by the test query simultaneously. This way, job throughput from frontend -> querier could be tested more directly.

  • 80 queriers
  • 500 jobs per querier
  • Total cluster capacity 40k jobs
  • No reliance on serverless

Results

| batch size | overall query latency | p99 job time in queue |
|------------|-----------------------|-----------------------|
| 1          | 8.5s                  | 4.9s                  |
| 2          | 7.6s                  | 2.4s                  |
| 5          | 6.7s                  | 1s                    |
| 10         | 9s                    | 4.4s                  |
| 1*         | 9.6s                  | 9s                    |

*current image

Overall latency for queries whose total jobs exceed total cluster capacity was not reduced as impressively, but this is a good step in the right direction.
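The batching idea on the frontend side can be sketched as a non-blocking drain of the job queue: take what is available, up to the batch size, and send it in one request. This is a simplified illustration with hypothetical names (`batchJobs` is not Tempo's actual request_batch.go implementation):

```go
package main

import "fmt"

// batchJobs drains up to maxBatch jobs from the queue channel without
// blocking: it keeps appending until the channel is momentarily empty
// or the batch is full, then returns whatever it gathered.
func batchJobs(queue <-chan string, maxBatch int) []string {
	batch := make([]string, 0, maxBatch)
	for len(batch) < maxBatch {
		select {
		case job, ok := <-queue:
			if !ok {
				return batch // queue closed
			}
			batch = append(batch, job)
		default:
			return batch // queue empty, send what we have
		}
	}
	return batch
}

func main() {
	queue := make(chan string, 10)
	for i := 0; i < 7; i++ {
		queue <- fmt.Sprintf("job-%d", i)
	}
	// With a max batch size of 5, the first dequeue yields 5 jobs
	// and the second yields the remaining 2.
	fmt.Println(len(batchJobs(queue, 5)), len(batchJobs(queue, 5)))
}
```

With a batch size of 1 this degenerates to the old one-request-per-job behavior, which is why the feature can roll out seamlessly.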

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

Signed-off-by: Joe Elliott <number101010@gmail.com>
Contributor

@mdisibio mdisibio left a comment

This looks great, and I like the way it is controlled by querier features. A few small questions, but none blocking, so I will go ahead and approve.

Comment thread modules/frontend/queue/queue.go Outdated
Comment thread modules/frontend/v1/request_batch.go
Comment thread modules/querier/worker/frontend_processor.go Outdated
Comment thread modules/frontend/v1/frontend.go
Comment thread modules/frontend/v1/frontend.go Outdated
Co-authored-by: Martin Disibio <mdisibio@gmail.com>
Contributor

@zalegrala zalegrala left a comment

This looks pretty good to me; nice improvement. It will be interesting to see the results on the dashboard. I had a question about the context handling, but it is not blocking.

```go
// then error out this upstream request _and_ stream.
case err := <-errs:
	req.err <- err
	err = reportResponseUpstream(reqBatch, errs, resps)
```
Contributor


Do we have a context to pass? Wondering if it might simplify the context handling below.

Collaborator Author


If the streaming gRPC server connection itself drops or the context is cancelled, then .Send() returns an error and this case is hit:
https://github.com/grafana/tempo/pull/2677/files#diff-0914703aed52090bd72851004df203444207d9d48677c10860b0459afef1a0b9R311

If the request is cancelled upstream then this case is hit:
https://github.com/grafana/tempo/pull/2677/files#diff-0914703aed52090bd72851004df203444207d9d48677c10860b0459afef1a0b9R304

If the requests are cancelled downstream then we get an http response and this case is hit:
https://github.com/grafana/tempo/pull/2677/files#diff-0914703aed52090bd72851004df203444207d9d48677c10860b0459afef1a0b9R304

I think everything is covered.

Signed-off-by: Joe Elliott <number101010@gmail.com>