WIP: Blocks partial results by mapno · Pull Request #988 · grafana/tempo

mapno · 2021-09-28T14:51:09Z

What this PR does:

It supports partial results when block queries fail.

The general idea is to let queriers fail when querying blocks, and to let the query frontend decide whether there are too many errors or whether the partial result is worth returning.

Partial results are indicated with HTTP 206 (Partial Content).
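
A minimal sketch of the decision described above, not the code from this PR. The names `shardResult`, `combineStatus` and `maxBlockErrs` are hypothetical and only illustrate how a frontend could turn per-querier block error counts into a 200, a 206, or an error:

```go
package main

import (
	"fmt"
	"net/http"
)

// shardResult is a hypothetical stand-in for what one querier returns.
type shardResult struct {
	statusCode    int // http.StatusOK or http.StatusPartialContent
	blockErrCount int // number of blocks that failed on this querier
}

// combineStatus sums the block errors reported by partial responses and
// returns the status the frontend would send. maxBlockErrs is an assumed
// tolerance, not a real Tempo setting.
func combineStatus(results []shardResult, maxBlockErrs int) (int, error) {
	total := 0
	for _, r := range results {
		if r.statusCode == http.StatusPartialContent {
			total += r.blockErrCount
		}
	}
	if total > maxBlockErrs {
		return 0, fmt.Errorf("too many block queries failed (max %d)", maxBlockErrs)
	}
	if total > 0 {
		return http.StatusPartialContent, nil
	}
	return http.StatusOK, nil
}

func main() {
	results := []shardResult{
		{statusCode: http.StatusOK},
		{statusCode: http.StatusPartialContent, blockErrCount: 2},
	}
	code, err := combineStatus(results, 5)
	fmt.Println(code, err)
}
```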

Which issue(s) this PR fixes:
Fixes #899

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]


yvrhdn

Nice work already! Implementing partial results seems pretty tricky, and it touches a lot of code 😅

A general concern: passing the number of blocks that failed seems similar in intention to the SearchMetrics. That is, we want to capture some statistics about what happened in the ingester or querier and pass them all the way up.
Maybe instead of a uint32 blockErrCount we could work with a struct that is specific to custom metrics about the query. In the future we could also add the number of blocks we scanned, the GBs read, or whatever else is useful.
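
To make the suggestion concrete, here is a rough sketch of such a struct. `QueryStats` and its fields are made-up names for illustration; they are not an existing Tempo type:

```go
package main

import "fmt"

// QueryStats is a hypothetical container for per-query statistics.
// Field names are illustrative only.
type QueryStats struct {
	FailedBlocks    uint32 // blocks that returned an error
	InspectedBlocks uint32 // blocks that were scanned
	InspectedBytes  uint64 // bytes read while scanning
}

// Add merges the stats reported by another querier into s.
func (s *QueryStats) Add(o QueryStats) {
	s.FailedBlocks += o.FailedBlocks
	s.InspectedBlocks += o.InspectedBlocks
	s.InspectedBytes += o.InspectedBytes
}

func main() {
	var total QueryStats
	for _, s := range []QueryStats{
		{FailedBlocks: 1, InspectedBlocks: 10, InspectedBytes: 1 << 20},
		{FailedBlocks: 0, InspectedBlocks: 7, InspectedBytes: 1 << 19},
	} {
		total.Add(s)
	}
	fmt.Printf("%+v\n", total)
}
```

New counters can then be added as fields without changing any function signature along the path.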

yvrhdn · 2021-09-30T14:24:50Z

+	var shardMissCount, totalBlockErrCount int
 	for _, rr := range rrs {
-		if rr.Response.StatusCode == http.StatusOK {
+		partialContent := rr.Response.StatusCode == http.StatusPartialContent


Nit: renaming this to isPartialContent or gotPartialContent might make it more obvious this is a boolean. For a while I thought this was a variable holding a part of the content.

yvrhdn · 2021-09-30T15:46:22Z

-	var combinedTrace []byte
-	var shardMissCount = 0
+	var combinedTrace *tempopb.Trace
+	var combinedTraceBytes []byte


Nit: we don't use combinedTraceBytes in this for-loop yet, so we can move the declaration further down in this function. This makes the code a bit easier to read, as we don't have to worry about this variable yet.

yvrhdn · 2021-09-30T15:55:12Z


 	data, enc, err := job.fn(job.ctx, job.payload)
-	if data != nil {
+	if data != nil || err != nil {


I'm wondering if we still need this if-case? Will there ever be a job that returns no data and no error?

yvrhdn · 2021-09-30T16:03:18Z

 	if req.QueryMode == QueryModeBlocks || req.QueryMode == QueryModeAll {
 		span.LogFields(ot_log.String("msg", "searching store"))
-		partialTraces, dataEncodings, err := q.store.Find(opentracing.ContextWithSpan(ctx, span), userID, req.TraceID, req.BlockStart, req.BlockEnd)
+		partialTraces, dataEncodings, blockErrs, err := q.store.Find(opentracing.ContextWithSpan(ctx, span), userID, req.TraceID, req.BlockStart, req.BlockEnd)


We don't use blockErrs in a meaningful way (we only use the count, not the errors themselves). Maybe we should only return an int instead of []error?

yvrhdn · 2021-09-30T16:03:45Z

+		// err contains unrecoverable errors
+		// errs querying blocks are contained in blockErrs


Can you add these comments to Reader.Find itself?
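
Roughly what this and the previous comment could look like together. The `Finder` interface below is a simplified, hypothetical version of the reader (the real `Find` takes more parameters and returns data encodings as well); it only shows the doc comment and a count instead of `[]error`:

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Finder is a simplified, hypothetical version of the reader interface.
type Finder interface {
	// Find returns the partial traces found for the given trace ID.
	//
	// err contains unrecoverable errors: the result must be discarded.
	// failedBlocks counts blocks that could not be queried: the traces
	// are still usable but may be incomplete.
	Find(ctx context.Context, tenant, id string) (traces [][]byte, failedBlocks int, err error)
}

// fakeFinder is a test double that fails one block for every valid request.
type fakeFinder struct{}

func (fakeFinder) Find(_ context.Context, tenant, _ string) ([][]byte, int, error) {
	if tenant == "" {
		return nil, 0, errors.New("tenant is required")
	}
	return [][]byte{[]byte("partial trace")}, 1, nil
}

// describe shows how a caller can tell the three outcomes apart.
func describe(f Finder, tenant, id string) string {
	traces, failed, err := f.Find(context.Background(), tenant, id)
	switch {
	case err != nil:
		return "failed: " + err.Error()
	case failed > 0:
		return fmt.Sprintf("partial: %d traces, %d blocks failed", len(traces), failed)
	default:
		return fmt.Sprintf("complete: %d traces", len(traces))
	}
}

func main() {
	fmt.Println(describe(fakeFinder{}, "tenant-a", "abc123"))
	fmt.Println(describe(fakeFinder{}, "", "abc123"))
}
```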

yvrhdn · 2021-09-30T16:11:14Z

-			StatusCode: http.StatusOK,
-			Body:       ioutil.NopCloser(bytes.NewReader(combinedTrace)),
+			StatusCode: statusCode,
+			Body:       ioutil.NopCloser(bytes.NewReader(combinedTraceBytes)),


Suggested change

-			Body:       ioutil.NopCloser(bytes.NewReader(combinedTraceBytes)),
+			Body:       io.NopCloser(bytes.NewReader(combinedTraceBytes)),

ioutil.NopCloser has been moved to io.NopCloser as of Go 1.16, see https://golang.org/doc/go1.16#ioutil
We just removed all occurrences of ioutil in #998 🙂
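
For reference, a small standalone example of building an in-memory response with `io.NopCloser` (Go 1.16 or later). `newResponse` is a hypothetical helper, not a function from this PR:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

// newResponse wraps a byte slice in an *http.Response. io.NopCloser adds a
// no-op Close method so the bytes.Reader satisfies io.ReadCloser.
func newResponse(statusCode int, body []byte) *http.Response {
	return &http.Response{
		StatusCode: statusCode,
		Body:       io.NopCloser(bytes.NewReader(body)),
	}
}

func main() {
	resp := newResponse(http.StatusPartialContent, []byte("trace bytes"))
	defer resp.Body.Close()

	b, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println(resp.StatusCode, string(b))
}
```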

yvrhdn · 2021-09-30T16:17:42Z

 	span.SetTag("response marshalling format", util.JSONTypeHeaderValue)
 	marshaller := &jsonpb.Marshaler{}
-	err = marshaller.Marshal(w, resp.Trace)
+	err = marshaller.Marshal(w, resp)


There is a difference between our responses depending on whether the caller accepts protobuf or not: if the caller accepts protobuf we return HTTP 206 and resp.Trace (line 82).

But if the caller requests something else (i.e. JSON), we marshal the entire TraceByIDResponse, which would look something like:

{ "trace": ..., "blockErrCount": ... }

Why not also return HTTP 206? This change breaks our API I think.
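
A sketch of what I mean, using `encoding/json` and made-up types (`trace`, `traceByIDResponse`, `encodeJSON`) instead of the real jsonpb marshaller and tempopb messages: the body stays just the trace, and the status code alone signals that the result is partial.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// trace and traceByIDResponse are hypothetical stand-ins for the
// tempopb messages.
type trace struct {
	Batches []string `json:"batches"`
}

type traceByIDResponse struct {
	Trace         *trace
	BlockErrCount int
}

// encodeJSON keeps the existing body shape (only the trace) and reports
// partial results through the status code.
func encodeJSON(resp traceByIDResponse) (int, []byte, error) {
	status := http.StatusOK
	if resp.BlockErrCount > 0 {
		status = http.StatusPartialContent
	}
	body, err := json.Marshal(resp.Trace)
	return status, body, err
}

func main() {
	status, body, err := encodeJSON(traceByIDResponse{
		Trace:         &trace{Batches: []string{"a"}},
		BlockErrCount: 2,
	})
	fmt.Println(status, string(body), err)
}
```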

yvrhdn · 2021-09-30T16:25:40Z

+			var resp tempopb.TraceByIDResponse
+			err = proto.Unmarshal(body, &resp)


The querier only seems to marshal the trace part of TraceByIDResponse (modules/querier/http.go:82):

b, err := proto.Marshal(resp.Trace)

How can we unmarshal a full TraceByIDResponse here?

yvrhdn · 2021-09-30T16:27:41Z

 			body, err := io.ReadAll(rr.Response.Body)
 			rr.Response.Body.Close()
 			if err != nil {
 				return nil, errors.Wrap(err, "error reading response body at query frontend")
 			}


I'm wondering if we should fail the entire request here. If we can't read the body from one of the requests, isn't this also a partial result?
Same for unmarshaling the body.
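
Something along these lines, as a sketch only. `readBodies`, `failingReader` and `sampleBodies` are hypothetical names; the point is that a failed read is counted instead of aborting the whole request:

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"strings"
)

// failingReader simulates a response body that cannot be read.
type failingReader struct{}

func (failingReader) Read([]byte) (int, error) {
	return 0, errors.New("connection reset")
}

// readBodies reads every body and counts the ones that fail, so the caller
// can treat the failures as part of a partial result.
func readBodies(bodies []io.Reader) (ok [][]byte, failed int) {
	for _, b := range bodies {
		data, err := io.ReadAll(b)
		if err != nil {
			failed++
			continue
		}
		ok = append(ok, data)
	}
	return ok, failed
}

// sampleBodies returns two readable bodies and one failing body.
func sampleBodies() []io.Reader {
	return []io.Reader{
		strings.NewReader("shard-0"),
		failingReader{},
		strings.NewReader("shard-2"),
	}
}

func main() {
	ok, failed := readBodies(sampleBodies())
	fmt.Printf("read %d bodies, %d failed\n", len(ok), failed)
}
```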

yvrhdn · 2021-09-30T16:38:19Z

+				if totalBlockErrCount > maxBlockErrCount {
+					return nil, fmt.Errorf("too many block queries failed (max %d)", maxBlockErrCount)
 				}


I'm wondering if it's useful to fail on maxBlockErrCount. We already did the work: all the queriers returned something. Instead of throwing away the results we can still return whatever we got.

mapno · 2021-10-05T11:56:20Z

#1002 created too many conflicts and a lot of changes had to be made anyway, so I moved the PR to #1007.

mapno added 6 commits September 28, 2021 09:42

Allow for partial results in tempodb pool

cd0f870

Return slice of block errors

Add block err count in proto response

11a7a58

Return blocke errs from Store

2571277

Handle partial results in querier

dd7d40e

Receive new message in query frontend

74a198d

Support partial results in query frontend

005981f

mapno force-pushed the ingester-partial-results branch from 2db45f0 to d8a17a4 Compare September 28, 2021 15:22

Fix data race in pool_test.go

1729d48

mapno force-pushed the ingester-partial-results branch from d8a17a4 to 1729d48 Compare September 28, 2021 15:31

mapno added 2 commits September 28, 2021 18:13

Signal upstream result contains partial results

985022f

tidy-up

1bb3dd8

mapno marked this pull request as ready for review September 30, 2021 13:54

mapno requested review from annanay25, dgzlopes, joe-elliott, mdisibio and yvrhdn as code owners September 30, 2021 13:54

mapno changed the title ~~Ingester partial results~~ WIP: Ingester partial results Sep 30, 2021

mapno changed the title ~~WIP: Ingester partial results~~ WIP: Blocks partial results Sep 30, 2021

yvrhdn reviewed Sep 30, 2021


mapno mentioned this pull request Oct 5, 2021

Block partial results #1007

Merged

3 tasks

mapno closed this Oct 5, 2021

mapno deleted the ingester-partial-results branch October 5, 2021 11:56

mapno wants to merge 9 commits into grafana:main from mapno:ingester-partial-results
