tempo-query: add ReadinessProbe #1061
a0bca33 to 29ddb12
@@ -0,0 +1,16 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement
Is this an enhancement or a bug_fix?
From my point of view it's a pure enhancement. Everything works as before; we now additionally support the Kubernetes readiness check.
component: tempostack

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add ReadinessProbe to tempo-query
Why is this being added? What are the consequences of this patch?
Instead of reporting the pod as ready and handling the failure internally, the k8s API will show that it's not ready. K8s will regularly call the gRPC endpoint to verify that it's alive.
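To illustrate the behaviour described above, here is a minimal sketch of how repeated probe results could translate into a ready/not-ready state. It is not the kubelet's actual code: `probeState`, `newProbeState` and `observe` are hypothetical names, and the threshold values (1 success, 3 failures) are assumed to match the Kubernetes defaults.

```go
package main

import "fmt"

// probeState is a hypothetical model of a readiness prober: it counts
// consecutive results and flips readiness once a threshold is reached.
type probeState struct {
	ready            bool
	successes        int
	failures         int
	successThreshold int
	failureThreshold int
}

// newProbeState starts in the "not ready" state, as a readiness probe does
// before its first successful check.
func newProbeState(successThreshold, failureThreshold int) *probeState {
	return &probeState{
		successThreshold: successThreshold,
		failureThreshold: failureThreshold,
	}
}

// observe records one probe result and returns the resulting readiness.
func (p *probeState) observe(ok bool) bool {
	if ok {
		p.successes++
		p.failures = 0
		if p.successes >= p.successThreshold {
			p.ready = true
		}
	} else {
		p.failures++
		p.successes = 0
		if p.failures >= p.failureThreshold {
			p.ready = false
		}
	}
	return p.ready
}

func main() {
	// Assumed thresholds: ready after 1 success, not ready after 3 failures.
	p := newProbeState(1, 3)
	results := []bool{false, false, true, false, false, false}
	for i, ok := range results {
		fmt.Printf("probe %d ok=%v ready=%v\n", i+1, ok, p.observe(ok))
	}
}
```

In this model the pod only receives traffic once a probe has succeeded, which is the property the PR relies on at startup.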
Yes, and we should document #1061 (comment) in the changelog: the impact of adding a readiness probe to our deployment.
I would also say that this is more of a bug_fix than an enhancement.
e.g. improves reliability at startup, avoiding lost data
cc @pavolloffay
Yes, but does it fix any particular issues in our deployments?
Yes, during my multi-tenancy debug session I noticed that we sometimes lost some measurements. The test checked for 10 reports but we only got 7-8. Adding a delay solved it, but having the readiness probe makes it more resilient. While typing this, I think it would actually be a good idea to add this kind of check to Jaeger-Query too. There is an extra health endpoint. Server response: {"status":"Server available","upSince":"2024-10-15T08:44:13.514559835Z","uptime":"12.13965331s"}
29ddb12 to 5d6ecd6
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #1061 +/- ##
==========================================
+ Coverage 69.14% 69.18% +0.04%
==========================================
Files 110 110
Lines 7059 7069 +10
==========================================
+ Hits 4881 4891 +10
Misses 1888 1888
Partials 290 290
# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: Without a readiness check in place, there is a risk that data will be lost when the queryfrontend pod is ready but the tempo query API is not yet available.
How can data be lost in the query frontend? It does not ingest any data; it's used only for querying.
The queryfrontend pod with Jaeger enabled contains 3 containers; tempo-query is one of them.
I understand that. The point I am making is that "data will be lost when the queryfrontend pod" implies to me that this fixes data ingestion, which is not the case.
Signed-off-by: Benedikt Bongartz <bongartz@klimlive.de>
5d6ecd6 to fc77c2b
Closes #1058
Requires: