Skip to content

tempo-query: add ReadinessProbe#1061

Merged
frzifus merged 1 commit intografana:mainfrom
frzifus:tempo-query/ReadinessProbe
Oct 16, 2024
Merged

tempo-query: add ReadinessProbe#1061
frzifus merged 1 commit intografana:mainfrom
frzifus:tempo-query/ReadinessProbe

Conversation

@frzifus
Copy link
Copy Markdown
Collaborator

@frzifus frzifus commented Oct 15, 2024

@frzifus frzifus force-pushed the tempo-query/ReadinessProbe branch from a0bca33 to 29ddb12 Compare October 15, 2024 07:39
@@ -0,0 +1,16 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this enhancement or bug_fix?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my point of view its a pure enhancement. Everything works as before. We only support the kubernetes readiness check.

component: tempostack

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add ReadinessProbe to tempo-query
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this being added? What are the consequences of this patch?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of showing that the pod is ready and handling the failure internally, the k8s api will show that its not ready. K8s will do regular calls to the gRPC endpoint to verify that its alive.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and we should document #1061 (comment) in the changelog. Impact of adding readiness probe to our deployment.

I would also say that this is more a bug_fix than enhancement.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. improves reliability at startup, avoiding lost data

@frzifus
Copy link
Copy Markdown
Collaborator Author

frzifus commented Oct 15, 2024

The kubelet uses readiness probes to know when a container is ready to start accepting traffic. A Pod is considered ready when all of its containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.

cc @pavolloffay

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

@frzifus frzifus requested a review from pavolloffay October 15, 2024 08:41
@pavolloffay
Copy link
Copy Markdown
Collaborator

cc @pavolloffay

https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/

Yes, but does it fixes any particular issues in our deployments?

@frzifus
Copy link
Copy Markdown
Collaborator Author

frzifus commented Oct 15, 2024

y, during my multi tenancy debug session I noticed that we sometimes lost some measurements. The test did check for 10 reports but we only had 7-8. Adding a delay solved it. But having the readiness probe made it more resilient.

While typing, I think it would actually be a good Idea to add this kind of check to Jaeger-Query too. There is a extra health endpoint on localhost:14269/ for it.

Server Response

{"status":"Server available","upSince":"2024-10-15T08:44:13.514559835Z","uptime":"12.13965331s"}

@frzifus frzifus force-pushed the tempo-query/ReadinessProbe branch from 29ddb12 to 5d6ecd6 Compare October 15, 2024 14:58
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Oct 15, 2024

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 69.18%. Comparing base (21e91ab) to head (fc77c2b).
⚠️ Report is 213 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1061      +/-   ##
==========================================
+ Coverage   69.14%   69.18%   +0.04%     
==========================================
  Files         110      110              
  Lines        7059     7069      +10     
==========================================
+ Hits         4881     4891      +10     
  Misses       1888     1888              
  Partials      290      290              
Flag Coverage Δ
unittests 69.18% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: Without a readiness check in place, there is a risk that data will be lost when the queryfrontend pod is ready but the tempo query API is not yet available.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can be data lost in query frontend? It does not ingest any data? It's used only for querying.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The queryfrontend with enabled jaeger contains 3 containers in its pod. tempo-query is part of it.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that, the point I am making is that the

data will be lost when the queryfrontend pod

implies to me that this fixes data ingestion, which is not the case

Signed-off-by: Benedikt Bongartz <bongartz@klimlive.de>
@frzifus frzifus force-pushed the tempo-query/ReadinessProbe branch from 5d6ecd6 to fc77c2b Compare October 16, 2024 11:23
@frzifus frzifus enabled auto-merge (squash) October 16, 2024 11:24
@frzifus frzifus merged commit 7dcf94c into grafana:main Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add ReadinessProbe for tempo-query

4 participants