docs: Add Operations Guide (Prometheus Metrics & Troubleshooting) #2048

Open. fromsaurav wants to merge 1 commit into master.

fromsaurav

This PR introduces the Markdown files for a new "Operations Guide" under the docs/operations/ directory. This guide is intended to help users and operators effectively monitor, troubleshoot, and maintain OpenKruise installations.

Key Documentation Files Added:

  • docs/operations/README.md: An introductory page for the operations guide.
  • docs/operations/prometheus-metrics.md:
    • Details on enabling and accessing OpenKruise metrics.
    • Categorization of key metrics.
    • Guidance on setting up Prometheus ServiceMonitor and example alerting rules.
    • A compact, example Grafana dashboard JSON.
  • docs/operations/troubleshooting.md:
    • General troubleshooting methodology.
    • Specific troubleshooting steps for controllers, webhooks, and various Kruise workloads.
    • Common error messages and diagnostic commands.

Purpose:

These files provide the core content for an operational guide. The navigation for these documents on the openkruise.io website will be added via a separate Pull Request to the openkruise/openkruise.io repository.

Related Issue (Clarification):

This PR provides operational documentation only. It does not address or fix feature request #1672 ("Support for configmaps grayscale publishing"), which is a feature implementation task. A companion PR in the openkruise.io repository will be linked to this PR.

How to Review:

  • Check the content of the new Markdown files in docs/operations/ for clarity, accuracy, and completeness.
  • Verify the correctness of commands and YAML examples within the documentation.

Looking forward to feedback!

@kruise-bot kruise-bot requested review from Fei-Guo and zmberg May 19, 2025 19:41
@kruise-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign fei-guo for approval by writing /assign @fei-guo in a comment. For more information, see: The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kruise-bot kruise-bot added the size/XL size/XL: 500-999 label May 19, 2025
This commit introduces the markdown files for a new 'Operations Guide'
section. It includes:
- An overview README for the Operations Guide.
- A detailed guide on Prometheus metrics exposed by OpenKruise,
  including setup, alerting, and Grafana examples.
- A comprehensive troubleshooting guide for common OpenKruise issues,
  covering controllers, webhooks, and various workloads.

This content will be linked from the openkruise.io website sidebar
via a separate PR to the openkruise/openkruise.io repository.

Signed-off-by: Saurav Teli <[email protected]>
@fromsaurav fromsaurav force-pushed the feat/add-operations-guide-docs branch from 04b95ad to 0998e84 Compare May 19, 2025 19:44

codecov bot commented May 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 43.76%. Comparing base (ff8dcec) to head (0998e84).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2048      +/-   ##
==========================================
- Coverage   43.78%   43.76%   -0.02%     
==========================================
  Files         316      316              
  Lines       31617    31617              
==========================================
- Hits        13842    13838       -4     
  Misses      16378    16378              
- Partials     1397     1401       +4     
Flag Coverage Δ
unittests 43.76% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@fromsaurav
Author

fromsaurav commented May 19, 2025

Hey @zmberg @Fei-Guo, in the `openkruise.io` repository I added "Operations Guide" to sidebars.js and docs/. The category isn't showing in the local sidebar preview despite a successful build and cache clearing. I'd appreciate a review of the sidebars.js integration so that I can link that upcoming PR with this one.

@furykerry
Member

the operations guide should be put in the document repository

@fromsaurav
Author

> the operations guide should be put in the document repository

Okay, I'll update this! Thanks

Labels
size/XL size/XL: 500-999
3 participants