This repository contains a collection of Prometheus alert templates for various systems and services. Each directory contains specific alert rules and documentation for monitoring different components of your infrastructure.
- Kubernetes: Alerts for Kubernetes cluster monitoring
- Redis: Alerts for Redis monitoring
- PostgreSQL: Alerts for PostgreSQL database monitoring
- MySQL: Alerts for MySQL database monitoring
- MongoDB: Alerts for MongoDB monitoring
- Supabase: Alerts for Supabase platform monitoring
If you use kube-prometheus-stack, add the rules to your existing Helm values:
# values.yaml
additionalPrometheusRulesMap:
  custom-alerts.yaml:
    groups:
      - name: custom_alerts
        rules:
          # Copy rules from respective alert template files
          - alert: ExampleAlert
            expr: up == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Example alert"
              description: "Example description"
Apply the update:
helm upgrade prometheus-stack prometheus-community/kube-prometheus-stack \
-f values.yaml \
-n monitoring
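The `for: 5m` field in the example means the expression must keep matching for five minutes of consecutive evaluations before the alert fires; until then the alert is pending. The Python sketch below is a simplified model of that lifecycle. It assumes evaluations at a fixed `evaluation_interval` (15s) and ignores details such as `keep_firing_for` and resolved-alert retention, so it is not Prometheus's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Simplified model of one alert instance's pending/firing lifecycle."""
    for_seconds: float
    active_since: Optional[float] = None  # evaluation time when expr first matched

    def evaluate(self, now: float, expr_true: bool) -> str:
        """Return 'inactive', 'pending' or 'firing' after one rule evaluation."""
        if not expr_true:
            # The expression stopped matching: reset, so the 'for' timer restarts.
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


if __name__ == "__main__":
    alert = AlertState(for_seconds=300)  # for: 5m
    for t in range(0, 331, 15):          # evaluation_interval: 15s
        if alert.evaluate(t, expr_true=True) == "firing":
            print(f"firing at t={t}s")  # firing at t=300s
            break
```

Because a single non-matching evaluation resets the timer, a target that flaps more often than every five minutes would never reach firing under this rule.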
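Alternatively, create a standalone PrometheusRule resource. The Prometheus Operator only loads it if its labels match the ruleSelector of your Prometheus resource: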
# prometheus-rule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-alerts
  namespace: monitoring
  labels:
    release: prometheus # Match your Prometheus Operator release label
spec:
  groups:
    - name: custom_alerts
      rules:
        # Copy rules from respective alert template files
Apply the rule:
kubectl apply -f prometheus-rule.yaml
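A common reason an applied PrometheusRule is ignored is a label mismatch with the Prometheus resource's `ruleSelector`. The Python sketch below approximates Kubernetes `matchLabels` semantics as the Operator documents them for this field (a null selector matches nothing, an empty one matches everything); `matchExpressions` and `ruleNamespaceSelector` are not modeled. The selector value `prometheus-stack` is a hypothetical example: kube-prometheus-stack typically selects on `release: <Helm release name>` by default, so check the actual selector with `kubectl get prometheus -n monitoring -o yaml`.

```python
from typing import Mapping, Optional


def selector_matches(match_labels: Optional[Mapping[str, str]],
                     object_labels: Mapping[str, str]) -> bool:
    """Approximate matchLabels: every selector key/value must be on the object.

    None models a null selector (matches nothing); {} matches every object.
    """
    if match_labels is None:
        return False
    return all(object_labels.get(key) == value for key, value in match_labels.items())


if __name__ == "__main__":
    rule_labels = {"release": "prometheus"}     # labels from prometheus-rule.yaml
    selector = {"release": "prometheus-stack"}  # hypothetical ruleSelector.matchLabels
    print(selector_matches(selector, rule_labels))  # False: the rule would not be loaded
```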
# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - ./prometheus/alerts:/etc/prometheus/alerts # Mount alerts directory
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
    ports:
      - "9090:9090"
volumes:
  prometheus_data: # Named volumes must be declared at the top level
# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "alerts/*.yml" # Include all alert files
scrape_configs:
  # Your scrape configs here
# Create alerts directory
mkdir -p ./prometheus/alerts
# Copy alert rules
cp <integration>/*-alerts.yml ./prometheus/alerts/
# Restart Prometheus
docker-compose restart prometheus
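The `rule_files` entry is a file glob, so `alerts/*.yml` picks up `redis-alerts.yml` but not a file named `redis-alerts.yaml`. The Python sketch below lists which files in the mounted directory that pattern would select. It uses case-sensitive `fnmatch` on basenames as an approximation of Prometheus's glob matching, and the directory path is assumed from the docker-compose.yml above.

```python
import fnmatch
from pathlib import Path
from typing import Iterable, List


def files_matched(pattern: str, filenames: Iterable[str]) -> List[str]:
    """Return the basenames a single-directory glob such as '*.yml' would select."""
    return sorted(name for name in filenames if fnmatch.fnmatchcase(name, pattern))


if __name__ == "__main__":
    alerts_dir = Path("./prometheus/alerts")  # assumed host path from docker-compose.yml
    names = [p.name for p in alerts_dir.iterdir() if p.is_file()] if alerts_dir.is_dir() else []
    loaded = files_matched("*.yml", names)
    print("loaded:", loaded)
    print("not matched by alerts/*.yml:", sorted(set(names) - set(loaded)))
```

Renaming `.yaml` files to `.yml`, or adding a second `"alerts/*.yaml"` entry to `rule_files`, avoids the mismatch.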
# values.yaml
alertmanager:
  enabled: true
  configMapOverrideName: ""
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname', 'severity']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'default-receiver'
    receivers:
      - name: 'default-receiver'
        # Configure your notification channels
server:
  global:
    scrape_interval: 15s
  alerting:
    alertmanagers:
      - static_configs:
          - targets:
              - "prometheus-alertmanager.monitoring.svc.cluster.local:9093"
serverFiles:
  alerting_rules.yml: # Loaded by the chart's default rule_files
    groups:
      - name: custom_alerts
        rules:
          # Copy rules from respective alert template files
# First time installation
helm install prometheus prometheus-community/prometheus \
-f values.yaml \
-n monitoring \
--create-namespace
# Upgrading existing installation
helm upgrade prometheus prometheus-community/prometheus \
-f values.yaml \
-n monitoring
mkdir -p /etc/prometheus/rules
# Copy alert rules to rules directory
cp <integration>/*-alerts.yml /etc/prometheus/rules/
# /etc/prometheus/prometheus.yml (relative rule_files paths resolve against this file's directory)
global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "rules/*.yml"
scrape_configs:
  # Your scrape configs here
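# Reload the configuration (requires Prometheus to be started with --web.enable-lifecycle;
# otherwise send the Prometheus process a SIGHUP)
curl -X POST http://localhost:9090/-/reload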
- Navigate to the Grafana Cloud Portal
- Go to Prometheus → Rules
- Click "New Rule"
- Copy the alert rules from the respective YAML files
- Set appropriate labels and annotations
- Save and activate the rules
alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname', 'job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'default'
    receivers:
      - name: 'default'
        # Configure your notification channels
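How `group_wait`, `group_interval` and `repeat_interval` interact is easiest to see on a timeline. The Python sketch below is a simplified model for a single alert group whose alerts never change: the first notification is sent after `group_wait`, the group is flushed every `group_interval` after that, and an unchanged group is re-sent only once `repeat_interval` has elapsed since the last send. Real Alertmanager has more rules (for example, new alerts joining a group are sent at the next `group_interval` flush), and a repeat can land one `group_interval` later than this sketch shows, so treat the numbers as approximate.

```python
from typing import List


def notification_times(group_wait: int, group_interval: int,
                       repeat_interval: int, horizon: int) -> List[int]:
    """Seconds after the first alert at which an unchanged group is notified.

    Simplified model: flushes at group_wait, then every group_interval; a flush
    sends only if repeat_interval has fully elapsed since the previous send.
    """
    times: List[int] = []
    t = group_wait
    while t <= horizon:
        if not times or t - times[-1] >= repeat_interval:
            times.append(t)
        t += group_interval
    return times


if __name__ == "__main__":
    # group_wait: 30s, group_interval: 5m, repeat_interval: 12h, over one day
    print(notification_times(30, 300, 12 * 3600, 24 * 3600))  # [30, 43230]
```

With these values a still-firing alert produces roughly two notifications per day, which is why the routing example that follows shortens `repeat_interval` for critical alerts.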
alertmanager:
  config:
    global:
      resolve_timeout: 5m
      slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
    route:
      receiver: 'default'
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      routes:
        - receiver: 'critical'
          match:
            severity: critical
          group_wait: 10s
          repeat_interval: 1h
        - receiver: 'warnings'
          match:
            severity: warning
    receivers:
      - name: 'default'
        slack_configs:
          - channel: '#alerts'
            title: '{{ .GroupLabels.alertname }}'
            text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
      - name: 'critical'
        slack_configs:
          - channel: '#critical-alerts'
            title: 'CRITICAL: {{ .GroupLabels.alertname }}'
            text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
      - name: 'warnings'
        slack_configs:
          - channel: '#warnings'
            title: 'Warning: {{ .GroupLabels.alertname }}'
            text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
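Alertmanager checks an alert against the child `routes` in order and, because `continue` defaults to `false`, stops at the first match. Alerts that match no child fall back to the parent route, and children inherit settings they do not set (such as `group_wait` and `repeat_interval`) from the parent. The Python sketch below models that behaviour for the configuration above. It handles only equality `match` and one level of children (no nested routes, regex matchers or `continue: true`), so it illustrates the semantics rather than reproducing Alertmanager.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Route:
    receiver: Optional[str] = None          # None means "inherit from parent"
    match: Dict[str, str] = field(default_factory=dict)
    group_wait: Optional[str] = None
    repeat_interval: Optional[str] = None
    routes: List["Route"] = field(default_factory=list)


def route_alert(root: Route, labels: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return the effective receiver and timings for an alert (first match wins)."""
    for child in root.routes:
        if all(labels.get(key) == value for key, value in child.match.items()):
            return {
                "receiver": child.receiver or root.receiver,
                "group_wait": child.group_wait or root.group_wait,
                "repeat_interval": child.repeat_interval or root.repeat_interval,
            }
    # No child matched: the root route handles the alert.
    return {"receiver": root.receiver, "group_wait": root.group_wait,
            "repeat_interval": root.repeat_interval}


if __name__ == "__main__":
    root = Route(receiver="default", group_wait="30s", repeat_interval="12h", routes=[
        Route(receiver="critical", match={"severity": "critical"},
              group_wait="10s", repeat_interval="1h"),
        Route(receiver="warnings", match={"severity": "warning"}),
    ])
    print(route_alert(root, {"alertname": "ExampleAlert", "severity": "critical"}))
    # {'receiver': 'critical', 'group_wait': '10s', 'repeat_interval': '1h'}
    print(route_alert(root, {"alertname": "ExampleAlert", "severity": "info"})["receiver"])
    # default
```

Note that info-level alerts land in the default receiver here, since no child route matches `severity: info`.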
- Rule Organization
  - Keep rules organized by component
  - Use consistent naming conventions
  - Group related alerts together
- Alert Severity Levels
  - critical: Immediate action required
  - warning: Investigation needed
  - info: Informational only
- Labels and Annotations
  - Use consistent labels across alerts
  - Include detailed descriptions
  - Add runbook links when available
- Timing Parameters
  - Set an appropriate 'for' duration so short spikes do not trigger alerts
  - Configure group_wait and group_interval
  - Adjust repeat_interval based on urgency
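The severity and labeling conventions above are easy to check mechanically before rules are deployed. The Python sketch below lints rule groups that have already been parsed into dicts with the same structure as the `groups:` YAML (for example with a YAML parser such as PyYAML, which is not in the standard library and is not used here). The allowed severities come from the list above; requiring `summary` and `description` annotations is an assumed team convention, not a Prometheus requirement.

```python
from typing import Any, Dict, List

ALLOWED_SEVERITIES = {"critical", "warning", "info"}
REQUIRED_ANNOTATIONS = ("summary", "description")  # assumed convention


def lint_groups(groups: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems found in alerting rules."""
    problems: List[str] = []
    for group in groups:
        group_name = group.get("name", "?")
        for rule in group.get("rules") or []:
            alert = rule.get("alert")
            if alert is None:
                continue  # recording rules use 'record' and carry no severity
            severity = (rule.get("labels") or {}).get("severity")
            if severity not in ALLOWED_SEVERITIES:
                problems.append(f"{group_name}/{alert}: invalid severity {severity!r}")
            annotations = rule.get("annotations") or {}
            for key in REQUIRED_ANNOTATIONS:
                if not annotations.get(key):
                    problems.append(f"{group_name}/{alert}: missing annotation {key!r}")
    return problems


if __name__ == "__main__":
    groups = [{"name": "custom_alerts", "rules": [
        {"alert": "ExampleAlert", "expr": "up == 0", "for": "5m",
         "labels": {"severity": "critical"},
         "annotations": {"summary": "Example alert", "description": "Example description"}},
        {"alert": "NoSeverity", "expr": "up == 0"},  # hypothetical bad rule
    ]}]
    for problem in lint_groups(groups):
        print(problem)
```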
- Using promtool:
  promtool check rules /path/to/rules/*.yml
- Using the Prometheus API:
  curl http://localhost:9090/api/v1/rules
- Using the Prometheus UI:
  - Navigate to Status → Rules
  - Check rule evaluation status
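The `/api/v1/rules` endpoint returns JSON shaped as `data.groups[].rules[]`, where each rule carries fields such as `name`, `health` (`ok`, `err` or `unknown`) and `lastError`. The Python sketch below uses only the standard library to list rules that are not healthy; the base URL is the local default used above and should be adjusted for your setup.

```python
import json
import urllib.request
from typing import Any, Dict, List


def unhealthy_rules(payload: Dict[str, Any]) -> List[str]:
    """Return 'group/rule: health lastError' for every rule whose health is not 'ok'."""
    bad: List[str] = []
    for group in payload.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("health") != "ok":
                message = f"{group.get('name')}/{rule.get('name')}: {rule.get('health')} {rule.get('lastError', '')}"
                bad.append(message.rstrip())
    return bad


def fetch_rules(base_url: str = "http://localhost:9090") -> Dict[str, Any]:
    """GET /api/v1/rules and decode the JSON response body."""
    with urllib.request.urlopen(f"{base_url}/api/v1/rules", timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        for line in unhealthy_rules(fetch_rules()):
            print(line)
    except OSError as exc:  # URLError, connection refused and timeouts are OSErrors
        print(f"could not query Prometheus: {exc}")
```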
- Regular Review
  - Review alert thresholds periodically
  - Update rules based on false positives/negatives
  - Keep documentation current
- Version Control
  - Track changes in version control
  - Document significant updates
  - Test changes before deployment
- Backup
  - Back up rule configurations
  - Document custom modifications
  - Maintain change history
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.