intermediate ~90 min updated 2026-06-01
Prometheus and Grafana Monitoring
Install the kube-prometheus-stack Helm chart on kind, explore cluster metrics with PromQL, open prebuilt Grafana dashboards, and create a custom alert rule with Alertmanager.
Objective
Stand up Prometheus, Grafana, and Alertmanager on a local cluster and build one PromQL query, one dashboard panel, and one firing alert. The kube-prometheus-stack chart gives you a production-shaped monitoring stack in a single install.
Prerequisites
- Docker installed and running
- kind and kubectl installed
- Helm v3 installed
- About 6 GB of free RAM — this stack runs many components
Architecture
The prometheus-community kube-prometheus-stack chart installs the Prometheus Operator, a Prometheus server, Alertmanager, Grafana, node-exporter (per node), and kube-state-metrics. Prometheus scrapes metrics endpoints discovered through ServiceMonitor CRDs; Grafana queries Prometheus as a datasource; alert rules evaluated by Prometheus fire into Alertmanager.
+---------------------------- kind cluster ----------------------------+
| node-exporter --+ |
| kube-state- -+--> Prometheus <-- PrometheusRule (alerts) |
| metrics | | \ |
| kubelet/cAdvisor+ | +--> Alertmanager --> (notifications) |
| v |
| Grafana (dashboards via datasource) |
+----------------------------------------------------------------------+
Steps
1. Create the cluster
kind create cluster --name mon-lab
2. Install kube-prometheus-stack
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace \
--set grafana.adminPassword=labpass123 \
--wait --timeout 10m
kubectl get pods -n monitoring
3. Explore metrics in Prometheus
kubectl port-forward svc/monitoring-kube-prometheus-prometheus \
-n monitoring 9090:9090 &
Open http://localhost:9090 and run these PromQL queries in the Graph tab:
# Number of running pods per namespace
sum by (namespace) (kube_pod_status_phase{phase="Running"})
# CPU usage per node (cores)
sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))
# Container memory working set, top 5
topk(5, container_memory_working_set_bytes{container!=""})
4. Open Grafana dashboards
kubectl port-forward svc/monitoring-grafana -n monitoring 3000:80 &
Open http://localhost:3000, log in with admin / labpass123, then browse Dashboards → “Kubernetes / Compute Resources / Cluster”. Create a new panel using this query to chart pod restarts:
sum by (namespace) (increase(kube_pod_container_status_restarts_total[1h]))
5. Create a custom alert rule
# crashloop-alert.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: lab-crashloop
namespace: monitoring
labels:
release: monitoring
spec:
groups:
- name: lab.rules
rules:
- alert: PodCrashLooping
expr: increase(kube_pod_container_status_restarts_total[5m]) > 2
for: 1m
labels:
severity: warning
annotations:
summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
kubectl apply -f crashloop-alert.yaml
6. Trigger the alert with a crashing pod
kubectl run crasher --image=busybox --restart=Always -- sh -c "exit 1"
# Wait ~3-5 minutes, then check the Alerts page in Prometheus (http://localhost:9090/alerts)
kubectl port-forward svc/monitoring-kube-prometheus-alertmanager \
-n monitoring 9093:9093 &
# Alertmanager UI: http://localhost:9093
Expected output
$ kubectl get pods -n monitoring
NAME READY STATUS AGE
alertmanager-monitoring-kube-prometheus-alertmanager-0 2/2 Running 3m
monitoring-grafana-6c7d9f4b8-xkj2p 3/3 Running 3m
monitoring-kube-prometheus-operator-78d6c87f9-q8wzt 1/1 Running 3m
monitoring-kube-state-metrics-65bf7c8d77-tn4rk 1/1 Running 3m
monitoring-prometheus-node-exporter-7h2mq 1/1 Running 3m
prometheus-monitoring-kube-prometheus-prometheus-0 2/2 Running 3m
Prometheus /alerts page after ~5 minutes:
PodCrashLooping (1 active)
Labels: namespace=default, pod=crasher, severity=warning
State: FIRING
Troubleshooting
- Helm install times out: the kind node is short on memory. Free RAM or extend the timeout:
--timeout 15m. Check progress withkubectl get pods -n monitoring -w. - Custom PrometheusRule never appears in Prometheus: the rule is missing the
release: monitoringlabel, which the operator’s rule selector requires. Verify withkubectl get prometheusrule -n monitoring --show-labels. - Grafana shows “No data” on dashboards: Prometheus is still starting or the datasource is unhealthy. In Grafana go to Connections → Data sources → Prometheus → Test.
- Port-forward dies repeatedly: the target pod restarted. Re-run the port-forward command; for stability forward to the pod’s Service rather than the pod.
kube_pod_container_status_restarts_totalreturns nothing: kube-state-metrics is not Running yet.kubectl get pods -n monitoring | grep kube-state-metricsand wait for Ready 1/1.
Cleanup
kubectl delete pod crasher --ignore-not-found
kubectl delete -f crashloop-alert.yaml
helm uninstall monitoring -n monitoring
kubectl delete namespace monitoring
kill %1 %2 %3 2>/dev/null || true
kind delete cluster --name mon-lab
rm -f crashloop-alert.yaml