intermediate ~90 min updated 2026-06-01

Prometheus and Grafana Monitoring

Install the kube-prometheus-stack Helm chart on kind, explore cluster metrics with PromQL, open prebuilt Grafana dashboards, and create a custom alert rule with Alertmanager.

Objective

Stand up Prometheus, Grafana, and Alertmanager on a local cluster and build one PromQL query, one dashboard panel, and one firing alert. The kube-prometheus-stack chart gives you a production-shaped monitoring stack in a single install.

Prerequisites

Docker installed and running
kind and kubectl installed
Helm v3 installed
About 6 GB of free RAM — this stack runs many components

Architecture

The prometheus-community kube-prometheus-stack chart installs the Prometheus Operator, a Prometheus server, Alertmanager, Grafana, node-exporter (per node), and kube-state-metrics. Prometheus scrapes metrics endpoints discovered through ServiceMonitor CRDs; Grafana queries Prometheus as a datasource; alert rules evaluated by Prometheus fire into Alertmanager.

+---------------------------- kind cluster ----------------------------+
| node-exporter --+                                                    |
| kube-state-    -+--> Prometheus <-- PrometheusRule (alerts)          |
|   metrics       |        |   \                                       |
| kubelet/cAdvisor+        |    +--> Alertmanager --> (notifications)  |
|                          v                                           |
|                       Grafana (dashboards via datasource)            |
+----------------------------------------------------------------------+

Steps

1. Create the cluster

kind create cluster --name mon-lab

2. Install kube-prometheus-stack

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --set grafana.adminPassword=labpass123 \
  --wait --timeout 10m
kubectl get pods -n monitoring

3. Explore metrics in Prometheus

kubectl port-forward svc/monitoring-kube-prometheus-prometheus \
  -n monitoring 9090:9090 &

Open http://localhost:9090 and run these PromQL queries in the Graph tab:

# Number of running pods per namespace
sum by (namespace) (kube_pod_status_phase{phase="Running"})

# CPU usage per node (cores)
sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))

# Container memory working set, top 5
topk(5, container_memory_working_set_bytes{container!=""})

4. Open Grafana dashboards

kubectl port-forward svc/monitoring-grafana -n monitoring 3000:80 &

Open http://localhost:3000, log in with admin / labpass123, then browse Dashboards → “Kubernetes / Compute Resources / Cluster”. Create a new panel using this query to chart pod restarts:

sum by (namespace) (increase(kube_pod_container_status_restarts_total[1h]))

5. Create a custom alert rule

# crashloop-alert.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: lab-crashloop
  namespace: monitoring
  labels:
    release: monitoring
spec:
  groups:
    - name: lab.rules
      rules:
        - alert: PodCrashLooping
          expr: increase(kube_pod_container_status_restarts_total[5m]) > 2
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"

kubectl apply -f crashloop-alert.yaml

6. Trigger the alert with a crashing pod

kubectl run crasher --image=busybox --restart=Always -- sh -c "exit 1"
# Wait ~3-5 minutes, then check the Alerts page in Prometheus (http://localhost:9090/alerts)
kubectl port-forward svc/monitoring-kube-prometheus-alertmanager \
  -n monitoring 9093:9093 &
# Alertmanager UI: http://localhost:9093

Expected output

$ kubectl get pods -n monitoring
NAME                                                     READY   STATUS    AGE
alertmanager-monitoring-kube-prometheus-alertmanager-0   2/2     Running   3m
monitoring-grafana-6c7d9f4b8-xkj2p                       3/3     Running   3m
monitoring-kube-prometheus-operator-78d6c87f9-q8wzt      1/1     Running   3m
monitoring-kube-state-metrics-65bf7c8d77-tn4rk           1/1     Running   3m
monitoring-prometheus-node-exporter-7h2mq                1/1     Running   3m
prometheus-monitoring-kube-prometheus-prometheus-0       2/2     Running   3m

Prometheus /alerts page after ~5 minutes:
PodCrashLooping (1 active)
  Labels: namespace=default, pod=crasher, severity=warning
  State: FIRING

Troubleshooting

Helm install times out: the kind node is short on memory. Free RAM or extend the timeout: --timeout 15m. Check progress with kubectl get pods -n monitoring -w.
Custom PrometheusRule never appears in Prometheus: the rule is missing the release: monitoring label, which the operator’s rule selector requires. Verify with kubectl get prometheusrule -n monitoring --show-labels.
Grafana shows “No data” on dashboards: Prometheus is still starting or the datasource is unhealthy. In Grafana go to Connections → Data sources → Prometheus → Test.
Port-forward dies repeatedly: the target pod restarted. Re-run the port-forward command; for stability forward to the pod’s Service rather than the pod.
kube_pod_container_status_restarts_total returns nothing: kube-state-metrics is not Running yet. kubectl get pods -n monitoring | grep kube-state-metrics and wait for Ready 1/1.

Cleanup

kubectl delete pod crasher --ignore-not-found
kubectl delete -f crashloop-alert.yaml
helm uninstall monitoring -n monitoring
kubectl delete namespace monitoring
kill %1 %2 %3 2>/dev/null || true
kind delete cluster --name mon-lab
rm -f crashloop-alert.yaml