Monitoring & Observability

The observability stack provides metrics, dashboards, and centralized logging for the entire cluster.

Stack components

| Component | Role | Helm chart |
|-----------|------|------------|
| kube-prometheus-stack | Prometheus + Grafana + Alertmanager + exporters | prometheus-community/kube-prometheus-stack |
| Loki | Centralized log storage | grafana/loki |
| Promtail | Log collector (DaemonSet) | grafana/promtail |

Architecture

┌──────────────────────────────────────────────────────────┐
│                   monitoring namespace                    │
│                                                          │
│  ┌──────────────┐    ┌───────────┐    ┌──────────────┐  │
│  │  Prometheus  │◄───│ Exporters │    │    Loki      │  │
│  │  (metrics)   │    │ (node,    │    │  (log store) │  │
│  └──────┬───────┘    │  cadvisor)│    └──────▲───────┘  │
│         │            └───────────┘           │          │
│  ┌──────▼───────┐                    ┌───────┴──────┐   │
│  │   Grafana    │◄───────────────────│   Promtail   │   │
│  │  (dashboards)│                    │  (DaemonSet) │   │
│  └──────────────┘                    └──────────────┘   │
│                                                          │
└──────────────────────────────────────────────────────────┘
         ▲ HTTPS via Traefik IngressRoute + cert-manager

Prerequisites

Before deploying monitoring:

  1. Base stack deployed (make deploy)
  2. Grafana admin secret created:
```bash
make deploy-grafana-secret
```

This creates a grafana-admin-secret in the monitoring namespace with:

  • username: admin
  • password: <GRAFANA_PASSWORD from .env>
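
The resulting Secret looks roughly like this (a sketch of the equivalent manifest; the make target may generate it differently):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: grafana-admin-secret
  namespace: monitoring
type: Opaque
stringData:
  username: admin
  password: <GRAFANA_PASSWORD from .env>
```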

Deploy

```bash
make deploy-monitoring
```

This runs scripts/deploy-monitoring.sh which:

  1. Adds prometheus-community and grafana Helm repos
  2. Installs kube-prometheus-stack (Prometheus + Grafana + Alertmanager)
  3. Installs Loki (single-binary, filesystem storage)
  4. Installs Promtail (log collector DaemonSet on every node)
  5. Applies the Grafana IngressRoute + TLS Certificate
  6. Imports the Grafana logs dashboard
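
The Helm portion of these steps corresponds roughly to the following commands (a sketch; flags and value-file paths are assumptions — see scripts/deploy-monitoring.sh for the authoritative sequence):

```bash
# Step 1: add chart repositories
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

# Steps 2-4: install the three releases into the monitoring namespace
helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace \
  --values kubernetes/monitoring/kube-prometheus-values.yaml

helm upgrade --install loki grafana/loki \
  --namespace monitoring \
  --values kubernetes/monitoring/loki-values.yaml

helm upgrade --install promtail grafana/promtail \
  --namespace monitoring \
  --values kubernetes/monitoring/promtail-values.yaml
```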

kube-prometheus-stack

The kube-prometheus-stack Helm chart installs:

  • Prometheus — metrics collection and storage
  • Grafana — visualization dashboards
  • Alertmanager — alert routing and silencing
  • kube-state-metrics — Kubernetes object metrics
  • node-exporter — host-level metrics (CPU, memory, disk)
  • Prometheus Operator — manages ServiceMonitor and PrometheusRule CRDs
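
Because the Operator watches PrometheusRule objects, custom alerts can be added declaratively. A minimal sketch (the alert name, threshold, and `release` label are illustrative — the label must match the Operator's ruleSelector, which by default follows the Helm release name):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: node-memory-alert
  namespace: monitoring
  labels:
    release: kube-prometheus-stack  # assumed to match the ruleSelector
spec:
  groups:
    - name: node.rules
      rules:
        - alert: NodeMemoryHigh
          expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.9
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Node memory usage above 90% for 10 minutes"
```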

Grafana access

```
URL:      https://<GRAFANA_DOMAIN>
Username: admin
Password: <GRAFANA_PASSWORD>
```
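
If the IngressRoute is not reachable yet, a port-forward works as a fallback (the service name is assumed from the Helm release name):

```bash
kubectl port-forward svc/kube-prometheus-stack-grafana -n monitoring 3000:80
# Open: http://localhost:3000
```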

Prometheus access (port-forward)

```bash
kubectl port-forward svc/prometheus-operated -n monitoring 9090:9090
# Open: http://localhost:9090
```

Alertmanager access (port-forward)

```bash
kubectl port-forward svc/alertmanager-operated -n monitoring 9093:9093
# Open: http://localhost:9093
```

Grafana OAuth2 / SSO

Grafana supports OAuth2 login without any provider-specific configuration baked into k3s-lab. All provider settings are injected at runtime via a Kubernetes Secret.

How it works

The kube-prometheus-values.yaml mounts grafana-oauth-secret as an optional secret:

  • Secret absent → Grafana starts normally with admin/password login
  • Secret present → Grafana reads all GF_AUTH_GENERIC_OAUTH_* env vars from it

This means you can configure any OIDC-compatible provider (Infomaniak, Auth0, Keycloak, Entra ID, etc.) simply by creating the secret with the right values.

Enable OAuth2

  1. Create the grafana-oauth-secret with your provider's settings (see configuration.md):

    ```bash
    kubectl create secret generic grafana-oauth-secret \
      --from-literal=GF_AUTH_GENERIC_OAUTH_ENABLED="true" \
      --from-literal=GF_AUTH_GENERIC_OAUTH_NAME="<Provider Name>" \
      --from-literal=GF_AUTH_GENERIC_OAUTH_CLIENT_ID="<client-id>" \
      --from-literal=GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET="<client-secret>" \
      --from-literal=GF_AUTH_GENERIC_OAUTH_AUTH_URL="<authorize-url>" \
      --from-literal=GF_AUTH_GENERIC_OAUTH_TOKEN_URL="<token-url>" \
      --from-literal=GF_AUTH_GENERIC_OAUTH_API_URL="<userinfo-url>" \
      --from-literal=GF_AUTH_GENERIC_OAUTH_SCOPES="openid email profile" \
      --from-literal=GF_AUTH_GENERIC_OAUTH_USE_PKCE="true" \
      --from-literal=GF_AUTH_GENERIC_OAUTH_USE_REFRESH_TOKEN="true" \
      --from-literal=GF_AUTH_GENERIC_OAUTH_AUTO_LOGIN="true" \
      --from-literal=GF_AUTH_GENERIC_OAUTH_ALLOW_SIGN_UP="true" \
      --from-literal=GF_AUTH_DISABLE_LOGIN_FORM="true" \
      --namespace monitoring --dry-run=client -o yaml | kubectl apply -f -
    ```
  2. Restart Grafana:

    ```bash
    kubectl rollout restart deployment/kube-prometheus-stack-grafana -n monitoring
    ```

Disable OAuth2

Delete the secret and restart Grafana to revert to admin/password login:

```bash
kubectl delete secret grafana-oauth-secret -n monitoring
kubectl rollout restart deployment/kube-prometheus-stack-grafana -n monitoring
```

Loki

Loki stores logs indexed by labels (no full-text indexing). It is queried from Grafana using LogQL.

Configuration (kubernetes/monitoring/loki-values.yaml): deployed in single-binary mode with filesystem storage — suitable for single-node homelab use.

Query logs in Grafana

  1. Go to Explore → select Loki datasource
  2. Use a LogQL query:
```logql
{namespace="apps"}
{namespace="ingress", job="traefik"} |= "error"
{app="my-app"} | json | level="error"
```

Loki service endpoint (in-cluster)

```
http://loki.monitoring.svc.cluster.local:3100
```
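
From outside the cluster, you can port-forward the Loki service and hit the HTTP API directly, e.g. Loki's query_range endpoint (a sketch; the service name `loki` is assumed from the chart defaults):

```bash
kubectl port-forward svc/loki -n monitoring 3100:3100 &

# Fetch the last 10 log lines from the apps namespace
curl -sG http://localhost:3100/loki/api/v1/query_range \
  --data-urlencode 'query={namespace="apps"}' \
  --data-urlencode 'limit=10'
```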

Promtail

Promtail is deployed as a DaemonSet — one pod per node. It:

  1. Reads container logs from /var/log/pods/
  2. Attaches Kubernetes labels (namespace, pod, container, app)
  3. Pushes log streams to Loki

Configuration (kubernetes/monitoring/promtail-values.yaml): uses the default pipeline stages to extract structured labels from Kubernetes metadata.
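
To confirm one Promtail pod is running per node (the label selector is assumed from the grafana/promtail chart defaults):

```bash
kubectl get daemonset promtail -n monitoring
kubectl get pods -n monitoring -l app.kubernetes.io/name=promtail -o wide
```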


Grafana dashboards

The following dashboards are available after deploy:

| Dashboard | Source | What it shows |
|-----------|--------|---------------|
| Kubernetes cluster overview | kube-prometheus built-in | Node CPU/memory, pod counts |
| Node exporter | kube-prometheus built-in | Host CPU, memory, disk, network |
| Traefik | ServiceMonitor auto-discovery | Request rates, latencies, errors |
| Logs — Errors | grafana-logs-dashboard.yaml | Error-focused log explorer |

Import additional dashboards

Grafana has a large community dashboard library. Import by ID from Dashboards → Import:

| ID | Name |
|----|------|
| 315 | Kubernetes cluster monitoring |
| 1860 | Node exporter full |
| 13713 | Loki log summary |
| 17501 | Traefik |

Traefik metrics integration

Traefik exposes Prometheus metrics on port 9100. The serviceMonitor block in traefik-values.yaml has the chart create a ServiceMonitor resource, which tells Prometheus Operator to scrape Traefik automatically:

```yaml
metrics:
  prometheus:
    serviceMonitor:
      enabled: true
      namespace: ingress
      jobLabel: traefik
      interval: 30s
```
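
To check that the Operator discovered the resource and Prometheus is scraping it (a quick sanity check; object names may differ in your setup):

```bash
kubectl get servicemonitor -n ingress

# Then confirm the traefik job appears under Status → Targets:
kubectl port-forward svc/prometheus-operated -n monitoring 9090:9090
# Open: http://localhost:9090/targets
```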

Upgrade

```bash
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --version <NEW_VERSION> \
  --namespace monitoring \
  --values kubernetes/monitoring/kube-prometheus-values.yaml \
  --reuse-values
```

Alternatively, update the pinned version in .env (KUBE_PROMETHEUS_VERSION) and re-run make deploy-monitoring.


References