portfolio Anshul Bisen

The homelab Loki stack that monitors my production alerts

A Grafana/Loki/Tempo stack on homelab k3s aggregates logs from personal projects and FinanceOps production alerting. A personal observability stack makes you a better leader.

My homelab runs a full Grafana, Loki, and Tempo stack on a k3s cluster. It aggregates logs and traces from my personal projects, this portfolio site, and a filtered feed of FinanceOps production alerts. Running a personal observability stack sounds like overkill for a side project. It is not. It is the single best investment I have made in my own engineering education.

The kind of infrastructure that teaches you by breaking.

Leadership got more concrete for me once I realized release engineering and infrastructure are really trust systems. It also builds on what I learned earlier in “When your SLOs and your sales team disagree, the SLOs lose.” The infrastructure stack, ctrlpane, and even my dotfiles all orbit the same idea now: the best teams move fast because the defaults are stable, not because the heroics are impressive.

The infrastructure mess that made the lesson stick.

The Architecture

The stack runs on a single mini PC with 32GB RAM and a 1TB NVMe drive. K3s manages the workloads. The storage is local because I do not need durability for personal observability data. If the drive fails, I lose dashboards and historical data. That is an acceptable tradeoff for the cost savings.

  • Grafana: dashboards and alerting. The same Grafana instance serves dashboards for portfolio site metrics, homelab cluster health, and FinanceOps production alert summaries.
  • Loki: log aggregation. Logs from all personal project deployments are shipped to Loki via Promtail agents running on each service.
  • Tempo: distributed tracing. Traces from the portfolio site, chat system, and homelab services are collected via OpenTelemetry and stored in Tempo.
  • Cloudflare Tunnels: secure ingestion. No ports are open on my home network. All inbound log and trace data arrives through Cloudflare Tunnels, which provides TLS termination, access control, and DDoS protection for free.
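The per-service ingestion path boils down to a small Promtail config. This is an illustrative sketch, not my actual config: the tunnel hostname, job names, and paths are made up, but the shape (a `clients` block pointing at the tunnel, `scrape_configs` tailing local files with a few labels) is the real pattern.

```yaml
# Illustrative Promtail config for one personal-project host.
# The Loki push URL is a hypothetical Cloudflare Tunnel hostname.
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml

clients:
  - url: https://loki.example-tunnel.dev/loki/api/v1/push

scrape_configs:
  - job_name: portfolio-site
    static_configs:
      - targets: [localhost]
        labels:
          job: portfolio-site
          env: homelab
          __path__: /var/log/portfolio/*.log
```

Keeping the label set this small is deliberate: every extra label multiplies Loki's stream count.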

FinanceOps Production Alert Feed

The most unusual part of the setup is the FinanceOps production alert feed. I do not copy production logs to my homelab. That would be a security and compliance violation. Instead, I forward a structured summary of production alerts: severity, service name, alert type, and resolution time. No customer data. No log content. Just operational metadata.
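A minimal sketch of the filtering that feed implies, with hypothetical field names (the real forwarder and its schema are not shown here): before anything leaves the production side, each alert is reduced to an allowlisted set of metadata fields, and everything else is dropped.

```python
# Hypothetical sketch of the alert-summary filter; field names are
# illustrative, not the actual FinanceOps schema.

ALLOWED_FIELDS = {"severity", "service", "alert_type", "resolution_seconds"}

def summarize_alert(alert: dict) -> dict:
    """Keep only allowlisted operational metadata; drop log content."""
    summary = {k: v for k, v in alert.items() if k in ALLOWED_FIELDS}
    # Fail closed: refuse to forward an alert missing required metadata.
    missing = {"severity", "service"} - summary.keys()
    if missing:
        raise ValueError(f"alert missing required fields: {missing}")
    return summary

if __name__ == "__main__":
    raw = {
        "severity": "critical",
        "service": "payments-api",
        "alert_type": "error_rate",
        "resolution_seconds": 840,
        "log_excerpt": "card declined for user ...",  # must never leave prod
    }
    print(summarize_alert(raw))
```

The allowlist direction matters: a denylist of sensitive fields fails open when a new field is added upstream, while an allowlist fails closed.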

This alert feed goes to a Grafana dashboard that tracks FinanceOps operational health trends alongside my personal project metrics. I can see, at a glance, whether production incident frequency is trending up or down, which services are generating the most alerts, and how resolution times are changing over time.
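On the Grafana side, those trend panels reduce to LogQL metric queries shaped roughly like this (the `source` and `service` label names are assumptions about how the feed is tagged):

```logql
# Forwarded alerts per service per day, from the metadata stream
sum by (service) (count_over_time({source="financeops_alerts"}[1d]))
```

Plotted over a few weeks, a query like this is what turns individual pages into a visible trend line.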

The value is perspective. When I am in the office handling incidents, I see individual events. The homelab dashboard shows me the trend line. An individual incident is an event. A trend line is a signal. The homelab gives me the distance to see signals that are invisible when you are inside the system.

Personal Project Observability

Running observability on personal projects taught me more about monitoring than two years of managing production observability at work. The reason is simple: on personal projects, I am the only engineer, the only SRE, and the only on-call. Every observability gap is my problem.

  • The portfolio site ships OpenTelemetry traces for every page render, API call, and CMS operation. I can see exactly how long each component takes to render on the server.
  • The chat system logs every message exchange, rate limit decision, and AI response generation. When the chatbot gives a bad answer, I can trace the exact retrieval pipeline that produced it.
  • The k3s cluster itself is instrumented with node-exporter and kube-state-metrics. Memory pressure, disk usage, and pod restart patterns are all visible.
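The actual instrumentation uses OpenTelemetry, but the idea those traces capture can be shown with a stdlib-only sketch (the function and span names here are hypothetical): each unit of work in a page render is wrapped in a timed span, so the slow component is obvious afterward.

```python
# Stdlib-only illustration of span timing; the real site uses OpenTelemetry.
import time
from contextlib import contextmanager

SPANS: list[tuple[str, float]] = []  # (span name, duration in seconds)

@contextmanager
def span(name: str):
    """Record how long a named unit of work takes, like a trace span."""
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append((name, time.perf_counter() - start))

def render_page():
    # Hypothetical render pipeline: each stage becomes its own span.
    with span("render_page"):
        with span("fetch_cms_content"):
            time.sleep(0.01)   # stand-in for a CMS call
        with span("render_components"):
            time.sleep(0.005)  # stand-in for server-side rendering

if __name__ == "__main__":
    render_page()
    for name, dur in SPANS:
        print(f"{name}: {dur * 1000:.1f} ms")
```

OpenTelemetry adds what this sketch lacks: parent/child span relationships, context propagation across services, and export to Tempo.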

When I set up observability at FinanceOps, I draw on patterns I developed and tested on the homelab first. The homelab is the laboratory. Production is the deployment.

Why Personal Observability Makes You a Better Leader

Engineering leaders who do not operate systems lose touch with operational reality. They make decisions about reliability, monitoring, and incident response without feeling the pain of a 3 AM alert or the frustration of a dashboard that does not show what you need during an outage.

Running a personal observability stack keeps me hands-on with the tools my team uses daily. I know Grafana’s query language because I write queries for my own dashboards. I know Loki’s limitations because I hit them on my own log volume. I know Tempo’s trace correlation capabilities because I use them to debug my own services.

This hands-on experience changes the quality of my leadership conversations. When an engineer says “our Loki query performance is degrading,” I can have a technical conversation about label cardinality and chunk_target_size because I have tuned those settings on my own cluster. That technical credibility matters. Engineers trust leaders who understand their tools, not leaders who manage from dashboards they have never built.
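The settings mentioned above live in Loki's own config file. A fragment of the sort of tuning involved, with illustrative values rather than my cluster's actual numbers:

```yaml
# Illustrative Loki tuning fragment; values are examples, not recommendations.
ingester:
  chunk_target_size: 1572864       # ~1.5 MB target per compressed chunk

limits_config:
  max_label_names_per_series: 15   # cap label cardinality per stream
  reject_old_samples: true
  reject_old_samples_max_age: 168h
```

The cardinality cap is the one that bites first: every distinct label combination is a separate stream, and high-cardinality labels like request IDs can quietly explode memory usage.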

The Cost

The total cost of the homelab observability stack is electricity, a $400 mini PC, and occasional Saturday mornings updating k3s. The return is operational intuition that no management course can teach.

Homelab, but treated like a real environment.

By the time I wrote this, the lesson was bigger than any one tool or incident. The job had become setting defaults a team could trust, then proving those defaults in systems like this infrastructure stack and ctrlpane. That is leadership work, not just technical taste.

Every engineering leader should run infrastructure. Not production infrastructure. Personal infrastructure. The kind where the consequences of failure are a broken personal project, not a customer outage. The kind where you can experiment freely, break things safely, and build the operational muscle that makes you credible when you lead teams that operate at scale.

The Loki stack in the homelab "monitors production" only in the sense that it runs Grafana dashboards against the forwarded alert metadata, never against production logs themselves. It is not a separate monitoring system so much as a redundant observation point, one that keeps working when the primary monitoring infrastructure has issues. The cost of running it on homelab hardware is negligible compared to the confidence that redundancy provides during incidents where the cloud-hosted monitoring stack is itself affected.