/observability
Posts touching observability.
9 posts
- January 12, 2026 5 min
The homelab Loki stack that monitors my production alerts
A Grafana/Loki/Tempo stack on my homelab k3s cluster aggregates logs from personal projects and FinanceOps production alerting. A personal observability stack makes you a better leader.
- /homelab
- /observability
- /self-hosting
- /cloudflare
- /blog
- November 24, 2025 4 min
When your SLOs and your sales team disagree, the SLOs lose
An enterprise prospect required 99.99% uptime. Our SLOs were 99.9%. Engineering leaders who refuse to engage with commercial reality get bypassed.
- /engineering-leadership
- /cross-functional
- /financeops
- /observability
- /blog
- August 25, 2025 4 min
How I use Grafana dashboards to run engineering meetings instead of slide decks
Replacing weekly status slides with live Grafana dashboards eliminated performative reporting and forced teams to instrument what matters.
- /observability
- /engineering-leadership
- /cross-functional
- /blog
- July 28, 2025 4 min
My homelab staging cluster caught a production bug before CI did
A memory leak only manifested under sustained 48-hour load. CI tests finish in seconds, far too quickly to surface it. The homelab k3s cluster caught what automated tests could not.
- /homelab
- /k3s
- /observability
- /ci-cd
- /experience
- June 19, 2025 5 min
The on-call rotation that was just me, and why I finally admitted that was not sustainable
For 14 months I was the only person who got paged at 3 AM. The real reason was not team size. It was that I did not trust anyone else to handle production incidents.
- /engineering-leadership
- /team-building
- /observability
- /startup-life
- /experience
- May 5, 2025 4 min
Grafana, Loki, and Tempo: building an observability stack that a four-person team actually uses
Most observability guides assume a platform team. We do not have one. The hard part was not installation but building dashboards engineers actually check daily.
- /observability
- /kubernetes
- /architecture
- /self-hosting
- /blog
- March 24, 2025 4 min
Our Kafka consumer lag crisis and why I stopped trusting "it works on my machine" for event-driven systems
Consumer lag grew silently for two weeks because local dev processed events instantly while production dealt with partition rebalancing and back-pressure from a slow downstream service.
- /kafka
- /observability
- /architecture
- /financeops
- /blog
- November 4, 2024 4 min
The day our monolith's database hit 80% CPU and nobody noticed until sales called
A slow query compounded by a missing index took our PostgreSQL instance to the brink. Sales noticed before engineering did because client reports were timing out.
- /postgres
- /observability
- /architecture
- /financeops
- /experience
- September 4, 2024 5 min
Observability on a shoestring: Grafana, Loki, and Tempo for free
We could not afford Datadog. I self-hosted the Grafana stack on our k3s homelab cluster and pointed production at it.
- /observability
- /kubernetes
- /homelab
- /docker
- /blog