<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Production</title><link>https://frn.sh/c/production/</link><description>Recent content in Production</description><generator>Hugo</generator><language>en-US</language><copyright>Copyright © Fernando Simões.</copyright><lastBuildDate>Sat, 21 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://frn.sh/c/production/index.xml" rel="self" type="application/rss+xml"/><item><title>Where did 400 MiB go?</title><link>https://frn.sh/pmem/</link><pubDate>Sat, 21 Mar 2026 00:00:00 +0000</pubDate><guid>https://frn.sh/pmem/</guid><description>I restarted all 60+ pods of a Node.js websocket app earlier today. Every single pod was sitting at ~330 MiB of memory - except one, which was double the rest, at 640 MiB.
This is a StatefulSet. When I built the cluster, I estimated each pod&amp;rsquo;s footprint: ~198 MiB base, plus ~25 MiB per websocket. With 30 websockets per pod, that&amp;rsquo;s roughly 950 MiB. I was wrong about the per-websocket cost - it&amp;rsquo;s lower than 25 MiB in practice.</description></item><item><title>Between select and disk</title><link>https://frn.sh/iops/</link><pubDate>Sun, 08 Feb 2026 00:00:00 +0000</pubDate><guid>https://frn.sh/iops/</guid><description>We had a Postgres incident this week. Heroku timeouts, multiple queries running for 30+ minutes, and IOPS pinned at the provisioned limit.
I knew I needed a better index, but I wanted to understand what &amp;ldquo;reading from disk&amp;rdquo; actually means first. How many layers of caching sit between a SELECT and the storage device?
Three.
First: shared buffers, Postgres&amp;rsquo; own cache living in process memory. If the page is there, we need no system call - just a memory read.</description></item><item><title>108,725 forks</title><link>https://frn.sh/tforks/</link><pubDate>Thu, 11 Dec 2025 00:00:00 +0000</pubDate><guid>https://frn.sh/tforks/</guid><description>First week at a new job. A colleague was showing me around our Grafana dashboards, just routine monitoring of the bare-metal machines. One caught my eye: a machine with 32 GB of RAM and a top-of-the-line processor was hitting 90% CPU. A few containers running, no alerts, and nobody had reported anything.
I found a process with the command bash startup.sh that had been running for 28 minutes.
I straced it for a few minutes:</description></item><item><title>Sigterm a D state process</title><link>https://frn.sh/sigterm/</link><pubDate>Sun, 08 Jun 2025 00:00:00 +0000</pubDate><guid>https://frn.sh/sigterm/</guid><description>Load average hit 12 on a 2 vCPU machine during a production incident. My first thought was that CPU must be the bottleneck - 12 is 6x the core count.
But it wasn&amp;rsquo;t.
Linux load average counts three things: processes running on a CPU, processes waiting in the run queue, and processes in uninterruptible sleep - D state. From the kernel source:
The global load average is an exponentially decaying average of nr_running + nr_uninterruptible.</description></item></channel></rss>