108,725 forks.
First week at a new job. A colleague was showing me around our Grafana dashboards - routine tour of the bare-metal machines. One caught my eye: 32GB RAM, top-of-the-line processor, hitting 90% CPU. A few containers running, no alerts, nobody had reported anything.
I found a process whose command line was bash startup.sh. It had been running for 28 minutes.
I straced it for a few minutes:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 75.36   52.698676         390    134819     26094 wait4
 12.27    8.583416          78    108725           fork
  7.42    5.190671           5    926332           rt_sigprocmask
  2.69    1.881557           5    330523    165262 close
  1.17    0.817651           9     82631           pipe
  0.64    0.449359           5     78280           rt_sigaction
  0.24    0.164395           6     26094           rt_sigreturn
  0.21    0.147095           5     26094     26094 ioctl
------ ----------- ----------- --------- --------- ----------------
100.00   69.932820          40   1713498    217450 total
108,725 forks. 134,819 wait4 calls. The delta is 26,094. That’s exactly the number of wait4 errors. The script is forking children and also waiting on processes it didn’t create. It’s reaping things that aren’t its own.
26,094 ioctl errors. 26,094 rt_sigreturn calls. Same number, three rows. The script is killing other processes.
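The arithmetic behind that observation is easy to re-check. The following is a minimal sketch that hard-codes the counts from the strace summary and confirms that the gap between wait4 calls and forks equals the other three figures. The variable names are made up for illustration, and the script only verifies the numbers; it says nothing about why they match.

```shell
#!/bin/sh
# Counts copied by hand from the strace -c summary (variable names are illustrative).
forks=108725
wait4_calls=134819
wait4_errors=26094
ioctl_errors=26094
sigreturn_calls=26094

# wait4 calls that cannot be paired one-to-one with a fork.
delta=$((wait4_calls - forks))
echo "wait4 calls - forks = $delta"

# Compare the delta against each of the three matching counts.
if [ "$delta" -eq "$wait4_errors" ] &&
   [ "$delta" -eq "$ioctl_errors" ] &&
   [ "$delta" -eq "$sigreturn_calls" ]; then
    match=yes
else
    match=no
fi
echo "all four figures agree: $match"
```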
I looked for the file:
find . -type f -name "startup.sh"
Nothing. The process is running but the file is gone. Deleted after execution. But the kernel keeps the inode alive while the process holds it open. Bash opens the script on fd 255.
cat /proc/3668040/fd/255
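The same trick can be reproduced safely on any Linux box. This is a small sketch, not the attacker's script: it creates a throwaway file, holds it open on fd 9 (an arbitrary choice; bash's own use of 255 is what the step above relies on), deletes the name, and then reads the contents back through /proc. The temp file and its contents are hypothetical stand-ins for startup.sh.

```shell
#!/bin/sh
# Linux-only sketch: a deleted file stays readable through an open descriptor.
tmp=$(mktemp)                        # hypothetical stand-in for startup.sh
echo 'echo "pretend payload"' > "$tmp"

exec 9< "$tmp"                       # hold the file open on fd 9
rm -f "$tmp"                         # unlink the name; the inode survives while fd 9 is open

# The path no longer exists on disk.
if [ -e "$tmp" ]; then
    gone=no
else
    gone=yes
fi

# The contents are still reachable via this shell's entry in procfs.
recovered=$(cat "/proc/$$/fd/9")

echo "path gone: $gone"
echo "recovered: $recovered"

exec 9<&-                            # close the descriptor; the inode can now be freed
```

Once the last descriptor closes, the data is gone for good, so on a real incident it probably makes sense to copy the descriptor out (for example with cp) before killing the process.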
A cryptominer wrapper. It kept a fake redisserver running. If it detected an interactive session, it tried to kill the miner to hide itself. It killed wget, curl, and import so nobody could easily download tools. It killed processes above 90% CPU and known malware names - kinsing, c3pool - so it wouldn’t have to share the machine.
The strace told the whole story before I ever read the script. 108k forks to keep its own children alive. 26k extra wait4s to reap competing processes it was killing. The numbers didn’t add up because the script wasn’t just running. It was hunting.
I checked for Postgres logs. /var/log/postgresql was deleted.
We burned the machine.