frn.sh

108,725 forks.

First week at a new job. A colleague was showing me around our Grafana dashboards, a routine tour of the bare-metal machines. One caught my eye: 32GB RAM, top-of-the-line processor, hitting 90% CPU. A few containers running, no alerts, nobody had reported anything.

I found a process whose command line was bash startup.sh. It had been running for 28 minutes.

I straced it for a few minutes:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 75.36   52.698676         390    134819     26094 wait4
 12.27    8.583416          78    108725           fork
  7.42    5.190671           5    926332           rt_sigprocmask
  2.69    1.881557           5    330523    165262 close
  1.17    0.817651           9     82631           pipe
  0.64    0.449359           5     78280           rt_sigaction
  0.24    0.164395           6     26094           rt_sigreturn
  0.21    0.147095           5     26094     26094 ioctl
------ ----------- ----------- --------- --------- ----------------
100.00   69.932820          40   1713498    217450 total

108,725 forks. 134,819 wait4 calls. The delta is 26,094. That’s exactly the number of wait4 errors. The script is forking children and also waiting on processes it didn’t create. It’s reaping things that aren’t its own.

26,094 ioctl errors. 26,094 rt_sigreturn calls. Same number, three columns. The script is killing other processes.
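The arithmetic is easy to re-check mechanically. The following is a minimal sketch, assuming the summary above has been saved as text (here it is embedded in a here-document so the snippet is self-contained). The variable names summary, delta and same are made up for illustration, and the awk field positions assume the column layout shown in the table.

```shell
#!/usr/bin/env bash
# Sketch: cross-check the counts in the strace summary shown above.
# The table is pasted into a here-document; nothing is traced here.

summary=$(cat <<'EOF'
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 75.36   52.698676         390    134819     26094 wait4
 12.27    8.583416          78    108725           fork
  7.42    5.190671           5    926332           rt_sigprocmask
  2.69    1.881557           5    330523    165262 close
  1.17    0.817651           9     82631           pipe
  0.64    0.449359           5     78280           rt_sigaction
  0.24    0.164395           6     26094           rt_sigreturn
  0.21    0.147095           5     26094     26094 ioctl
------ ----------- ----------- --------- --------- ----------------
100.00   69.932820          40   1713498    217450 total
EOF
)

# Rows with an errors column have 6 fields, rows without have 5.
# In both cases: calls is field 4 and the syscall name is the last field.
delta=$(printf '%s\n' "$summary" | awk '
  $NF == "wait4" { waits = $4; wait_errs = $5 }
  $NF == "fork"  { forks = $4 }
  END { print waits - forks, wait_errs }')

# List every syscall whose calls or errors column equals 26094.
same=$(printf '%s\n' "$summary" | awk -v n=26094 '
  $4 == n || (NF == 6 && $5 == n) { out = out (out ? " " : "") $NF }
  END { print out }')

echo "wait4 calls minus forks, wait4 errors: $delta"
echo "syscalls showing $((26094)): $same"
```

The first line printed shows the two figures side by side (both 26094), and the second lists wait4, rt_sigreturn and ioctl as the rows where that number appears.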

I looked for the file:

find . -type f -name "startup.sh"

Nothing. The process is running but the file is gone. Deleted after execution. But the kernel keeps the inode alive while the process holds it open. Bash opens the script on fd 255.
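The same mechanism can be reproduced harmlessly on a scratch file. This is a sketch for Linux only, since it relies on /proc. The scratch file stands in for startup.sh, and fd 9 is an arbitrary descriptor chosen for the demo, not the fd 255 that bash picked for the real script.

```shell
#!/usr/bin/env bash
# Sketch: a file that has been deleted stays readable through an open fd.
# Linux only (needs /proc). File name and fd number are demo choices.

scratch=$(mktemp)                        # stand-in for startup.sh
printf 'echo "still here"\n' > "$scratch"

exec 9< "$scratch"                       # hold the file open on fd 9
rm -f "$scratch"                         # unlink it: the name is gone

# The directory entry no longer exists ...
if [ -e "$scratch" ]; then path_exists=yes; else path_exists=no; fi

# ... the fd symlink in /proc is annotated with "(deleted)" ...
link_target=$(readlink "/proc/$$/fd/9")

# ... and the contents can still be read back through it.
recovered=$(cat "/proc/$$/fd/9")

exec 9<&-                                # close the fd; the inode can now be freed

echo "path exists: $path_exists"
echo "fd 9 -> $link_target"
echo "recovered: $recovered"
```

Listing /proc/PID/fd with ls -l for a suspicious process shows the same "(deleted)" marker, which is a quick way to spot which descriptor to read.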

cat /proc/3668040/fd/255

A cryptominer wrapper. It kept a fake redisserver running. If it detected an interactive session, it tried to kill the miner to hide itself. It killed wget, curl, and import so nobody could easily download tools. It killed processes above 90% CPU and known malware names (kinsing, c3pool) so it wouldn’t have to share the machine.

The strace told the whole story before I ever read the script. 108k forks to keep its own children alive. 26k extra wait4s to reap competing processes it was killing. The numbers didn’t add up because the script wasn’t just running. It was hunting.

I checked for Postgres logs. /var/log/postgresql was deleted.

We burned the machine.