clone, unshare, and setns
The Linux kernel exposes namespaces through three system calls: clone, unshare, and setns. clone creates a new process and lets you specify which namespaces the child shares with its parent and which are created fresh - this is what happens under the hood when you run a container. setns lets a process join an existing namespace, which is exactly what docker exec does. And unshare gives the calling process fresh namespaces without creating a new process; it also ships as a command-line tool of the same name, which makes it the most fun to play with.
root@debian:~# unshare --pid --mount-proc --fork /bin/bash
root@debian:~# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 7196 3928 pts/1 S 00:31 0:00 /bin/bash
root 2 0.0 0.0 11084 4388 pts/1 R+ 00:31 0:00 ps aux
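This creates a child process with its own PID and mount namespaces. --fork is needed because a new PID namespace only applies to children of the process that calls unshare, and --mount-proc mounts a fresh /proc inside the new mount namespace, so ps sees only the new namespace and process IDs start at 1. If we compare namespace IDs between parent and child, we can see what’s shared and what isn’t: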
# Parent process
mnt -> 'mnt:[4026531841]'
net -> 'net:[4026531840]'
pid -> 'pid:[4026531836]'
# Child process
mnt -> 'mnt:[4026532895]'
net -> 'net:[4026531840]'
pid -> 'pid:[4026532897]'
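A comparison like this can be produced by reading the symlinks under /proc/<pid>/ns. Here is a minimal sketch, assuming a POSIX shell and permission to read the other process's /proc entry; ns_ids and ns_diff are hypothetical helper names, not part of any tool:

```shell
#!/bin/sh
# ns_ids PID: print "type -> id" for the mnt, net and pid namespaces of PID.
ns_ids() {
    for ns in mnt net pid; do
        printf '%s -> %s\n' "$ns" "$(readlink "/proc/$1/ns/$ns")"
    done
}

# ns_diff PID_A PID_B: for each namespace type, say whether the two
# processes point at the same namespace (same symlink target) or not.
ns_diff() {
    for ns in mnt net pid; do
        a=$(readlink "/proc/$1/ns/$ns")
        b=$(readlink "/proc/$2/ns/$ns")
        if [ "$a" = "$b" ]; then
            echo "$ns: shared ($a)"
        else
            echo "$ns: isolated ($a vs $b)"
        fi
    done
}
```

From the host shell, something like ns_diff $$ <pid of the inner bash> should then show at a glance which namespaces the two processes have in common.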
Of the three namespaces shown, only net has the same ID - mnt and pid are new, because those are the ones we asked unshare for. But seeing different IDs in /proc is abstract. I wanted to test what isolation actually means in practice.
I created a network namespace with ip netns, entered it, and started a server:
root@debian:~# ip netns add fakens
root@debian:~# ip netns exec fakens bash
root@debian:~# ip link set lo up
root@debian:~# nc -l 80
From the host:
➜ ~ nc localhost 80
localhost [127.0.0.1] 80 (http) : Connection refused
Connection refused. The host and the namespace have completely separate network stacks - their own interfaces, routing tables, iptables rules, and socket listings. The namespace didn’t just hide the host’s interfaces; it gave the process inside it an entirely new protocol stack with its own loopback device, so localhost on the host and localhost in the namespace are two different endpoints.
This is the experiment that made namespaces click for me. PID namespaces are easy to demonstrate but hard to feel - you see a different process tree, but so what? With network namespaces you can actually prove the isolation by trying to connect and failing. The process inside the namespace is genuinely unreachable over the network from the host, running on the same machine, separated only by a kernel abstraction.
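The connect test can also be scripted so that the same probe runs on both sides. This is a rough sketch, assuming bash (it relies on bash's /dev/tcp redirection rather than any particular netcat variant); port_open and report are hypothetical helper names:

```shell
#!/bin/bash
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT can be opened.
# The subshell keeps file descriptor 3 from leaking into the caller.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# report LABEL PORT: print whether 127.0.0.1:PORT accepts connections
# from wherever this function is run.
report() {
    if port_open 127.0.0.1 "$2"; then
        echo "$1: port $2 open"
    else
        echo "$1: port $2 unreachable"
    fi
}
```

With the listener from the experiment still running, report host 80 on the host and the same function run through ip netns exec fakens bash should disagree, provided nothing on the host itself is listening on port 80.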