Where is every byte?
A few weeks ago I profiled a Node.js server with recurring memory spikes. I ran into dirty pages, allocator behavior, and memory that never came back. To build a cleaner mental model, I stripped the problem down to something smaller: python3 -m http.server.
root@debian:/# ps aux | grep http.server | grep -v grep
root 479226 0.0 0.1 32104 19324 pts/1 T 16:10 0:00 python3 -m http.server
19 MiB of resident memory. Where is every byte?
The kernel exposes everything it knows about a process’s memory in /proc/pid/maps. Each line is a region of virtual memory that the kernel allocated for this process. Let’s look at it.
00400000-00420000 r--p 00000000 103:02 935097 /usr/bin/python3.13
00420000-0073f000 r-xp 00020000 103:02 935097 /usr/bin/python3.13
0073f000-009f2000 r--p 0033f000 103:02 935097 /usr/bin/python3.13
009f2000-009f3000 r--p 005f1000 103:02 935097 /usr/bin/python3.13
009f3000-00a84000 rw-p 005f2000 103:02 935097 /usr/bin/python3.13
00a84000-00af8000 rw-p 00000000 00:00 0
1d84f000-1dabc000 rw-p 00000000 00:00 0 [heap]
7fae4cd9f000-7fae4cf05000 rw-p 00000000 00:00 0
...
7fae4e03f000-7fae4e067000 r--p 00000000 103:02 920382 /usr/lib/x86_64-linux-gnu/libc.so.6
7fae4e228000-7fae4e235000 rw-p 00000000 00:00 0
...
(8 more shared libraries, same pattern)
...
7fae4e37b000-7fae4e37d000 r-xp 00000000 00:00 0 [vdso]
7fae4e37d000-7fae4e37e000 r--p 00000000 103:02 920377 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
...
7fae4e3b4000-7fae4e3b5000 rw-p 00000000 00:00 0
7ffca9328000-7ffca9349000 rw-p 00000000 00:00 0 [stack]
Sorry for the wall of text, but every line follows the same format: an address range, permissions, an offset, a device, an inode, and sometimes a filename. Some lines point to files on disk (libssl.so.3, for example). Others are anonymous - memory the process created from scratch (like 7fae4e374000-7fae4e375000 rw-p 00000000 00:00 0).
Let’s take the first line to understand what’s happening.
(1)00400000-00420000 (2)r--p (3)00000000 (4)103:02 (5)935097 (6)/usr/bin/python3.13
00400000-00420000 is a range of virtual addresses - just addresses in the process's own address space. r--p is the permissions: readable, not writable, not executable, private. 00000000 is the offset into the file - this mapping starts at byte 0. 103:02 is the block device (major:minor). 935097 is the inode - the file's identity on disk. And /usr/bin/python3.13 is the file.
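Those six fields are easy to pull apart in code. A short Python sketch (the sample line is the first one from the maps output above):

```python
# Sketch: parse one line of /proc/<pid>/maps into its six fields.
def parse_maps_line(line):
    parts = line.split(maxsplit=5)
    addr, perms, offset, dev, inode = parts[:5]
    path = parts[5] if len(parts) > 5 else ""   # anonymous mappings have no path
    start, end = (int(x, 16) for x in addr.split("-"))
    return {
        "start": start,
        "end": end,
        "size_kib": (end - start) // 1024,  # length of the address range
        "perms": perms,
        "offset": int(offset, 16),
        "dev": dev,
        "inode": int(inode),
        "path": path,
    }

vma = parse_maps_line("00400000-00420000 r--p 00000000 103:02 935097 /usr/bin/python3.13")
print(vma["size_kib"], vma["perms"], vma["path"])  # 128 r--p /usr/bin/python3.13
```

So the very first VMA is 128 KiB of read-only, private, file-backed address space.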
The binary
python3.13 appeared 6 times. Same inode (935097), but different permissions and offsets. Why does one file need 6 mappings?
The answer is in the ELF binary. ELF is the format Linux uses for executables. The binary carries instructions for the kernel: which parts of the file to map, where, and with what permissions. We can read these instructions with readelf -l /usr/bin/python3.13.
There are 15 program headers, but we care about the 4 LOAD headers. Each LOAD tells the kernel: “map this range of bytes from the file into memory at this virtual address, with these permissions.” The kernel creates one VMA per LOAD.
Type Offset VirtAddr FileSiz MemSiz Flags
LOAD 0x0000000 0x00400000 0x01f1d0 0x01f1d0 R
LOAD 0x0020000 0x00420000 0x31ebf9 0x31ebf9 R E
LOAD 0x033f000 0x0073f000 0x2b2728 0x2b2728 R
LOAD 0x05f1db8 0x009f2db8 0x090950 0x104f90 RW
Four LOAD segments. But maps showed 6 VMAs. Where did the extra two come from?
The extra two come from LOAD 4, which splits into three VMAs:
| LOAD | VirtAddr | Flags | maps VMA | What's in it¹ |
|---|---|---|---|---|
| 1 | 0x400000 | R | 00400000-00420000 r--p | ELF headers, read-only data |
| 2 | 0x420000 | R E | 00420000-0073f000 r-xp | .text - the executable code |
| 3 | 0x73f000 | R | 0073f000-009f2000 r--p | .rodata - string constants, tables |
| 4 | 0x9f2db8 | RW | 009f2000-009f3000 r--p | GNU_RELRO - resolved, now read-only² |
| 4 | | | 009f3000-00a84000 rw-p | .data - initialized globals |
| 4 | | | 00a84000-00af8000 rw-p | .bss - uninitialized globals (anonymous) |
Two things worth stopping at here.
First, GNU_RELRO. The .got (Global Offset Table) holds addresses that the dynamic linker fills in at startup. Once resolved, the linker marks the region read-only². That 4 KB VMA started writable, got populated, then was locked down.
Second, look at LOAD 4’s FileSiz vs MemSiz. FileSiz is 0x90950 (580 KB) but MemSiz is 0x104f90 (1,044 KB). MemSiz is bigger. Where do the extra 464 KB come from?
That’s .bss - uninitialized global variables. These are all zero at startup, so the binary doesn’t store the zeros. It just records how much space it needs, and the kernel creates an anonymous mapping and zero-fills it on demand.
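You can check the arithmetic, and see why the anonymous VMA in maps runs exactly 00a84000-00af8000. A quick Python sketch, using the numbers from the readelf output above:

```python
# The MemSiz/FileSiz gap of the RW LOAD is the .bss: space the binary
# needs in memory but doesn't store in the file, since it's all zeros.
PAGE = 4096
vaddr   = 0x9F2DB8   # VirtAddr of the RW LOAD
filesiz = 0x90950    # bytes stored in the file (.data, .got, ...)
memsiz  = 0x104F90   # bytes needed in memory

bss_bytes = memsiz - filesiz          # the zero-filled tail

def page_up(x):                        # round up to the next page boundary
    return (x + PAGE - 1) // PAGE * PAGE

# The file-backed part ends mid-page; the purely anonymous VMA starts
# at the next page boundary and runs to the page-aligned end of the LOAD.
anon_start = page_up(vaddr + filesiz)  # 0x00A84000 -- matches maps
anon_end   = page_up(vaddr + memsiz)   # 0x00AF8000 -- matches maps
print(f"anon VMA: {anon_start:#x}-{anon_end:#x}, "
      f"{(anon_end - anon_start) // 1024} KiB")
```

The page-aligned anonymous region comes out to 464 KiB, exactly the 00a84000-00af8000 line in maps.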
Compare the .data and .bss lines in maps: the first (009f3000-00a84000) has an inode. The second (00a84000-00af8000) has 00:00 0 - no inode, no file.
Shared libraries
So the binary accounts for 6 VMAs. But maps has many more - libc, libm, libz, libssl, all with the same multi-VMA pattern. Who loaded them?
Not python3. The binary can’t load shared libraries by itself. Look at what the ELF binary says when we run readelf -l:
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
There’s a program header called INTERP, which points to /lib64/ld-linux-x86-64.so.2 - the dynamic linker. This is a real program. You can run it directly, and if you do it will call you a strange lad, then tell you what it does:
You have invoked ’ld.so’, the program interpreter for dynamically-linked ELF programs. Usually, the program interpreter is invoked automatically when a dynamically-linked executable is started.
When the kernel loads python3, it doesn’t jump to python3’s entry point. It finds the INTERP header, maps the dynamic linker into memory, and jumps there first. The linker opens each shared library, mmap’s their LOAD segments (creating the VMAs we see in maps), resolves symbols, and only then jumps to python3’s actual entry point.
If you’re interested in how the ELF loading sequence works end to end, I found this LWN article really good.
The binary declares its dependencies explicitly:
root@debian:~# readelf -d /usr/bin/python3.13 | grep NEEDED
0x0000000000000001 (NEEDED) Shared library: [libm.so.6]
0x0000000000000001 (NEEDED) Shared library: [libz.so.1]
0x0000000000000001 (NEEDED) Shared library: [libexpat.so.1]
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
The linker reads this list and loads them all before python3’s code runs.
One thing you might have noticed: ldd shows 5 libraries, but our maps output had 8 more - libssl, libcrypto, libbz2, and their Python wrappers. Those were loaded later, at runtime, via dlopen() - when Python imported modules like ssl or lzma. The linker handles startup dependencies. dlopen handles runtime dependencies.
Anonymous memory
We’ve accounted for the file-backed VMAs. But maps also shows lines with no file at all:
00a84000-00af8000 rw-p 00000000 00:00 0
1d84f000-1dabc000 rw-p 00000000 00:00 0 [heap]
7fae4cd9f000-7fae4cf05000 rw-p 00000000 00:00 0
No inode, no file. 00:00 0. All rw-p: readable, writable, private. These are anonymous mappings.
We already saw one: the .bss region at 00a84000. The [heap] is where small malloc() allocations go. The unlabeled regions come from mmap(MAP_ANONYMOUS) - larger allocations that bypass the heap.
Anonymous memory is the expensive kind. A file-backed page can be evicted from RAM and reloaded from disk - the file is still there. An anonymous page has no backing file. If the kernel needs to reclaim it, the only option is swap. No swap? The page stays in RAM until the process frees it or dies.
It gets worse. The kernel works in 4 KB pages, but the process doesn’t. When python3 allocates a 32-byte string, the allocator (glibc’s malloc in this case) takes those 32 bytes out of a 4 KB page it already owns. When the string is freed, malloc reclaims the 32 bytes internally - but it doesn’t return the page to the kernel. It keeps the page for future allocations. The kernel still sees a Private_Dirty page with a valid PTE. It has no idea that most of the page is free.
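Here’s a toy model of that disagreement - pure Python, and deliberately much simpler than glibc’s real malloc:

```python
# Toy allocator: the application asks for 32-byte chunks, the allocator
# carves them out of 4 KiB pages it requested from the kernel, and keeps
# the pages when chunks are freed. (A sketch, not glibc malloc's logic.)
PAGE = 4096
CHUNK = 32

class ToyAllocator:
    def __init__(self):
        self.pages = 0        # pages obtained from the kernel (what Rss sees)
        self.live_chunks = 0  # chunks the application still holds

    def malloc(self):
        if self.live_chunks * CHUNK >= self.pages * PAGE:
            self.pages += 1   # need another page from the kernel
        self.live_chunks += 1

    def free(self):
        self.live_chunks -= 1  # chunk reusable internally; page NOT returned

a = ToyAllocator()
for _ in range(10_000):   # load spike: 10,000 small allocations
    a.malloc()
for _ in range(9_999):    # load drops: almost everything is freed
    a.free()
print(a.live_chunks, a.pages)  # 1 79
```

One live 32-byte chunk, yet all 79 pages are still mapped and dirty - exactly the Rss-stays-high pattern described above.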
This is why Rss stays high after load drops. The application freed its objects. The allocator reclaimed the space. But the pages are still mapped, still resident. The memory “never comes back” because the kernel and the allocator disagree about what’s free.
The page cache
We said shared libraries are “shared.” But shared with whom, exactly?
root@debian:~# cat /proc/479226/smaps_rollup
Rss: 19324 kB
Pss: 14879 kB
Shared_Clean: 4988 kB
Private_Clean: 4840 kB
Private_Dirty: 9496 kB
4.9 MiB of Shared_Clean. Shared with every other process that uses libc, libm, libz. When the kernel loads libc’s .text segment, it doesn’t give the pages directly to the process. It puts them in the page cache - a system-wide cache of file contents - and points the process’s page table at those cached pages. If another python3 starts, its page table points to the same physical frames. There is one copy of libc’s code in RAM, and it is used by everyone.
Rss doesn’t know this. It counts every physical page mapped into the process, shared or not. If libc’s .text is 1.4 MiB and ten processes use it, all ten report that 1.4 MiB in their Rss. Same physical memory, counted ten times. Pss tries to fix this by dividing shared pages across processes - that’s the 4.3 MiB gap between Rss (19 MiB) and Pss (14.5 MiB).
So when someone asks “how much memory is this process using?”, the honest answer is: it depends on what you mean by “using.” Kill this python3, and what actually comes back? The Shared_Clean pages stay - other processes still reference them. Private_Clean (4.8 MiB) gets reclaimed - file-backed but only this process had them mapped. Private_Dirty (9.4 MiB) gets reclaimed - heap, mmap, .bss, anonymous, no other references. You get back roughly 14 MiB. The shared pages were never yours to free.
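You can back all of this out of the smaps_rollup numbers with a few lines of arithmetic (the average-sharers figure is a rough model - the kernel actually computes the share per page):

```python
# Back-of-envelope accounting from the smaps_rollup output above (kB).
shared_clean  = 4988   # library code, shared with other processes
private_clean = 4840   # file-backed pages only this process mapped
private_dirty = 9496   # heap, .bss, anonymous -- this process's alone
pss           = 14879  # reported by the kernel

rss = shared_clean + private_clean + private_dirty  # what ps reports

# Pss charges private pages in full and shared pages fractionally:
private = private_clean + private_dirty   # charged in full
shared_charge = pss - private             # this process's slice of shared pages
avg_sharers = shared_clean / shared_charge  # implied processes per shared page

# And what exits with the process: only the private pages.
print(f"rss={rss} kB, freed on exit={private} kB (~{private / 1024:.0f} MiB), "
      f"~{avg_sharers:.1f} processes per shared page")
```

The implied sharing factor (about 9 processes per shared page) is why Pss sits well below Rss on a box running many dynamically linked processes.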
Virtual vs physical³
We’ve mapped the binary, the shared libraries, the anonymous regions. The kernel created VMAs for all of them. But creating a VMA doesn’t cost any physical memory. The kernel reserved address space. Did it actually put anything in RAM?
Back to the ps output: 32104 is VSZ - the process’s virtual size. 19324 is Rss - its resident set size. 32 MiB of virtual address space, but only 19 MiB actually in physical RAM. Where’s the other 13 MiB?
Nowhere. It doesn’t exist yet.
Every address in maps is virtual. When the kernel creates a VMA for libm’s .text, it reserves a range of virtual addresses. But the page table entries are empty. No physical frames allocated. The address range exists, the memory behind it doesn’t.
The first time the CPU accesses an address in that range, the MMU tries to translate the virtual address to a physical one, finds no page table entry, and raises a page fault. The kernel handles it: allocates a physical frame, loads the data from disk (or finds it already in the page cache), creates the page table entry, and the CPU retries. Now that page is resident - it counts toward Rss.
This is demand paging. The kernel promises address space without backing it with physical memory. A 1 GB shared library mapped into your process adds 1 GB to VSZ. But if you only call one function, Rss grows by maybe 4 KB - one page.
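Demand paging is easy to model. A toy sketch - a dict standing in for the page table, nothing like real kernel code:

```python
# Toy demand paging: mmap() only reserves a virtual range (a VMA);
# physical frames appear on first touch (the "page fault").
PAGE = 4096

class ToyProcess:
    def __init__(self):
        self.vmas = []        # list of (start, end) virtual ranges
        self.page_table = {}  # virtual page number -> physical frame
        self.next_frame = 0

    def mmap(self, size):
        start = len(self.vmas) * 2**30   # give each VMA its own 1 GiB slot
        self.vmas.append((start, start + size))
        return start                     # VSZ grows; no frames allocated

    def touch(self, addr):
        vpn = addr // PAGE
        if vpn not in self.page_table:        # no PTE -> page fault
            self.page_table[vpn] = self.next_frame  # allocate a frame
            self.next_frame += 1

    @property
    def vsz(self):   # address space the kernel promised
        return sum(end - start for start, end in self.vmas)

    @property
    def rss(self):   # pages actually backed by physical frames
        return len(self.page_table) * PAGE

p = ToyProcess()
base = p.mmap(1 << 30)    # "map a 1 GB library": VSZ jumps by 1 GiB
print(p.vsz, p.rss)       # 1073741824 0 -- nothing resident yet
p.touch(base + 0x1234)    # call one function -> fault in one page
print(p.vsz, p.rss)       # 1073741824 4096
```

One touched address, one resident page - the whole gigabyte of VSZ cost 4 KiB of Rss.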
The gap between VSZ and Rss is all the virtual address ranges that exist but haven’t been touched yet - or were touched and then evicted. VSZ is what the kernel promised. Rss is what’s actually in RAM right now.
And even the pages that are in RAM aren’t necessarily yours. When python3 faults on a libm page, the kernel loads that page into a physical frame. But that frame belongs to the page cache, not to python3. Ten processes can map libm. One physical copy. The frame is shared - it shows up in every process’s Rss, but killing any one of them doesn’t free it.
The special ones
Three VMAs left at the bottom of maps:
7fae4e377000-7fae4e37b000 r--p 00000000 00:00 0 [vvar]
7fae4e37b000-7fae4e37d000 r-xp 00000000 00:00 0 [vdso]
7ffca9328000-7ffca9349000 rw-p 00000000 00:00 0 [stack]
[stack] is the process’s call stack - function arguments, local variables, return addresses. 132 KB.
[vdso] (virtual dynamic shared object) is a small library the kernel maps into every process. It lets the process call things like gettimeofday and clock_gettime without entering the kernel - the code runs in userspace, reading from [vvar]. Fast path for syscalls that get called thousands of times per second.⁴
19 MiB
We started with a number from ps and broke it down.
Rss is 19 MiB. That’s what ps reports, what kubectl top shows, what your OOM limits use. It’s a conservative number: it counts everything mapped and resident, including shared pages.
Most of the time, that’s enough. But when memory behaves oddly - Rss grows and doesn’t drop, or OOMs don’t line up with expectations - the number alone isn’t enough.
Part of those 19 MiB is shared library code that exists once in RAM no matter how many processes map it. Part of it is allocator-held memory the kernel still counts as used. The portion that actually goes away when the process dies is smaller - about 14 MiB here.
The breakdown is in /proc/pid/smaps_rollup.
Some questions I’ve been working on to strengthen my mental map. If you liked this, you may want to try answering them.
1. The “What’s in it” column comes from the ELF section-to-segment mapping in readelf -l:

   02 .note.gnu.property .note.gnu.build-id .interp .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
   03 .init .plt .text .fini
   04 .rodata .stapsdt.base .eh_frame_hdr .eh_frame .note.ABI-tag
   05 .tdata .init_array .fini_array .dynamic .got .got.plt .data .PyRuntime .probes .bss ↩︎

2. GNU_RELRO (RELocation Read-Only). This prevents an attacker from overwriting function pointers in the GOT. Found this interesting article from Red Hat: https://www.redhat.com/en/blog/hardening-elf-binaries-using-relocation-read-only-relro ↩︎ ↩︎
3. I introduce a bunch of new concepts here in a few lines. This post won’t cover them in depth - I plan to write about zones, regions, etc. in another post. The best resource I’ve found so far is “Understanding the Linux Virtual Memory Manager” by Mel Gorman: https://www.kernel.org/doc/gorman/html/understand/index.html. Keep in mind that Gorman covers kernel 2.6, so some details are out of date, but it’s highly recommended as a foundation. ↩︎
4. See man 7 vdso for details. The vdso avoids the overhead of a full syscall (context switch to kernel mode and back) for frequently read kernel data like the current time. ↩︎