
Running vmmap (https://learn.microsoft.com/en-us/sysinternals/downloads/vmm...) against an empty Sublime Text window gives me:

- 100MB 'image' (i.e. executable code: the executable itself plus all the loaded OS libraries)

- 40MB heap

- 50MB "mapped file", mostly fonts opened with mmap() or the Windows equivalent

- 45MB stack (each thread gets 2MB)

- 40MB "shareable" (no idea)

- 5MB "unusable" (appears to be address space that's not usable because of fragmentation, not actual RAM)

Generally if something's using a lot of RAM, the answer will be bitmaps of various sorts: draw buffers, decompressed textures, fonts, other graphical assets, and so on. In this case it's just allocated but not yet used heap+stacks, plus 100MB for the code.

Edit: I may be underestimating the role of binary code size. Visual Studio's "devenv.exe" is sitting at 2GB of 'image'. Zoom is 500MB. VSCode is 300MB. Much of that is app-specific code, not just Windows DLLs.
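On Linux you can get a rough analogue of vmmap's per-category totals from /proc/self/smaps. This is just a sketch (it assumes a Linux /proc filesystem) that splits resident memory into file-backed mappings, which include code and mapped fonts, versus anonymous mappings like heap and stacks:

```shell
# Sum resident (Rss) kB per mapping, split into file-backed mappings
# (those whose pathname starts with /) versus anonymous/other ones.
awk '/^[0-9a-f]+-[0-9a-f]+ / { file = ($6 ~ /^\//) }
     /^Rss:/  { if (file) fb += $2; else an += $2 }
     END      { printf "file-backed: %d kB, anonymous/other: %d kB\n", fb + 0, an + 0 }' \
    /proc/self/smaps
```

Here /proc/self is the awk process itself; point it at /proc/PID/smaps to inspect another process you own.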



Turning these numbers into "memory consumption" gets complicated to the point of being intractable.

The portions that are allocated but not yet used might just be page table entries with no backing memory, making them free. Except for the memory tracking the page table entries. Almost free....

A lot of "image" will be mmapped and clean. Anything you don't actually use from that will be similarly freeish. Anything that's constantly needed will use memory. Except if it's mapped into multiple processes, then it's needed but responsibility is spread out. How do you count an app's memory usage when there's a big chunk of code that needs to sit in RAM as long as any of a dozen processes are running? How do you count code that might be used sometime in the next few minutes or might not be depending on what the user does?
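One concrete way Linux "spreads out" responsibility for shared pages is the Pss figure in smaps. A sketch, assuming a Linux kernel new enough to have /proc/self/smaps_rollup (4.14+):

```shell
# Rss counts every resident page once per process; Pss divides each
# shared page's cost by the number of processes mapping it, which is
# one answer to "whose memory is the shared code?".
grep -E '^(Rss|Pss|Shared_Clean|Private_Clean):' /proc/self/smaps_rollup
```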


This assumes that executable code pages can be shared between processes. I'm skeptical that this is still a notable optimization on modern systems, because dynamic linking writes to executable memory to perform relocations in the loaded code, which would counteract copy-on-write. And at least with ASLR, the result should be different for each process anyway.


The dynamic linker writes to the GOT, which lives in the data segment. The executable segment where .text lives is not written to (code in dynamic libraries is position independent).

ASLR is not an obstacle -- the same exact code can be mapped into different base addresses in different processes, so they can be backed by the same actual memory.
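You can see this on a glibc Linux box (a sketch; it assumes `sleep` is dynamically linked against a libc whose pathname contains "libc"): two processes map the same libc file at different ASLR'd base addresses, yet the clean text pages can be backed by the same physical memory because it's the same file.

```shell
# Start two processes and compare where each one maps libc's
# executable (r-xp) segment: different virtual addresses, same file.
sleep 5 & P1=$!
sleep 5 & P2=$!
grep 'r-xp' /proc/$P1/maps | grep libc | head -1
grep 'r-xp' /proc/$P2/maps | grep libc | head -1
kill $P1 $P2
```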


That’s true on most systems (modern or not), but it has actually never been true on Windows due to PE/COFF format limitations. Windows also doesn’t/can’t do effective ASLR, because the binary slide is part of the object file spec.


I can't reconcile this with the code that GCC generates for accessing global variables. There is no additional indirection there, just a constant 0 address that needs to be replaced later.


Assuming the symbol is defined in the library: when the static linker runs (ld -- we're not talking about ld.so), it will decide whether the global variable is preemptable, that is, whether the reference can be resolved to a symbol outside the DSO at run time. Generally, by default, it is, though this depends on many things: visibility attributes, linker scripts, -Bsymbolic, etc. If it is preemptable, ld will have the final code reach into the GOT; if not, it can just use instruction-pointer (PC) relative offsets.


I've never observed a (non-LTO) linker exchange instructions. I want to see an example before I can believe this.


I'm not sure if you're just trolling, but I'll give the same example I gave before (you can get even wilder simplifications, called relaxations, with TLS, since there are 4 levels of generality there). I'm not sure what you meant by "changing instructions", but in the first case the linker did the fixup indicated by the relocation, and in the second it reduced the generality of the reference (one less level of indirection, by changing mov to lea) because it knew the symbol could not be preempted (more exactly, the R_X86_64_REX_GOTPCRELX relocation allows the linker to do the relaxation if it can determine that doing so is safe).

  root@1f0775a74fd7:/tmp# cat a.c
  int glob;
  int main() {
   return glob;
  }
  root@1f0775a74fd7:/tmp# gcc -c a.c -fPIC -o a.o
  root@1f0775a74fd7:/tmp# objdump --disassemble=main a.o
  
  a.o:     file format elf64-x86-64
  
  
  Disassembly of section .text:
  
  0000000000000000 <main>:
     0: f3 0f 1e fa           endbr64
     4: 55                    push   %rbp
     5: 48 89 e5              mov    %rsp,%rbp
     8: 48 8b 05 00 00 00 00  mov    0x0(%rip),%rax        # f <main+0xf>
     f: 8b 00                 mov    (%rax),%eax
    11: 5d                    pop    %rbp
    12: c3                    ret
  root@1f0775a74fd7:/tmp# readelf -rW a.o | grep glob
  000000000000000b  000000030000002a R_X86_64_REX_GOTPCRELX 0000000000000000 glob - 4
  root@1f0775a74fd7:/tmp# gcc -shared -o a.so a.o
  root@1f0775a74fd7:/tmp# objdump --disassemble=main a.so
  (...)
  00000000000010f9 <main>:
      10f9: f3 0f 1e fa           endbr64
      10fd: 55                    push   %rbp
      10fe: 48 89 e5              mov    %rsp,%rbp
      1101: 48 8b 05 b8 2e 00 00  mov    0x2eb8(%rip),%rax        # 3fc0 <glob-0x4c>
      1108: 8b 00                 mov    (%rax),%eax
      110a: 5d                    pop    %rbp
      110b: c3                    ret
  (...)
  root@1f0775a74fd7:/tmp# readelf -r a.so | grep glob
  000000003fc0  000600000006 R_X86_64_GLOB_DAT 000000000000400c glob + 0
  root@1f0775a74fd7:/tmp# gcc -shared -Wl,-Bsymbolic -o a.symb.so a.o
  root@1f0775a74fd7:/tmp# readelf -r a.symb.so | grep glob
  root@1f0775a74fd7:/tmp# objdump --disassemble=main a.symb.so
  (...)
  Disassembly of section .text:
  
  00000000000010f9 <main>:
      10f9: f3 0f 1e fa           endbr64
      10fd: 55                    push   %rbp
      10fe: 48 89 e5              mov    %rsp,%rbp
      1101: 48 8d 05 04 2f 00 00  lea    0x2f04(%rip),%rax        # 400c <glob>
      1108: 8b 00                 mov    (%rax),%eax
      110a: 5d                    pop    %rbp
      110b: c3                    ret
  (...)


OK, I spent a few additional minutes digging into this. It's been too long since I looked at those mechanisms. Turns out my brain was stuck in pre-PIE world.

Global variables in PIC shared libraries are really weird: the shared library's variable gets a copy placed in the main program image's data segment, and the relocation happens in the shared library, which means an indirection is generated in the library's own machine code to reach the executable's copy.


Are you looking at the code before or after the static linker runs?


Dynamic linking doesn't have to write to code. I'm not familiar with other platforms, but on macOS, relocations are all in data, and any code that needs a relocation will indirect through non-code pages. I assume it's similar on other OSes.

This optimization is essential. A typical process maps in hundreds of megabytes of code from the OS. There are hundreds of processes running at any given time. Eyeballing the numbers on an older Mac I have here (a newer one would surely be worse) I'd need maybe 50GB of RAM just to hold the code of all the running processes if the pages couldn't be shared.


Thanks for the breakdown. I'll play around with vmmap later on my Windows machine.

But isn't it crazy how much memory we throw away on random buffers? It feels wrong to me.


As pointed out below, quite a lot of that isn't in RAM - see "working set".

There's a common noob complaint about "Linux using all my RAM!" where people are confused about the headline free/buffers numbers. If there's a reasonable chance data could be used again soon it's better to leave it in RAM; if the RAM is needed for something else, the current contents will get paged out. Having a chunk of RAM be genuinely unallocated to anything is doing nothing for you.
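On Linux you can see the distinction directly (a sketch assuming /proc/meminfo is available): MemFree is the headline number that scares people, while MemAvailable is the kernel's estimate of what could actually be reclaimed, page cache included, without swapping.

```shell
# MemFree counts only pages assigned to nothing at all; MemAvailable
# adds what the kernel thinks it could reclaim (mostly page cache).
grep -E '^(MemTotal|MemFree|MemAvailable|Buffers|Cached):' /proc/meminfo
```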


Nitpick: what you're describing is the disk cache. If a process requests more memory than is free, the OS will not page out the pages used for the cache; it will simply either release them (if they're in the read cache) or flush them (if they're in the write cache).


Of course it's doing something for you. Room to defrag other areas of RAM, room to load something new without moving something else out of the way first.

Your perspective sounds like saying that the empty space in a room does nothing for you until you cram it full of hoarded items.


If we're talking about a storage locker that's instantly reconfigurable, then it is probably better to be approximately filling it.

Why would anyone buy a locker 5x the size of their needs?


If you didn't have the "random" buffers, you'd complain how slow it is. Syntax highlighting? Needs a boatload of caching to be efficient. Code search? Hey, you want a cached code index. Plugins? Gotta run your python code somewhere.

Run vi/nano/micro/joe - they're optimizing for memory to some extent. vi clocks in at under 8 MB. You're giving up a lot of "nice" things to get there.


But I have Sublime Text open with a hundred files and it's using 12 MB.


And how does that break down in vmmap? I'm guessing that's working set vs. the whole virtual memory allocation (which is definitely always an overestimate and not the same as RAM).


Virtual memory doesn't matter at all. It's virtual. You can take 2TB of address space, use 5MB of it, and nothing on the system cares.
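On Linux the gap is easy to see per process (a sketch assuming /proc): VmSize is the whole address-space reservation, while VmRSS is what's actually resident, and the former is routinely many times larger.

```shell
# Address space reserved vs. pages actually resident for this shell.
grep -E '^(VmSize|VmRSS):' /proc/$$/status
```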



