
Are there TCP/IP stacks out there in common use that are allocating memory all the time?


Yes, TCP is pretty hungry for buffers. The bandwidth*delay product can eat gigs of memory on a server. You have to be ready to retransmit anything that's in flight / hasn't been acked yet.


The bandwidth*delay product for a 10 Gbps stream at a 300 ms RTT theoretically only requires ~384 MB.

One option is simply to keep buffers small and fixed and disconnect blocked clients on write() after some timeout.


We're up to hundreds of Gbps per server, and have been for some years now. E.g. 400 Gbps uses a lot of memory even with a much smaller avg RTT. That's not going to be one stream of course, but a zillion smaller streams still add up to the same requirements.

This is far from little-embedded-device territory of course. But still, the latest Wi-Fi is closer to 10 than 1 Gbps already.


I do not understand the point you are trying to make. The person you replied to showed how to evaluate it with simple math.

400 Gb/s is 50 GB/s. RTT of 300 ms would only require 15 GB of buffers. That would not even run a regular old laptop out of memory let alone a server driving 400 Gb/s of traffic. That would be single-digit percents to possibly even sub-percent amounts of memory on such a server.


I introduced the concept of bandwidth * delay product to the conversation...

The question was about why use dynamic allocation. In this branch of the thread we were discussing the question "Are there TCP/IP stacks out there in common use that are allocating memory all the time?"

We'd not be happy to see the server or laptop statically reserving this worst-case amount of memory for TCP buffers when it's not in fact slinging around the max number of TCP connections, each with a worst-case bandwidth*delay product. Nor would we be happy if the laptop or server only supported small TCP windows that limit performance by capping the amount of data in flight to a low number.

We are happier if the TCP stack dynamically allocates the memory as needed, just like we're happier with dynamic allocation on most other OS functions.


Needing memory doesn't have to mean allocating memory over and over. Memory allocation is expensive. If someone is doing that, reusing memory is going to be by far the best optimization.


Well, allocating and freeing according to need is reusing. Modern TCP perf is not bottlenecked by that. There's pools of recycled buffers that grow and shrink according to load etc.


Well, allocating and freeing according to need is reusing

That's a twisted definition. It seems like you're playing around with terms, but allocating memory from a heap allocator is obviously what people are talking about with "dynamic memory allocation". Reusing memory that has already been grabbed from an allocator is not reallocating memory. If you have a buffer and it works, you don't need to do anything to reuse it.

Modern TCP perf is not bottlenecked by that. There's pools of recycled buffers that grow and shrink according to load etc.

If anything is allocating memory from the heap in a hot loop it will be a bottleneck.

Reusing buffers is not allocating memory dynamically.


Sorry, but there are shades of gray between heap allocation and TCP-specific free lists in TCP implementations. It's not a black-and-white free-list-vs-malloc-API situation.

For example, in Linux there are mid-level abstraction layers in play, as follows:

For the payload there's a per-socket runway of memory, for example (sk_page_frag). Then, on a miss in that pool, instead of calling the malloc API (kmalloc in the case of Linux), it invokes the page allocator to get a bunch of VM pages in one go, which is again more efficient than using the generic heap API. The page allocation API recycles recently freed large clusters of memory, and the page allocator is in turn backed by a CPU-local per-cpu pageset, etc. It's turtles all the way down.

For the metadata (sk_head) there's a separate skbuff_head_cache that facilitates recycling of the generic socket metadata, which is again not a TCP-specific thing but lower-level than the generic heap allocator, somewhere between a TCP free list and malloc in the tower of abstractions.


It's not a black and white free list vs malloc API situation.

It is in the sense that if finding a buffer of the right length were just as slow as malloc (and free), then you would just use malloc.

Not only that, but malloc is shared with the entire program and can do a lot of locking. On top of that, there's the memory locality of using the same heap as the rest of the program.

If you just make your own heap, there is a big difference between using the system allocator over and over and reusing local memory buffers for a specific thread and purpose.

What you're describing here is the same thing: avoiding the global heap allocator.


Packets and sockets have to be stored in memory somehow. If you have a fixed pool that you reuse it's basically a slab allocator.


You need some memory but that doesn't mean you would constantly allocate memory. There is a big difference between a few allocations and allocating in a hot loop.


Yes, it is pretty common.

However, sometimes the buffers are pooled, so buffer-allocator contention only occurs within the network stack or within a particular NIC.



