Coping with the TCP TIME-WAIT state on busy Linux servers (bernat.im)
69 points by arunc on May 2, 2014 | hide | past | favorite | 18 comments


Seems like if you are in the situation described at the top (behind a load balancer with a limited number of possible connections per minute), you also don't have to deal with NAT, so you can safely enable the options that don't work with it.

So it's one or the other: either you have to deal with NAT, but then you also have plenty of remote IPs and no risk of running out, or you have only one remote IP and no NAT to worry about.

i.e. you'll never have both problems at once.


Wouldn't it be that clients behind a NAT might end up attempting to connect from the same source address:port (the NAT box), or a load balancer would make connections all from the same address?

i.e. when you have to deal with NAT, there's a nonzero chance of a customer getting an error simply because your app is popular and their ISP has run out of IPs and ports.

Or when you don't have to deal with NAT directly, you're limited to ~60k incoming connections per minute which may not be enough.
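The ~60k figure follows from simple arithmetic. A back-of-envelope sketch, assuming Linux's default 60-second TIME-WAIT and a full 16-bit source-port space per remote IP:

```python
# Each closed connection occupies its (remote IP, remote port,
# local IP, local port) tuple for the length of TIME-WAIT, and a
# single remote IP has at most 65535 source ports to draw from.
ports_per_remote_ip = 65535   # 16-bit port space
time_wait_secs = 60           # TCP_TIMEWAIT_LEN on Linux

per_second = ports_per_remote_ip // time_wait_secs
print(per_second)             # ~1092 new connections/second per remote IP
print(per_second * 60)        # ~65k connections/minute ceiling
```

So the per-minute ceiling applies per remote IP (or per load balancer address, when all connections arrive from one box).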

The article suggests listening on more ports so incoming connections are not all on the same port. I have my doubts that will work. (User-hostile to expect users to add a port number to the URL, or non-standard ports only used for automated connections which limits the scope of the solution.)

If behind a load balancer, assigning more IPs to the server is easy since they're not public IPs, so that seems like a good solution.

On the other hand, the article hints at two really good solutions:

1. End the connection with a RST instead of a clean close(). Yes, the other side will see an error at that point, but I wonder if browsers wouldn't just silently ignore it after receiving the entire response?

2. Definitely won't work for browsers, but if the client does the close() call then the client gets to handle the TIME-WAIT and the server can recycle ports as fast as it wants.
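Option 1 can be sketched with SO_LINGER: setting linger on with a zero timeout turns close() into an abortive close that emits a RST and skips TIME-WAIT on the closing side. A minimal loopback demo (Linux behavior: data queued before the RST is still delivered to the peer, after which reads report the reset):

```python
import socket
import struct

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())
conn, _ = srv.accept()

conn.sendall(b"response")
# l_onoff=1, l_linger=0: abortive close, RST on the wire, no TIME-WAIT
conn.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
conn.close()

first = cli.recv(1024)        # data that arrived before the RST
try:
    second = cli.recv(1024)   # the next read reports the reset
except ConnectionResetError:
    second = b"<reset>"
print(first, second)
```

Whether a browser would swallow that reset after a complete response is exactly the open question above; this only shows the mechanics of the trick.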

P.S. With IPv6 assigning more public IPs is a great solution.


> Wouldn't it be that clients behind a NAT might end up attempting to connect from the same source address:port (the NAT box)

They shouldn't. The NAT should know not to recycle ports so quickly.

> or a load balancer would make connections all from the same address?

It would, but then your load balancer is not behind a NAT.

> Or when you don't have to deal with NAT directly, you're limited to ~60k incoming connections per minute which may not be enough.

But then you can enable the options that don't work well with NAT.

> The article suggests listening on more ports .. User-hostile ....

No, that's only for the load balancer to do internally. It's not for external use.


NAT is probably the most common one. At my company we ran into strange issues because someone enabled tw_recycle.

In our situation we used a hardware load balancer (an F5 Viprion) working in active-active mode, and tw_recycle was enabled on the server nodes being load balanced.

Long story short, everything worked fine until some traffic was applied, then some connections started to hang for a few seconds. Our first assumption was that the load balancer had issues, or the switches it was connected to. It took many hours and packet captures to realize that the problem was due to differences in TCP timestamps (the blades don't have exactly the same time, and per the RFCs they don't need to), and then to trace the dependence on timestamps to this setting.

So please don't enable this setting. Everything might work fine in your setup for months, and then one day you will observe strange behavior and start pulling your hair out trying to figure out what's going on.
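For anyone auditing their own boxes, a quick check might look like this. Note that tcp_tw_recycle was removed outright in Linux 4.12, precisely because of NAT/timestamp failures like the one described, so on modern kernels the sysctl file simply does not exist:

```python
from pathlib import Path

# Location of the (now removed) sysctl on older Linux kernels
recycle = Path("/proc/sys/net/ipv4/tcp_tw_recycle")

if recycle.exists():
    status = "enabled" if recycle.read_text().strip() != "0" else "disabled"
else:
    status = "absent"   # removed in Linux 4.12+
print("tcp_tw_recycle:", status)
```

"enabled" here is the footgun; "absent" means the kernel has already taken the option away from you.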


This is an amazing writeup. I have run into this before (running out of sockets) stress testing a web proxy, and when I did, I cursed the TCP designers for choosing only 32-bit sequence numbers and 16-bit port numbers.

That said, maybe there is still a need for TIME_WAIT in a distributed protocol that can't guarantee sequence number uniqueness 100%. I'm glad the article provided detailed CPU and memory measurements of its costs, which don't seem too bad. It's running out of tuples, due to a practically arbitrary and short-sighted limit, that is the killer.

Also liked the interesting notes about socket linger.


If your servers are handling HTTP traffic, another big win is to make sure HTTP keep-alives are enabled on the servers. This will cause connections to be reused and so fewer connections will be closed.


More importantly, the TIME_WAIT state only happens for the socket that initiated the close (active close). In the case of HTTP keep-alive, the active close is usually done by the client rather than the server (avoiding a TIME_WAIT on the server side).

A general rule when designing a network protocol: do your best to make the client side close the connection when possible.
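A Linux-only sketch of that rule: whichever side closes first inherits the TIME-WAIT. Here the client closes first, so the TIME-WAIT entry in /proc/net/tcp (state hex 06) shows up on the client's port, not the server's:

```python
import socket
import time

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(srv.getsockname())
conn, _ = srv.accept()
client_port = cli.getsockname()[1]

cli.close()        # active close: the client sends the first FIN
conn.recv(1)       # server sees EOF...
conn.close()       # ...and its passive close leaves no TIME-WAIT behind

time.sleep(0.2)    # let the final ACK land
tw_ports = set()
with open("/proc/net/tcp") as f:
    next(f)                        # skip header row
    for line in f:
        fields = line.split()
        if fields[3] == "06":      # state 06 == TIME_WAIT
            tw_ports.add(int(fields[1].split(":")[1], 16))

print(client_port in tw_ports)     # the closer owns the TIME-WAIT
```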


Very true.

Note that the web page discusses load balancers in a few places. If you have such a setup, the LB is effectively acting as a client and so the TIME_WAITs will still pile up on one of your machines.

If the load balancer can re-use the HTTP connections from LB<->web server with keepalives, then it will reduce them.


Unless of course you have a high bounce rate (intended or otherwise) never use HTTP keep-alive.


I think you mean the opposite advice of the natural reading of your text.

I read your advice as "Only use HTTP keep-alive if you have a high bounce rate", when I presume you mean the opposite.

  1. Unless P, then Q.
  2. P is "have a high bounce rate"
  3. Q is "do not use keep alives"
I think you meant:

  1. If P, then Q.
  2. P and Q same as above.


Yes, I meant do NOT use keep-alive if you have a high bounce rate.


Can't edit the above post, but what I meant was: do NOT use keep-alive if you have a high bounce rate.

Apologies for the poor wording.


Can you provide some background on when and why you believe this is good advice? Otherwise this is exactly the kind of technical folklore the original post was complaining about – it's a broad assertion with no supporting theory to help people understand whether it's applicable to their situation.


I meant do NOT use keep-alive if you have a high bounce rate.

High traffic and a high bounce rate will leave a lot of TIME_WAITs hanging around for precious seconds, and soon enough your server won't be able to accept new connections.

Obviously this applies only if you have both: high traffic and a high bounce rate, i.e. any high-traffic service over TCP where a client connects for a very brief moment and doesn't come back for hours or days, or never.


These are exactly the missing details, modulo the qualifier that "high traffic" generally means at least tens of thousands of persistent connections on a modern web server.


I think they're trying to say that if someone is likely to make just the one connection to your site, and then never fetch further pages, then you don't want to turn keep-alive on. There would be no point maintaining the connection if the client isn't likely to use it.

Imagine you run a site that provided a common javascript library. People are encouraged to add a link to fetch the .js file from their 3rd party webpages. So most clients contacting your server will be there to fetch one file only. Even if the user continues to browse the 3rd party website, they won't make further requests to your server. One connection, one request. Here, keepalives would be a bad idea.


I'm pretty sure the explanation is something like that, but I felt the sloppy wording and complete lack of any explanation were exactly the kind of thing we need less of in tech advice, particularly when describing an extreme edge case.

The original wording was poor enough that it's ambiguous whether the claim is “enable keepalive unless you have a high bounce rate” or “enable keepalive if you have a high bounce rate”.


net.inet.tcp.nolocaltimewait on FreeBSD is pretty nice if you're connecting to things locally (proxy, nginx, varnish, etc.)

net.inet.tcp.nolocaltimewait: Do not create compressed TCP TIME_WAIT entries for local connection



