As a tip to anyone using nginx to proxy websockets: make sure you increase the proxy read timeout for that connection. Otherwise, nginx will drop the (active) connection much sooner than you'd otherwise expect.
Indeed; all the docs I could find referred to the old pre-websockets-support behavior, where you used TCP-passthrough and there was a (hackish) "websocket_read_timeout" property.
I ended up going with
proxy_read_timeout 31536000;
--meaning that a client can hold open a websocket without the server needing to say anything for a year at a time. I'm not sure whether that's a good idea in all cases, though; it means you won't detect silent backend netsplits (i.e. the network cable getting cut to your appserver box without giving it a chance to FIN) since the TCP connection will still look open.
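For context, a minimal websocket proxy block with that timeout might look like this (the location and upstream name are assumptions, not from the thread):

```nginx
location /ws/ {
    proxy_pass http://backend;               # hypothetical upstream
    proxy_http_version 1.1;                  # websockets need HTTP/1.1
    proxy_set_header Upgrade $http_upgrade;  # pass the Upgrade handshake through
    proxy_set_header Connection "upgrade";
    proxy_read_timeout 31536000;             # seconds: roughly one year
}
```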
The more cautious solution might be to use a lower timeout (120s or so) but have the server send heartbeats over the websocket when it's not doing anything. (If you're using socket.io, you're already getting this behavior for free; if you're using SockJS, you can enable it.) This breaks down at true scale, though: when you have 100,000 idle clients that you might want to announce to at any moment, but usually have nothing to say to most of them, sending heartbeats to every client in that pool can saturate your link.
A compromise is probably to set the long timeout, but then use a cluster-monitoring service which will push a new config to your load-balancers in response to observed network-topology changes.
Yep. We initially went with a 1 hour timeout — worst case, the client drops every hour, and then immediately reconnects. We actually moved to a heartbeat system recently, because we found issues with spotty connections where we'd have packets lost / delayed (for up to a couple minutes), but the server and client both believed they were connected.
Scaling the heartbeat is actually not a big issue. If you're actively using the socket for things, the overhead of that is likely much higher than supporting pings. You'll actually hit port limitations first (if you can support ~65k connected clients per machine), in our experience. 3000 pings a second isn't too bad (that's 100k clients, pinging every 30 seconds or so). You can also change how fast the client is pinging based on client activity.
> You'll actually hit port limitations first (if you can support ~65k connected clients per machine)
This isn't actually a limit, by the way. The port limitation is a uniqueness constraint on full (source IP, source port, dest IP, dest port) tuples; it just means one client can't have more than 65k connections open to your server (which tends to trip people up, because they see themselves running into the limit when benchmarking parallelism--because they're sending all the requests from their own computer.)
What you'll usually hit first is the open file descriptor ulimit.
You can increase the open file descriptor ulimit very easily. Regarding the port limitation: I think he means the connections from nginx to your websocket app...
edit: I thought too fast. Amfy's reply to you is correct (and what I have seen previously). When proxying with nginx, you use up the ports locally (unless I'm missing something?).
Ah, I got confused about what exactly you meant. This is indeed a real problem, but it has an easy solution: make each backend process listen on multiple ports, and list them all as separate entries in nginx's upstream{} section for that backend. One backend with 1024 ports open = 67M connections nginx can make to that backend.
Again, you're not "using up" local ports, just (IP/port, IP/port) pairs--so increasing the number of remote ports you want to talk to allows you to make more connections just as well as if you could increase the number of local ports used to talk to them.
(This might not be so simple for some servers which expect to only listen on one port; the workaround is to use multiple IPs for the backend server, and make sure the backend process is listening on 0.0.0.0. They don't have to be real IPs--you can just as well do port-forwarding on the backend box from one-IP:lots-of-ports to lots-of-virtual-IPs:one-port. It's simple enough to listen on multiple ports in both Node and Erlang, though, so this probably doesn't matter for most people writing websocket servers.)
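As a sketch, the multiple-ports trick from the comments above looks like this in nginx (IP and ports are invented):

```nginx
# One backend process listening on four ports; each server line gives
# nginx a separate (source port, dest port) tuple space, so the
# effective connection ceiling multiplies accordingly.
upstream websocket_backend {
    server 10.0.0.2:9001;
    server 10.0.0.2:9002;
    server 10.0.0.2:9003;
    server 10.0.0.2:9004;
}
```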
Yes, for a few reasons — most are application specific, though.
An active websocket connection does have a faster response time than a regular HTTP connection ([1]). The difference here isn't a ton, but may affect real-time applications. The packets are also smaller, so less overhead if you're sending many.
The biggest difference I saw is that when the client or server needs to send several quick requests (many within a couple of seconds), long polling breaks down. From the spec ([2]): "Once the server sends a long poll response, typically the client immediately sends a new long poll request." This delay can add up, and it's not truly full duplex. Chunked responses help for server -> client, but client -> server still has the same issue.
Awesome stuff, thanks. Your second point could explain some of the mysterious non-updates that I've seen from time-to-time when using faye with long-polling behind nginx.
You actually need to implement it outside of nginx. The easiest way is just have the client send a message every x seconds, and your application server will immediately respond. If the server does not respond quickly (within a few seconds), close the connection and optionally reconnect.
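A minimal sketch of that client-side logic. The clock and socket operations are injected callbacks so the timing rules are explicit; none of these names come from a real library:

```javascript
// Hypothetical heartbeat state machine: ping every pingIntervalMs,
// declare the connection dead if no pong arrives within pongTimeoutMs.
// In real code you'd drive tick() from setInterval and wire sendPing /
// onDead to the actual websocket; here everything is injected.
class Heartbeat {
  constructor({ pingIntervalMs, pongTimeoutMs, sendPing, onDead }) {
    this.pingIntervalMs = pingIntervalMs;
    this.pongTimeoutMs = pongTimeoutMs;
    this.sendPing = sendPing;   // called when it's time to ping
    this.onDead = onDead;       // called when the server goes silent
    this.lastPingAt = null;     // timestamp of the last ping we sent
    this.awaitingPong = false;
  }

  // Call this when the server answers our ping.
  pongReceived() {
    this.awaitingPong = false;
  }

  // Call periodically; `now` is the current time in ms.
  tick(now) {
    if (this.awaitingPong && now - this.lastPingAt > this.pongTimeoutMs) {
      this.onDead();            // close and optionally reconnect
      return;
    }
    if (!this.awaitingPong &&
        (this.lastPingAt === null ||
         now - this.lastPingAt >= this.pingIntervalMs)) {
      this.lastPingAt = now;
      this.awaitingPong = true;
      this.sendPing();
    }
  }
}
```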
The default proxy_send_timeout is 60s. If you rely on information about connected clients (chat, anyone?), it's way nicer to ping the server every 30s than to set huge timeouts. If the client's internet connection breaks (mobile, anyone?) and the client has no chance to close the websocket connection, nginx keeps the connection open and your server thinks there are still clients to talk to. Nice side effect: your client knows about the disconnect, because the ping fails.
EDIT: the parent originally said "proxy_read_timeout", and this comment was in response to that.
proxy_read_timeout times out reads from the upstream. It will trigger when a backend server goes AWOL, not when a client falls off the net.
If you want to detect client-side disconnection, you want proxy_send_timeout.
You can emulate this by having server sent heartbeats with a proxy_read_timeout with the client required to respond to them before the server will send any more, thus going read-silent whenever the client fails to respond, but why not just have the client do the pinging? Then the server doesn't have to say anything in response most of the time.
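Concretely, the cautious variant is just the pair of directives (the 120s value echoes the suggestion upthread; it's an assumption, not a recommendation):

```nginx
# With clients pinging every ~30s, a 120s budget each way is plenty.
proxy_read_timeout 120s;   # times out reads from the upstream (backend AWOL)
proxy_send_timeout 120s;   # times out writes toward the client (client fell off)
```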
Agreed. We have a larger window so that we can adaptively ping (when the client isn't very active, we don't ping as often). We're not making chat though.
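A toy version of adaptive pinging, assuming the interval simply stretches with idle time (all thresholds here are invented for illustration):

```javascript
// Hypothetical schedule: ping active clients every 30s, stretching
// toward a 5-minute interval as the client sits idle.
function pingIntervalMs(idleMs) {
  const base = 30 * 1000;      // active clients: ping every 30s
  const max = 5 * 60 * 1000;   // fully idle clients: every 5 minutes
  const minutesIdle = Math.floor(idleMs / 60000);
  return Math.min(max, base * (1 + minutesIdle));
}
```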
If you do want timeouts like this, an alternative is to handle this on the application layer, and not rely on nginx to drop connections.
I can provide you some pointers as we made it go away. Some data from one of our production virtual hosts:
    cat access.log.1 | wc -l
    1054423  # no static objects served here
    cat error.log.1 | grep timed
For PHP-FPM we use static pm and unix domain sockets. This virtual host is fairly busy with some slow (~200 ms) requests, so it uses 96 processes per pool. listen.backlog = -1 in php-fpm.ini lets the kernel decide the size of the actual connection backlog. UDS gets filled faster than TCP and nginx starts responding with 502's. Throw net.core.somaxconn = 65535 somewhere in /etc/sysctl.d to increase the actual backlog, since even if you specify a high listen.backlog value, the actual value is truncated to SOMAXCONN. A couple of years ago I wrote an article about stuff like this: http://www.saltwaterc.eu/nginx-php-fpm-for-high-loaded-websi... (shameless plug, I know, but you may get some useful info). As a side note, I am curious how backend persistency for nginx plays out. Our production still uses the same config since 2011 as it isn't broken, but it may be more efficient.
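The relevant knobs from that setup, sketched out (file paths are typical locations and may differ on your distro):

```
# /etc/sysctl.d/90-somaxconn.conf -- raise the kernel backlog ceiling
net.core.somaxconn = 65535
```

```
; php-fpm pool config (e.g. /etc/php-fpm.d/www.conf)
pm = static
pm.max_children = 96
listen = /var/run/php-fpm.sock
listen.backlog = -1   ; ask for the max; actual value is capped at SOMAXCONN
```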
Both TCP and UDS depend on it. UDS uses an API that follows the BSD sockets standard. However, TCP makes sense when you use nginx as load balancer between multiple PHP-FPM backends. A really tricky setup if you ask me. For UDS, besides lower latency, the namespace is cleaner (filesystem paths vs numeric ports - much easier to automate the configuration), and (at least under Linux) they follow the filesystem ACL. For example, under my setup, only nginx's user is allowed to read / write to the PHP-FPM sockets.
I see you're running FPM -- I had the same problem, and it turned out to be FPM's worker auto-spawning system. I set it to pm = static and set the number of workers to 50, which is the most I ever need, plus some headroom. Give that a go.
Just wanted to update you on this. I switched to another hosting provider (for other reasons) so I ended up working on the server from scratch. The pm is still dynamic because I want to see if I can reproduce the problem again, and if it does I can always try static.
Thanks, I will give it a shot when I get home; I can only ssh to my server from my home computer. You are right, the pm is set to dynamic last time I checked; it's the default.
I will update you on this after observing my server for 24 hours.
The Arch package maintainers' job mostly boils down to managing the PKGBUILD. If you get comfortable using ABS your experience with Arch will improve dramatically.
One feature I miss the most in nginx is dynamic loading of modules. I wonder if this is anywhere on the roadmap, or is there some deeper reason for this limitation?
Using the official YUM repo http://nginx.org/packages/centos/6/$basearch/ they don't build with the `ngx_http_spdy_module` enabled. Is there a way to get it, without doing a custom compile? I'd like to continue to use the YUM package manager.
On Ubuntu, there is an "nginx-extras" package with more modules. I don't have a centos machine handy.
But really, the burden of a 'custom compile' is small. Nginx configuration is simple, it has minimal dependencies, and it builds quickly. You will need zlib-dev and openssl-dev.
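For the record, a from-source build is only a handful of commands (the version number here is just an example; pick the current release):

```
wget http://nginx.org/download/nginx-1.6.2.tar.gz
tar xzf nginx-1.6.2.tar.gz && cd nginx-1.6.2
./configure --with-http_ssl_module --with-http_spdy_module
make && sudo make install
```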
I've always used Apache but I really want to use nginx for my next project (it's a Node/express app). How should I start? It'll be running on my Ubuntu VPS. Any tips, suggestions, common pitfalls?
I wholeheartedly agree - .htaccess and virtual host rules were such a pain in Apache; I've really come to appreciate how easy it was to get nginx set up in comparison.
You could use the Ubuntu apt repository[0] managed by nginx. I've been using that repository for years, always worked great. New releases get pushed (almost?) instantly.
Is this .htaccess to nginx config converter reliable? http://winginx.com/htaccess Seems like the biggest hurdle to adopting nginx is getting the config in order.
Apache was designed in an era where the webserver did everything, and most of what it was doing was serving static content. The fundamental design of Apache, as well as the design of its configuration system, is optimized for that. And though it's been modified extensively to do everything a modern web server needs to do, in many ways it's still held back by that heritage. Apache is also fairly monolithic: the way to add functionality is generally with modules, and a random Apache installation is going to have vastly more features active than you may actually need, or want. All of which affects performance.
Nginx was written from the ground up in the modern web era, after we had already learned how real websites work (dynamic, high throughput, tied into other technology like PHP or Rails, possibly living on multiple servers, etc.). Nginx is designed for efficiency and low overhead. It's also designed to live within an ecosystem rather than as the container of an ecosystem; at its heart it's not a web server, it's a proxy. All of this makes nginx easier to set up, easier to configure, and generally more efficient with superior performance compared to Apache.
Here's an example. I have a server set up with a couple different sites, some of them are fronted by varnish (for caching), several run on php, and another runs on rails. With nginx this is all fairly easy to do. Instead of installing modules as you would tend to do with apache you set up php and rails as standalone socket-based servers (e.g. php-fpm and unicorn) and then in your nginx configuration you simply tell the web server where those are and how to use them.
Nginx also funnels you into a pattern which makes it easier to scale. Because nginx is just part of an ecosystem and not the container of an ecosystem you can easily migrate components to other servers as necessary. In my example I could push varnish off to another server, push each php app off onto its own server, push rails off onto separate servers, etc. The configuration changes would be fairly minimal and it would all still just work.
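The setup described in the example above can be sketched roughly like this (socket paths and server names are invented):

```nginx
# Rails app served by unicorn over a unix socket.
upstream unicorn_app {
    server unix:/var/run/unicorn.sock fail_timeout=0;
}

server {
    listen 80;
    server_name rails.example.com;
    location / {
        proxy_set_header Host $host;
        proxy_pass http://unicorn_app;
    }
}

# PHP site handed off to php-fpm over FastCGI.
server {
    listen 80;
    server_name php.example.com;
    root /var/www/php-site;
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/var/run/php-fpm.sock;
    }
}
```

Moving any one of these to another box is then just a matter of changing a socket to an IP:port.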
I'd argue a little with "Apache is monolithic". If you know your elbow from your butt, you can make a decent Apache setup, as any decent sysadmin should. The thing I hate the most in nginx is the lack of DSO modules: when I need a new module, I need a new nginx build. nginx itself has more than you ask for in a standard build. As an example, I counted the "with" and "without" flags from our nginx package build script: it has 3 "with" flags (SSL, gzip static, PCRE JIT) vs 14 "without" flags. And we can part with gzip static since most of the static objects are pushed by CDNs now.
Apache is catching up with its evented MPM and proxy support, but I still wouldn't go back to Apache though. The main selling point that the OP should get is the fact that evented servers have much better memory usage under the same load as the threaded servers or (cough) process based servers.
It's simpler, easier to configure, has event-driven "magic" :). But seriously, you can google this and it doesn't make sense to sell HTTP servers like toothpaste anyway.
Just be absolutely sure that you never need the additional features that Apache Httpd provides, and Nginx will probably be a good choice.
Being able to run a site using less memory on average, and much less memory in the worst case (traffic spike for a site running a dynamic language module like mod_php), is a major reason to switch to Nginx from Apache.
Apache 2.4 has reduced memory usage, are you sure this still applies? Nobody is forcing you to use mod_php, in fact I'd say it's obsolete. I use PHP-FPM.
You probably don't need to. Apache is good enough and if you have to sell your boss on using Nginx then you probably don't have enough need to switch. It should sell itself or you are probably just wasting your time.
I use Nginx because I find it easier to configure and more lightweight out of the box.
It's not so much switching to something qualitatively better than to something with a somewhat different style.
For example, Apache is chock full of modules with a certain richness to them, they do everything and everything you could possibly think of, with a verbose configuration that usually comes filled with "defaults" by the maintainer, and plenty of options to choose between processes and threads; Nginx is "lean and mean", strictly event-based and minimalistic in almost every way, is usually compiled to include just what you need (and still feels un-monolithic despite modules being statically compiled), comes with sensible defaults and is usually bundled with almost no config.
Some rough analogues: Nginx is to Apache what (Varnish, Postfix, Go, Git) is to (Squid, Exim, C++, Subversion).
I can accept Git >> Subversion and Varnish >> Squid but I'll take you on with sticks and pillows over Postfix >> Exim because that's entirely nonsense, sir, and I WILL NOT HAVE IT.
It's not ">" as much as "<->", ie. not a statement about quality. I know a lot of people love Exim.
The reason I included Exim is that it's a huge monolithic chunk (at least as I remember it, it's been a while) with a very long history (and lots of legacy quirks), whereas Postfix is designed from the beginning to be modern and modular.