
I used to look after a bunch of Sensu clusters (Sensu is a system that lets you define monitoring checks on clients and send keepalives/results/metrics to the server). It used Redis to store state and RabbitMQ to handle messaging. The clients would put messages directly onto the server queues, and the server(s) would process the queues, watching for missed keepalives, failures in check results, etc. It was amazing until I decided to cluster it.

It would work fine for weeks, even months. Then, for reasons unknown (possibly bad hardware, a noisy neighbor, a reboot), one of the nodes would drop out, some fool would "fix it" without understanding how it worked, and we'd get a split brain: two Sensu queue systems, only one of which was getting keepalives. Any Sensu server instance pointing at the other one would alert on every single client (hundreds) because "their keepalives have expired!". That threw hundreds of additional messages per minute onto the alert queue in that part of the fracture. Since we were practicing DevOps, all teams were responsible for their own production assets, so this would end up as a P1 incident with 20 people on a call (maybe at 4am) thinking there was a major platform outage.

And it was a nightmare to fix because you essentially had to ignore your alerting system (which was swamped in bullshit messages anyway). I ended up writing a script that replaced the alert handlers with a dev-null handler, nuked RabbitMQ, recreated the cluster and waited for the clients to find it (or ran another script to bounce all of them). Once everything settled, I'd re-enable the alert handlers.
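To make the failure mode concrete, here is a rough toy model of keepalive expiry across a split. It is not Sensu's code: `KeepaliveMonitor`, `simulate_split`, the 120-second threshold and the 20-second interval are all made up for illustration. It only shows why a server watching the orphaned half of the queue system ends up flagging every client at once.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KeepaliveMonitor:
    """Hypothetical stand-in for one server watching one side of the split."""

    threshold_s: float = 120.0  # assumed expiry threshold, not a Sensu default
    last_seen: dict = field(default_factory=dict)

    def record(self, client: str, now: float) -> None:
        # Remember the most recent keepalive timestamp for this client.
        self.last_seen[client] = now

    def expired(self, now: float) -> list:
        # A client is flagged once its last keepalive is older than the threshold.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen > self.threshold_s
        )


def simulate_split(clients, split_at, check_at, interval=20.0):
    """Send keepalives every `interval` seconds; after `split_at` only the
    healthy side of the queue system still receives them."""
    healthy = KeepaliveMonitor()
    orphaned = KeepaliveMonitor()
    t = 0.0
    while t <= check_at:
        for client in clients:
            healthy.record(client, t)
            if t < split_at:
                orphaned.record(client, t)
        t += interval
    return healthy.expired(check_at), orphaned.expired(check_at)


if __name__ == "__main__":
    ok_side, bad_side = simulate_split(["web-1", "web-2", "db-1"], 100.0, 400.0)
    print("healthy side alerts on:", ok_side)
    print("orphaned side alerts on:", bad_side)
```

The clients themselves are fine the whole time. The orphaned side just never hears from them again, so from its point of view every client went silent at the same moment.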

I switched to Sensu's experimental Redis queue config and never had that issue again. Ended up running 3x small Sensu servers in each region, each running Sensu + Redis in HA mode. Bulletproof, if properly configured.

Maybe RabbitMQ clustering has improved, but having to use it as part of a Sensu cluster put me off it.


