
Pretty much all load balancers have health checks: active, where they reach out to each server, or passive, where they observe the responses to real requests if they can.

One of the issues is making your active health check more like a doctor's physical than "'tis but a scratch" self-reporting, while also ensuring you're not dealing with a whole bunch of hypochondriacs.
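
A rough sketch of what a "physical" might look like, assuming each subsystem check is a zero-argument callable and that you decide up front which ones are critical. The names `deep_health`, `checks`, and `critical` are made up for illustration and don't come from any particular load balancer. Failing only on critical checks is one way to keep the hypochondriacs in rotation.

```python
def deep_health(checks, critical):
    """Run every named check; report unhealthy only if a critical one fails.

    checks:   dict of name -> zero-argument callable returning True when OK
              (hypothetical; e.g. a DB ping or a disk-space test).
    critical: set of check names whose failure should pull the server.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that blows up counts as a failed check.
            results[name] = False
    # A critical check that was never registered is treated as failed.
    healthy = all(results.get(name, False) for name in critical)
    return healthy, results
```

The non-critical results still come back in `results`, so they can be logged or alarmed on without taking the server out.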

Passive health checks at least have the property that they fail servers when the servers are unable to serve, even if the active health check does not consider some subsystem in its response. But alone they can easily be fooled by really fast non-error responses.
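
To make the passive side concrete, here's a minimal sketch, assuming the balancer can see the status code of each real response. `PassiveHealth`, `in_rotation`, and the thresholds are hypothetical names and numbers, not any vendor's implementation. Note that it only counts 5xx responses, so a server returning fast, wrong 200s would sail through, which is the weakness described above.

```python
from collections import deque


class PassiveHealth:
    """Tracks the outcome of the last `window` real requests to one server."""

    def __init__(self, window=100, max_error_rate=0.5):
        self.outcomes = deque(maxlen=window)  # True = request succeeded
        self.max_error_rate = max_error_rate

    def record(self, status_code):
        # Any 5xx counts as a failure; everything else counts as success.
        self.outcomes.append(status_code < 500)

    def healthy(self):
        if not self.outcomes:
            return True  # no traffic seen yet, so no evidence of failure
        errors = self.outcomes.count(False)
        return errors / len(self.outcomes) <= self.max_error_rate


def in_rotation(active_probe_ok, passive):
    # Keep a server only if the active probe passes AND real traffic looks fine.
    return active_probe_ok and passive.healthy()
```

Combining the two signals means either one can pull a server, so each covers some of the other's blind spots.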

Anyway, saying that "name of brand of load balancers" solves this problem only covers the most basic cases. General solutions are at best the first step of a full solution. You need to think about the edge cases, which I suspect is what Rachel is advocating.



CloudWatch does what you're referring to as well. It's more of a basic server monitoring system that happens to integrate with the load balancer.

You get a set of basic VM-level metrics, and you can feed it custom metrics from your app or from log files, all of which can be configured to alarm. I don't think it's possible to run advanced statistics on the metrics for alarming (e.g., standard deviation from the 30-minute average exceeds N), but it may be. Usually it's just an event count, like more than N 500 errors over X time.
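
For illustration, the two kinds of alarm look roughly like this in plain Python. This is not the CloudWatch API; `ErrorCountAlarm` and `deviates` are made-up names, and timestamps are just seconds passed in by the caller.

```python
from collections import deque
import statistics


class ErrorCountAlarm:
    """Fires when more than `max_errors` errors fall within `period_seconds`."""

    def __init__(self, max_errors, period_seconds):
        self.max_errors = max_errors
        self.period = period_seconds
        self.timestamps = deque()

    def record_error(self, now):
        self.timestamps.append(now)

    def firing(self, now):
        # Drop errors that have aged out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.period:
            self.timestamps.popleft()
        return len(self.timestamps) > self.max_errors


def deviates(history, latest, n_sigma):
    """True if `latest` is more than `n_sigma` standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate a spread
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return latest != mean
    return abs(latest - mean) > n_sigma * stdev
```

The count alarm needs a hand-picked N; the deviation check adapts to whatever the recent baseline is, which is why it's the more interesting one to have.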

I do agree you need to think deeper than basic health checks though; 'broken server' is always a hard boolean to nail down.



