Pretty much all load balancers have health checks - active where they reach out ...

klaruz · on Feb 17, 2015

CloudWatch does what you're referring to as well. It's more of a basic server-monitoring system that happens to integrate with the load balancer.

You get a set of basic VM-level metrics, and you can feed it custom metrics from your app or from log files, all of which can be configured to trigger alarms. I don't think it's possible to alarm on more advanced statistics over the metrics (e.g., when the standard deviation over a 30-minute window exceeds N), but it may be. Usually it's just an event count, like more than N 500 errors over X time.
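To make the difference between the two alarm styles concrete, here is a rough sketch in plain Python. It is not CloudWatch's API, and `ErrorCountAlarm` and `exceeds_stddev` are hypothetical names. The first implements the simple "more than N errors in the last X seconds" rule, assuming the caller supplies timestamps in seconds. The second is the kind of standard-deviation rule mentioned above, here meaning "the latest value sits more than k standard deviations above the mean of a recent history window".

```python
import statistics
from collections import deque
from typing import Sequence


class ErrorCountAlarm:
    """Hypothetical count-over-time alarm: fires when more than `threshold`
    errors fall inside the sliding window (now - window_s, now]."""

    def __init__(self, threshold: int, window_s: float) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self._events = deque()  # timestamps (seconds) of recent errors, oldest first

    def record_error(self, ts: float) -> None:
        # Assumes errors are recorded in non-decreasing timestamp order.
        self._events.append(ts)

    def is_alarming(self, now: float) -> bool:
        # Evict errors that have aged out of the window before counting.
        while self._events and self._events[0] <= now - self.window_s:
            self._events.popleft()
        return len(self._events) > self.threshold


def exceeds_stddev(history: Sequence[float], latest: float, k: float) -> bool:
    """Hypothetical statistical alarm: True if `latest` is more than k sample
    standard deviations above the mean of `history`."""
    if len(history) < 2:
        return False  # stdev is undefined for fewer than two samples
    mean = statistics.mean(history)
    sd = statistics.stdev(history)
    return latest > mean + k * sd


if __name__ == "__main__":
    alarm = ErrorCountAlarm(threshold=5, window_s=60)
    for t in range(6):  # six 500s at t = 0..5 seconds
        alarm.record_error(float(t))
    print(alarm.is_alarming(now=10.0))  # six errors in the last minute
    print(alarm.is_alarming(now=65.0))  # all six have aged out
    print(exceeds_stddev([10, 12, 11, 9, 10], latest=30, k=3.0))
```

The count rule only needs a queue of timestamps, which is why it's the easy, common case. The statistical rule needs a history of values and a choice of window and k, which is where tuning gets harder.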

I do agree that you need to think beyond basic health checks, though; 'broken server' is always a hard boolean to nail down.