How is that better than a simple monitor/alert for low disk space? That low disk space is likely caused by an application storing too much cumulative data in log files, temporary caches, etc., and is often easy enough to fix. And many applications out there simply don't need the level of scalability and extra robustness required to still deliver decent levels of service in the immediate aftermath of one node going down. Certainly in my experience it's less work (and cost) to put measures in place to minimise the chances of a fatal crash than it is to ensure the whole environment functions smoothly even if parts of it do crash regularly. I'd also note we can be grateful that the developers of OSes, web servers, VMs and database servers don't subscribe to "let it crash"!