VMS clusters on VAXen were doing failovers perfectly in the '80s. All kinds of products and software (even FOSS) do it today. You're telling me that, in 2015, you are doing manual failovers despite tons of free tools to automate it reliably?

Better to do an assessment of each thing that can fail: how to isolate and detect it, how to recover from it, and how to implement that with available tools. Then implement it. Test it in a number of situations on the same hardware, network, and apps you'll use in production. Once it's solid, put it into production. Then, never worry about that stuff again past monitoring and maintenance.
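To make the detect-then-recover step concrete, here's a rough sketch in Python. It's not how VMS clusters or any particular tool do it. `Node`, `FailoverMonitor`, the `is_healthy` probe, and the `threshold` of consecutive failures are all made-up names and assumptions for illustration. The idea is just: probe the primary, tolerate a few blips, then promote the first healthy standby.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Node:
    name: str
    # Hypothetical probe; in practice this might be a TCP or HTTP check.
    is_healthy: Callable[[], bool]


class FailoverMonitor:
    """Promote a standby after `threshold` consecutive failed probes of the primary."""

    def __init__(self, primary: Node, standbys: Iterable[Node], threshold: int = 3):
        self.primary = primary
        self.standbys = list(standbys)
        self.threshold = threshold  # assumed value; tune per failure mode
        self.failures = 0

    def tick(self) -> Optional[str]:
        """Run one probe cycle. Return the new primary's name if a failover happened."""
        if self.primary.is_healthy():
            self.failures = 0
            return None

        self.failures += 1
        if self.failures < self.threshold:
            return None  # could be a transient blip; don't fail over yet

        # Primary is considered down: promote the first standby that passes its probe.
        for i, candidate in enumerate(self.standbys):
            if candidate.is_healthy():
                old_primary = self.primary
                self.primary = self.standbys.pop(i)
                self.standbys.append(old_primary)  # keep it around for later recovery
                self.failures = 0
                return self.primary.name

        return None  # no healthy standby; this is where you'd page someone
```

You'd call `tick()` on a timer and drive the probes with fake failures in test, on the same hardware and network you'll run in production, before trusting it.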

Btw, Netflix employs Monkeys to do this. They open-source their tools, with blog writeups on their use, too. I'm sure you Humans will be able to handle it. ;)


