Hacker News

Ideally, incident handling should "just" be rolling back the broken change. Fixing the underlying problem should be done in the morning with no time pressure, not in the middle of the night, half asleep, with customers on the other side of the world yelling at you. Of course it's not always that simple, but most of the time that's what on-call should be about.


It would be nice if things only broke during "business" hours and didn't have real-world impact, never mind impact on millions of people around the world. But look at the customers of, say, code that runs cloud infrastructure: it is running airline reservations and check-ins, government workloads, banks, hospitals, critical infrastructure, Netflix, gaming services. That's a lot of things that typically can't wait for morning.



