Ex-Amazon SDE here. Amazon uses a service-based architecture where one team (e.g. 4 to 8 engineers) maintains one service.
The main benefit is organizational and lies in having a small and very measurable interface between services.
Crucially, the team that develops the service also runs it and provides 24/7 on-call for it. This includes making decisions on how much to spend on hardware vs. optimization, as the team is responsible for the overall performance.
More importantly, the team is now responsible for service outages.
This is very different from having a team write many microservices. It's especially bad when a team writes a bunch of microservices (the so-called distributed monolith) and somebody else has to run them.
Sadly the article does not focus on this aspect.
As a side note, once developers realize they get paged out of bed by their own code... it's amusing to see how the cool framework of the month is not cool anymore.
At my previous company this was introduced as well, but the incentives were very warped. There was no advantage to actually being all that stable, since then you missed out on "firefighting glory". I never saw praise or promotions for maintaining high uptime. I'd be very interested to learn if Amazon managed to overcome this and if so, how.
I cannot talk for the whole company but I don't think I witnessed a culture of "firefighting glory", rather the opposite.
There is a formal process to investigate and correct errors after a non-trivial incident. It requires collecting evidence and then discussing with the whole team what happened, when, why, and who did what.
Then people ask "why was this not prevented or foreseen?" and keep going backward until you exit the technical realm and look into people's choices.
The root causes could be, for example, that it was a management decision to prioritize something else over availability concerns... and the engineers are off the hook.
Sometimes one engineer was overworked and tired and made an error in good faith that is too difficult to prevent, and that's also acceptable.
Sometimes the whole team ignored an availability issue in the product architecture, and that's bad.
And so on... Needless to say the outcome can impact people's performance review.
The blog post is very long, with an estimated reading time of 18 minutes, yet the word "deploy" appears only once. And there is no mention of who has to run it.