Ex-Amazon SDE here. Amazon uses a service-based architecture where one team (e.g. 4 to 8 engineers) maintains one service.
The main benefit is organizational and lies in having a small and very measurable interface between services.
Crucially, the team that develops the service also runs it and provides 24/7 on-call for it. This includes making decisions on how much to spend on hardware vs. optimization, as the team is responsible for the overall performance.
More importantly, the team is now responsible for service outages.
This is very different from having a team write many microservices. It's especially bad when a team writes a bunch of microservices (the so-called distributed monolith) and somebody else has to run them.
Sadly the article does not focus on this aspect.
As a side note, once developers realize they get paged out of bed by their own code... it's amusing to see how the cool framework of the month is not cool anymore.
At my previous company this was introduced as well, but the incentives were very warped. There was no advantage to actually being all that stable, since then you missed out on "firefighting glory". I never saw praise or promotions for maintaining high uptime. I'd be very interested to learn if Amazon managed to overcome this and if so, how.
I cannot talk for the whole company but I don't think I witnessed a culture of "firefighting glory", rather the opposite.
There is a formal process to investigate and correct errors after a non-trivial incident. It requires collecting evidence and then discussing with the whole team what happened, when, why, and who did what.
Then people ask "why was this not prevented or foreseen?" and keep going backward until you exit the technical realm and look into people's choices.
The root causes could be, for example, that it was a management decision to prioritize something else over availability concerns... and the engineers are off the hook.
Sometimes one engineer was overworked and tired and made an error in good faith that is too difficult to prevent, and that's also acceptable.
Sometimes the whole team ignored an availability issue in the product architecture, and that's bad.
And so on... Needless to say the outcome can impact people's performance review.
The blog post is very long, with an estimated reading time of 18 minutes, yet the word "deploy" appears only once. And there is no mention of who has to run it.