All 100 services should have end-to-end integration testing, and any change made to that chain of tooling should have to run through a massive integration test. If anything fails, the change is not acceptable.
You can make this comment about any outage that ever happens: "Why didn't they have a test for that?"
The answer is that tests are never perfect. To create an integration environment that truly mimics prod, you would have to fork an entire parallel universe into your integ environment to run the test. Anything else will diverge from the reality of the future.
Even if every vendor's service or hardware had integration tests, that doesn't mean that the integration tests covered every case. It doesn't mean there's not an emergent property of two systems behaving in a slightly unexpected way that turns into a catastrophic result.
It's not necessarily even possible to have two copies of some of the systems; who knows how expensive a given vendor's hardware box is.
It's definitely not possible to exactly mimic future traffic. Perhaps everything works in the test environment, but in prod the requests are different and it fails.
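A minimal sketch of that failure mode, with entirely invented names: a handler that holds for every hand-written fixture in the integ suite, but breaks on a real request shape the suite never anticipated.

```python
# Hypothetical handler -- every name here is invented for illustration.
def handle_request(payload: dict) -> str:
    # Silently assumes every request carries a "user_id",
    # which happens to be true of all the synthetic integ traffic.
    return f"served user {payload['user_id']}"

# Integ-suite traffic: hand-written fixtures, all well-formed.
test_traffic = [{"user_id": 1}, {"user_id": 2}]

# Prod traffic: a legacy client sends a request with no "user_id".
prod_traffic = [{"user_id": 1}, {"session": "abc"}]

def serve_all(traffic):
    results = []
    for payload in traffic:
        try:
            results.append(handle_request(payload))
        except KeyError:
            results.append("500 error")
    return results

print(serve_all(test_traffic))  # every fixture passes
print(serve_all(prod_traffic))  # the legacy client triggers a 500
```

The integ suite goes green because its traffic was generated by the same people who wrote the handler; the divergence only shows up when real clients arrive.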
Hardware errors happen, and integration testing for them is difficult, to say the least.