All 100 services should have end-to-end integration testing, and any change made to that chain of tooling should have to run through a massive integration test. If anything fails, the change is not acceptable.
You can make this comment about any outage that ever happens: "Why didn't they have a test for that?"
The answer is that tests are never perfect. To create an integration environment that truly mimics prod, you would have to fork an entire parallel universe into your integ environment to run the test. Anything else will diverge from the reality of the future.
Even if every vendor's service or hardware had integration tests, that doesn't mean that the integration tests covered every case. It doesn't mean there's not an emergent property of two systems behaving in a slightly unexpected way that turns into a catastrophic result.
It's not necessarily even possible to have two copies of some of the systems; who knows how expensive a given vendor's hardware box is.
It's definitely not possible to exactly mimic future traffic. Perhaps everything works in the test environment, but in prod the requests are different and it fails.
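A minimal sketch of that failure mode, with entirely invented names: a handler that holds for every hand-written fixture in the integ suite, but breaks on a real request shape the suite never anticipated.

```python
# Hypothetical handler -- every name here is invented for illustration.
def handle_request(payload: dict) -> str:
    # Silently assumes every request carries a "user_id",
    # which happens to be true of all the synthetic integ traffic.
    return f"served user {payload['user_id']}"

# Integ-suite traffic: hand-written fixtures, all well-formed.
test_traffic = [{"user_id": 1}, {"user_id": 2}]

# Prod traffic: a legacy client sends a request with no "user_id".
prod_traffic = [{"user_id": 1}, {"session": "abc"}]

def serve_all(traffic):
    results = []
    for payload in traffic:
        try:
            results.append(handle_request(payload))
        except KeyError:
            results.append("500 error")
    return results

print(serve_all(test_traffic))  # every fixture passes
print(serve_all(prod_traffic))  # the legacy client triggers a 500
```

The integ suite goes green because its traffic was generated by the same people who wrote the handler; the divergence only shows up when real clients arrive.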
Hardware errors happen, and integration testing for them is difficult, to say the least.