Interesting to see this. It sounds like they're not on AWS, given that they mentioned that having 1000 instances for their production environment made them one of the bigger deployments on their hosting provider.
If not for the troubles they experienced with their hosting provider and with managing deployments / cutting over traffic, it might have been cheaper to just keep horizontally scaling rather than putting in the time to investigate these issues. I'd also love to see some actual latency graphs: what's the P90 like at 25% CPU usage with a simple Gunicorn / gevent setup?
I was wondering that too, but there aren't that many common cloud providers that have a 96 vCPU offering.
I'm also wondering about the 144 workers on 96 vCPUs, since that's 96 CPU threads rather than 96 CPU cores. So effectively it's 144 workers on 48 CPU cores, possibly running at sub-3GHz clock speeds. But it seems they got it to work out in the end (maybe at the expense of latency).
Assuming you're running a system where normal request/response handling blocks on database queries, it's often optimal to have more workers than available CPU threads, and 1.5x is a common rule of thumb to try first.
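That rule of thumb could be sketched roughly like this (a minimal illustration, not anything from the article; the `suggest_workers` helper and the 1.5x default are my own assumptions here, and Gunicorn's docs suggest `(2 * cores) + 1` as an alternative starting point):

```python
import os

def suggest_workers(cpu_threads=None, multiplier=1.5):
    """Rule-of-thumb worker count for an IO-blocking request/response app.

    cpu_threads: number of hardware threads (vCPUs); autodetected if None.
    multiplier:  1.5x threads is the heuristic mentioned above -- a starting
                 point to tune empirically, not a law.
    """
    if cpu_threads is None:
        cpu_threads = os.cpu_count() or 1
    return max(1, round(cpu_threads * multiplier))

# A 96 vCPU box under this heuristic lands on the 144 workers discussed above.
print(suggest_workers(96))
```

Whether 1.5x (or any fixed multiplier) is right depends entirely on what fraction of each request is spent blocked on the database versus burning CPU, so you'd want to validate against latency percentiles under load.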