It's funny how I encountered a problem that went exactly the opposite way! We initially introduced a rate limiter that was adequate at the time, but as the product scaled up it stopped being adequate, and failures with 429 were either ignored or closed as client bugs. Only after some time did we realize that the rate of requests scaled up roughly in step with product growth. A quick fix was to simply remove the limiter, but after a couple of times when the DB decided to take a nap after being overwhelmed, we added a caching layer.
Just goes to show that there is no silver bullet - context, experience and a good amount of gut feeling are paramount.
Something that was drilled into me early in my career was that you cannot expect your cache to be up 100% of the time. The logical extension of that is that your main DB needs to be able to handle 100% of your traffic at a moment's notice. Not only has this kind of thinking saved my ass on several occasions, it's also kept my code much cleaner. I don't want to say rate limiters and circuit breakers are the mark of bad engineering, butttt they're usually just good engineering deferred.
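The "cache is optional, DB is the source of truth" idea above can be sketched as a read-through cache that degrades gracefully. This is a minimal sketch with hypothetical `cache` and `db` objects (nothing here is from a specific library): any cache error falls straight through to the database, which is why the DB has to be sized to absorb 100% of traffic.

```python
def get_user(user_id, cache, db):
    """Read-through cache that treats the cache as strictly best-effort."""
    try:
        cached = cache.get(user_id)  # may raise if the cache is down
        if cached is not None:
            return cached
    except Exception:
        pass  # cache unavailable: fall through to the DB

    value = db.fetch_user(user_id)  # DB is the source of truth

    try:
        cache.set(user_id, value)  # repopulate for the next reader
    except Exception:
        pass  # failing to repopulate is fine; next read hits the DB again

    return value
```

The key property is that removing the cache entirely changes latency, not correctness, which keeps the failure modes (and the code) simple.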
Reminds me of gas plumbing: the indoor lines run only a few psi above ambient, but the lines themselves have to withstand line pressure up to 300 psi in case the regulator fails. It's good advice!