How could you possibly say this is doing it wrong? The only way you could batch requests in the way you describe is if you have one (or a very small number of) compute nodes. You would need all those requests to hit the same node so you could try to batch them. With serverless compute infrastructure (which is what this blog is demonstrating by using Lambda) you can have one isolated process per request, and therefore you need a database that can actually handle this kind of load.
Here is your problem. You are trying to build a huge application using inadequate technical building blocks.
Lambdas are super inefficient in many different ways. It is a good tool, but as with every tool, you do need to know how to use it. If you try to build a compute-heavy app in Python and then complain about your electricity bill -- that really is on you.
If your database is overloaded with hundreds of thousands of connections from your lambdas, that is the end of the road for your lambdas. Do not put effort into scaling your database up; put effort into reducing the number of connections and improving the efficiency of your application.
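To make "reduce the number of connections" concrete, here is a minimal sketch (hypothetical Python, class and names invented for illustration) of the usual fix: a small fixed-size pool that many request handlers share, instead of each handler opening its own connection. In practice you'd reach for something like PgBouncer or RDS Proxy rather than rolling your own.

```python
import queue


class ConnectionPool:
    """Fixed-size pool: many callers share a few long-lived
    connections instead of each opening their own."""

    def __init__(self, factory, size):
        self._q = queue.Queue(maxsize=size)
        for _ in range(size):
            self._q.put(factory())

    def acquire(self, timeout=None):
        # Blocks until a connection is free, so the number of
        # concurrent connections can never exceed the pool size.
        return self._q.get(timeout=timeout)

    def release(self, conn):
        self._q.put(conn)


# Stand-in "connections" -- a real app would open DB connections
# in the factory. 8 connections serve any number of callers.
pool = ConnectionPool(factory=object, size=8)
conn = pool.acquire()
# ... run a query ...
pool.release(conn)
```

The point is the cap: the database only ever sees the pool size, no matter how many handlers are running.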
I think you can start to hit connection limit walls with RDS at several hundred connections, depending on your instance size. Running even a moderately busy app, you could hit those pretty quickly. I would hate to have to change my entire infrastructure at such an early stage because the DB was hitting connection limits!
Would you ever need a million open connections? Probably not! But you'll likely want more than 500 at some point. And if your entire stack is serverless already, it'd be nice if the DB could handle that relatively low number of connections too.
I look at database connections the following way: how many connections can a database really serve effectively? For a connection to be actively served, the database needs a CPU core working on it or waiting for IO from storage. And that is completely omitting the fact that databases need a sizeable amount of memory to do things efficiently.
Even if you have a server with hundreds of cores, your database probably can't be actively working on more than a small multiple of the number of cores.
I am not saying you can't. I totally believe you do.
Modern hardware is totally able to execute hundreds of thousands of transactions per second on a single core. If your query is simple and you can organise getting the data from storage at the necessary speeds you should totally be able to do this many requests, possibly even tens of millions.
But handling one million queries per second is completely different from having the database server make progress on one million queries in parallel. What actually happens is that the server is only making progress on a small number of them (typically tens, up to hundreds on very beefy hardware) and everything else is just queued up.
There are much, much better ways to queue up millions of things than opening a million connections to get each one done individually.
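A minimal sketch of what "queue them instead" can mean (hypothetical Python, data and sizes invented): buffer the work items in-process and drain them in batches, so a thousand writes become a handful of round-trips rather than a thousand connections each doing one thing.

```python
import queue

# Hypothetical work items: one INSERT each. Instead of a
# connection per item, a single worker drains the queue in
# batches and would issue one multi-row statement per batch.
work = queue.Queue()
for i in range(1000):
    work.put({"user_id": i, "body": f"post {i}"})

BATCH_SIZE = 256
batches = []
while not work.empty():
    batch = []
    while len(batch) < BATCH_SIZE and not work.empty():
        batch.append(work.get())
    batches.append(batch)

# 1000 items become 4 batched statements instead of
# 1000 individual round-trips.
print(len(batches))  # → 4
```

The database sees one connection doing large, efficient writes instead of a thundering herd of tiny ones.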
Lambda was a means to an end for us here, and we're not specifically endorsing its use in _this_ way. Our goal was explicitly to test our ability to handle many parallel connections, and to observe what that looked like from different angles.
We're a DBaaS company, and we do need to be prepared for anything users may throw at us. Our Global Routing infrastructure has seen some major upgrades/changes recently to help support new features like PlanetScale Connect and our serverless drivers.
From our point of view, this was a sizing exercise with the interesting side benefit that many people do happen to use Serverless Functions similarly.
How busy is a moderately busy app? I have a sketch of a twitter app in Scala with zio-http as the framework, doing the batching strategy twawaaay describes, and it can handle 46k POSTs per second on my i5-6600 with a SATA3 SSD. That's using 16 connections to postgres, which is probably more connections than is reasonable for my 4 core CPU.
At 46k RPS, it only takes 5.5 ms to assemble a batch of 256, so latency is basically unaffected by doing this. Just set a limit of 5-10 ms to assemble the batch (or lower if you have a more powerful computer that can handle more throughput).
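The "batch of 256 with a 5-10 ms cap" idea can be sketched like this (hypothetical Python rather than the Scala app described above; function name and parameters invented): collect items up to the size limit, but give up at a deadline so latency stays bounded when traffic is light.

```python
import queue
import time


def assemble_batch(q, max_size=256, max_wait=0.005):
    """Collect up to max_size items, but never wait longer than
    max_wait seconds total, so latency stays bounded under low load."""
    batch = [q.get()]  # block until at least one item arrives
    deadline = time.monotonic() + max_wait
    while len(batch) < max_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # deadline hit with a partial batch
    return batch


# Under load, batches fill instantly; under light load, you pay
# at most max_wait extra latency per request.
q = queue.Queue()
for i in range(300):
    q.put(i)
full = assemble_batch(q)  # fills to 256 immediately
rest = assemble_batch(q)  # drains the remaining 44, then hits the deadline
```

The deadline is the knob: at high throughput it never fires, and at low throughput it caps the added latency at a few milliseconds.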