I work for a dark pool ATS that is hit by HFT firms, and we routinely see flows greater than 12k transactions per minute. I've been benchmarking a variety of compilers, DB libs, drivers and platforms. So far, the best single-threaded performance I've gotten is writing a single order to a main store table in MS SQL in about 500 microseconds. That was from a .NET Core app running directly on the same server as MS SQL; I've been able to get comparable performance from a C++ app running on Linux with kernel-bypass network IO. Mind, I've not tried to optimize the DB at all; this is purely a comparison of DB APIs. The worst I've seen, all other things being equal, is about 800 microseconds.
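To put those numbers side by side, here is some back-of-the-envelope arithmetic. It's only a sketch: it assumes one DB write per transaction and evenly spread arrivals, which real order flow generally is not, and the function names are made up for illustration.

```python
# Rough arithmetic on the figures quoted above.
# Assumptions (not measured): one DB write per transaction,
# arrivals spread evenly over the minute.

def per_second(per_minute: float) -> float:
    """Convert a per-minute rate into a per-second rate."""
    return per_minute / 60.0


def max_writes_per_second(latency_us: float) -> float:
    """Upper bound for one thread issuing writes back to back."""
    return 1_000_000 / latency_us


if __name__ == "__main__":
    print(per_second(12_000))          # 200.0 transactions/s on average
    print(max_writes_per_second(500))  # 2000.0 writes/s at the best timing
    print(max_writes_per_second(800))  # 1250.0 writes/s at the worst timing
```

So on average the quoted flow is well under what one thread can write at either latency; the per-call latency matters mostly for bursts and for time-to-acknowledge.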
Can't share the code or schema, but can give a rough approximation of the setup.
We're experimenting with MSSQL's memory-optimized tables and natively compiled stored procedures. In my timings today, one call to our stored proc, which inserts one record into each of 2 tables, was coming in at 300-400 microseconds.
The test setup is the same for all of my scenarios: every one does the same DB ops, and I just alternate which libs I'm using. The best performance I've been able to get so far from Linux talking to MSSQL has been with OTL on top of unixODBC with the MS ODBC Driver 17. Mind, these are physical servers sitting a few feet from a shared router.
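Since I can't post the real harness, here is a rough sketch of the shape of that kind of comparison: same operation, different library behind it, per-call latency recorded single-threaded. It's written in Python purely for illustration (the actual tests were .NET Core and C++), and `fake_driver_a` / `fake_driver_b` are hypothetical stand-ins for whatever call inserts one order through a given lib.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(op: Callable[[], None], n: int) -> List[float]:
    """Run op n times sequentially; return per-call latencies in microseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter_ns()
        op()
        samples.append((time.perf_counter_ns() - start) / 1_000)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Min, median, approximate 99th percentile and max of the samples."""
    ordered = sorted(samples)
    p99_index = min(len(ordered) - 1, int(len(ordered) * 0.99))
    return {
        "min_us": ordered[0],
        "median_us": statistics.median(ordered),
        "p99_us": ordered[p99_index],
        "max_us": ordered[-1],
    }


def compare(candidates: Dict[str, Callable[[], None]],
            n: int = 1000, warmup: int = 100) -> Dict[str, Dict[str, float]]:
    """Time each candidate with the same call count after a short warmup."""
    results = {}
    for name, op in candidates.items():
        for _ in range(warmup):  # let connections and caches settle first
            op()
        results[name] = summarize(time_calls(op, n))
    return results


if __name__ == "__main__":
    # Hypothetical stand-ins: replace with the real "insert one order" call
    # for each library/driver combination under test.
    def fake_driver_a() -> None:
        sum(range(200))

    def fake_driver_b() -> None:
        sum(range(400))

    report = compare({"driver_a": fake_driver_a, "driver_b": fake_driver_b})
    for name, stats in report.items():
        print(name, stats)
```

Looking at the median and the tail rather than a single average makes it easier to tell a lib that is consistently slower from one that just has the occasional bad call.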