Yes, I was planning a similar experiment with UCall (https://github.com/unum-cloud/ucall), leveraging the NUMA functionality introduced in v2 of Fork Union. I don’t currently have the right hardware to test it properly, but it would be very interesting to measure how pinning behaves on machines with multiple NUMA nodes, NICs, and a balanced PCIe topology.
That's outrageous, and I don't agree with your assessment, because smol is in the same niche as Tokio (that is, an async executor, which isn't necessarily optimizing for CPU-bound workloads) and isn't nearly as slow.
I think performance is a critical property for Rust infrastructure. One can only hope that newer Tokio versions will address the overheads that make everything built on it slower than necessary.