Noob question, but doesn't this potentially add per-packet latency and processing variance?
My thinking is that one needs a set of packets before processing can start. The first packets to arrive must wait until enough packets have arrived to fill the minimum vector size. And if the last packet comes "late", its arrival time adds to the wait for all the other packets, introducing something that looks like variance.
I assume there are parameters setting the minimum number of packets in a vector, and timeouts controlling how long to wait before dispatching a partially filled vector.
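To put rough numbers on that first-packet wait: here is a back-of-the-envelope sketch, assuming (hypothetically) a 256-packet vector and a steady arrival rate; these are illustrative figures, not measured VPP behavior.

```python
def first_packet_wait_us(vector_size: int, arrival_rate_pps: float) -> float:
    """Worst-case time the first packet waits while the rest of the
    vector fills, at a steady arrival rate (microseconds)."""
    return (vector_size - 1) / arrival_rate_pps * 1e6

# At 10 Mpps a 256-packet vector fills in ~25.5 us, negligible...
print(first_packet_wait_us(256, 10e6))   # -> 25.5
# ...but at 100 kpps the same vector takes ~2.55 ms to fill,
# which is exactly why a flush timeout is needed at low rates.
print(first_packet_wait_us(256, 100e3))  # -> 2550.0
```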
This definitely adds per-packet latency and processing overhead, which is why the recommended way to run VPP at high performance is with some kind of network-acceleration library. VPP-based solutions like TNSR by Netgate work best with DPDK for kernel bypass. A similar technique is XDP with eBPF, though XDP does not bypass the kernel, hence eBPF is needed.
For Linux user-space solutions without kernel bypass, if the above-mentioned network accelerators are not installed (devices on customer premises, etc.), the recommended way is to use Netmap, since it enables direct access to the network interface card (NIC) buffers from user space; otherwise you are at the mercy of Linux's own notorious sk_buff [1].
Another alternative for more efficient buffering in Linux is PF_RING, or the new-kid-on-the-block io_uring, but I'm not sure whether either is currently being utilized in VPP.
For a good introduction to Linux networking acceleration technology, this presentation is a good start [2].
The standard solution to this is to trigger the batch process when N packets are queued OR M amount of time passes. As long as you set M to below your latency threshold, you should be good. If you don't want your CPU to burn up cycles polling a usually empty queue, you can add some logic to switch between polling-based and interrupt-based rx depending on throughput. The Linux networking stack already does this for drivers that support NAPI, and I'm sure that DPDK has an equivalent.
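The "N packets OR M amount of time" trigger can be sketched roughly like this; the class name, parameters, and the injectable clock are all made up for illustration, not taken from any real rx path:

```python
import time

class BatchCollector:
    """Dispatch a batch when it reaches max_batch packets OR when
    max_wait_s has elapsed since the first packet was queued."""

    def __init__(self, max_batch=256, max_wait_s=1e-4, clock=time.monotonic):
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.clock = clock          # injectable for testing
        self.queue = []
        self.first_arrival = None

    def add(self, pkt):
        """Enqueue a packet; returns a full batch if one is ready."""
        if not self.queue:
            self.first_arrival = self.clock()
        self.queue.append(pkt)
        return self._maybe_flush()

    def poll(self):
        """Called periodically; flushes a partial batch on timeout."""
        return self._maybe_flush()

    def _maybe_flush(self):
        if not self.queue:
            return None
        full = len(self.queue) >= self.max_batch
        timed_out = self.clock() - self.first_arrival >= self.max_wait_s
        if full or timed_out:
            batch, self.queue = self.queue, []
            self.first_arrival = None
            return batch
        return None
```

The `poll()` call is where the polling-vs-interrupt question comes in: a busy system calls it in a tight loop, while a quiet one can back off to a timer or interrupt and only wake when packets arrive.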
... And doesn't this also create a relationship between otherwise independent packets, potentially creating a way to tag packets through a network? Basically, if I can control the arrival time of my packets at a router (I send them at a baseline fixed rate, but delay the transmit time with a pattern), packets that are then bunched together for vector processing in the router will also be affected by this delay. I could possibly then observe this pattern elsewhere in the network, thus tracing packets.
It is a latency/throughput tradeoff. I haven't really looked at how VPP works, but I don't expect it actually waits for a vector of packets to be completed. Likely it buffers batch 2 while it is still processing batch 1; as soon as it is done, it starts processing batch 2 and begins buffering batch 3.
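That overlap of collecting and processing can be sketched like this; it's purely illustrative (not how VPP is actually structured), using a thread and a bounded queue to stand in for the rx path and the worker:

```python
import queue
import threading

def pipelined_process(packets, batch_size, process):
    """Collect batch N+1 on this thread while batch N is processed on a
    worker thread, so collection never stalls on processing."""
    batches = queue.Queue(maxsize=2)   # at most a couple of batches in flight
    results = []

    def worker():
        while True:
            batch = batches.get()
            if batch is None:          # sentinel: no more batches coming
                break
            results.extend(process(batch))

    t = threading.Thread(target=worker)
    t.start()

    current = []
    for pkt in packets:
        current.append(pkt)
        if len(current) == batch_size:
            batches.put(current)       # hand off; keep collecting immediately
            current = []
    if current:
        batches.put(current)           # flush the final partial batch
    batches.put(None)
    t.join()
    return results
```

With a single worker the batches are processed in order, so the output matches sequential processing; only the timing changes.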
I wouldn't be surprised if this actually reduces variance.