Noob question, but doesn't this potentially add per-packet latency and processing variance?
My thinking is that one needs a set of packets before processing can start. The first packets to arrive must wait until enough packets have arrived to fill the minimum vector size. And if the last packet comes "late", its arrival time adds to the wait for all the other packets, introducing something that looks like variance.
I assume there are parameters setting the minimum number of packets in a vector, and timeouts controlling how long to wait before dispatching a partially filled vector.
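To put rough numbers on that first-packet wait: here is a back-of-the-envelope sketch, assuming (hypothetically) a 256-packet vector and a steady arrival rate; these are illustrative figures, not measured VPP behavior.

```python
def first_packet_wait_us(vector_size: int, arrival_rate_pps: float) -> float:
    """Worst-case time the first packet waits while the rest of the
    vector fills, at a steady arrival rate (microseconds)."""
    return (vector_size - 1) / arrival_rate_pps * 1e6

# At 10 Mpps a 256-packet vector fills in ~25.5 us, negligible...
print(first_packet_wait_us(256, 10e6))   # -> 25.5
# ...but at 100 kpps the same vector takes ~2.55 ms to fill,
# which is exactly why a flush timeout is needed at low rates.
print(first_packet_wait_us(256, 100e3))  # -> 2550.0
```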
This definitely adds per-packet latency and processing overhead, which is why the recommended way to run VPP at high performance is with some kind of network-acceleration library. VPP-based solutions like TNSR by Netgate work best with DPDK for kernel bypass. A similar technique is XDP with eBPF, though XDP does not bypass the kernel, hence eBPF is needed.
For Linux user-space solutions without kernel bypass, if the above-mentioned network accelerators are not installed (devices on customer premises, etc.), the recommended way is to use Netmap, since it enables direct access to the network interface card (NIC) buffers from user space; otherwise you are at the mercy of Linux's own notorious sk_buff [1].
Another alternative for more efficient buffering in Linux is PF_RING, or the new-kid-on-the-block io_uring, but I'm not sure whether either is currently being utilized in VPP.
For a good introduction to Linux networking acceleration technology, this presentation is a good start [2].
The standard solution to this is to trigger the batch process when N packets are queued OR M amount of time passes. As long as you set M to below your latency threshold, you should be good. If you don't want your CPU to burn up cycles polling a usually empty queue, you can add some logic to switch between polling-based and interrupt-based rx depending on throughput. The Linux networking stack already does this for drivers that support NAPI, and I'm sure that DPDK has an equivalent.
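The "N packets OR M amount of time" trigger can be sketched roughly like this; the class name, parameters, and the injectable clock are all made up for illustration, not taken from any real rx path:

```python
import time

class BatchCollector:
    """Dispatch a batch when it reaches max_batch packets OR when
    max_wait_s has elapsed since the first packet was queued."""

    def __init__(self, max_batch=256, max_wait_s=1e-4, clock=time.monotonic):
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self.clock = clock          # injectable for testing
        self.queue = []
        self.first_arrival = None

    def add(self, pkt):
        """Enqueue a packet; returns a full batch if one is ready."""
        if not self.queue:
            self.first_arrival = self.clock()
        self.queue.append(pkt)
        return self._maybe_flush()

    def poll(self):
        """Called periodically; flushes a partial batch on timeout."""
        return self._maybe_flush()

    def _maybe_flush(self):
        if not self.queue:
            return None
        full = len(self.queue) >= self.max_batch
        timed_out = self.clock() - self.first_arrival >= self.max_wait_s
        if full or timed_out:
            batch, self.queue = self.queue, []
            self.first_arrival = None
            return batch
        return None
```

The `poll()` call is where the polling-vs-interrupt question comes in: a busy system calls it in a tight loop, while a quiet one can back off to a timer or interrupt and only wake when packets arrive.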
... And doesn't this also create a relationship between otherwise independent packets, potentially creating a way to tag packets through a network? Basically, if I can control the arrival time of my packets at a router (I send them at a baseline fixed rate, but delay the transmit time with a pattern), packets that are then bunched together for vector processing in the router will also be affected by this delay. I could possibly then observe this pattern elsewhere in the network, thus tracing packets.
It is a latency/throughput tradeoff. I haven't really looked at how VPP works, but I don't expect it actually waits for a vector of packets to be completed. Likely it buffers batch 2 while it is still processing batch 1; as soon as it is done, it starts processing batch 2 and begins buffering batch 3.
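That overlap of collecting and processing can be sketched like this; it's purely illustrative (not how VPP is actually structured), using a thread and a bounded queue to stand in for the rx path and the worker:

```python
import queue
import threading

def pipelined_process(packets, batch_size, process):
    """Collect batch N+1 on this thread while batch N is processed on a
    worker thread, so collection never stalls on processing."""
    batches = queue.Queue(maxsize=2)   # at most a couple of batches in flight
    results = []

    def worker():
        while True:
            batch = batches.get()
            if batch is None:          # sentinel: no more batches coming
                break
            results.extend(process(batch))

    t = threading.Thread(target=worker)
    t.start()

    current = []
    for pkt in packets:
        current.append(pkt)
        if len(current) == batch_size:
            batches.put(current)       # hand off; keep collecting immediately
            current = []
    if current:
        batches.put(current)           # flush the final partial batch
    batches.put(None)
    t.join()
    return results
```

With a single worker the batches are processed in order, so the output matches sequential processing; only the timing changes.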
I wouldn't be surprised if this actually reduces variance.