On Wed, 20 May 2026 18:18:52 +0200
Stefano Brivio
On Wed, 20 May 2026 18:07:08 +0200 Stefano Brivio wrote:
On Wed, 20 May 2026 17:34:45 +0200 Stefano Brivio wrote:
On Wed, 13 May 2026 13:52:08 +0200 Laurent Vivier wrote:
Currently, the vhost-user path assumes each virtqueue element contains exactly one iovec entry covering the entire frame. This assumption breaks as some virtio-net drivers (notably iPXE) provide descriptors where the vnet header and the frame payload are in separate buffers, resulting in two iovec entries per virtqueue element.
This series refactors the vhost-user data path so that frame lengths, header sizes, and padding are tracked and passed explicitly rather than being derived from iovec sizes. This decoupling is a prerequisite for correctly handling padding of multi-buffer frames.
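To make the layout issue described above concrete, here is a minimal sketch. It is not passt code: frame_len() is a hypothetical helper, and the buffer sizes used with it are arbitrary. It only shows that the length of a frame has to be taken over all iovec entries of an element, because the first entry alone covers the whole frame only in the single-buffer case.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical helper, not part of passt: total number of bytes described
 * by an iovec array, however many entries the frame is split into.
 */
static size_t frame_len(const struct iovec *iov, size_t cnt)
{
	size_t len = 0;
	size_t i;

	for (i = 0; i < cnt; i++)
		len += iov[i].iov_len;

	return len;
}
```

With a 12-byte header and a 60-byte payload in separate buffers, frame_len() over both entries gives 72, while iov[0].iov_len alone is 12. Code that derives the frame size from a single entry would see only the header.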
Sorry to bring (likely) bad news, but this series seems to introduce a regression: I saw the migration/rampstream_in tests fail twice in a row, which I had never seen happen before (I think I saw a single failure a long time ago when the machine had a high CPU load, but nothing else).
I'm currently bisecting and the bisect seems to point towards the end of the series (probably 10/10), but I haven't finished yet. I'll keep you posted. I haven't spotted anything that might cause issues there.
Yeah, that's the one :(
$ git bisect bad
db798fc60f4c5869cb53168354e068fb4dabd91a is the first bad commit
commit db798fc60f4c5869cb53168354e068fb4dabd91a
Author: Laurent Vivier
Date:   Wed May 13 13:52:18 2026 +0200

    vhost-user: Centralise Ethernet frame padding in vu_collect() and vu_pad()
The "TCP/IPv4: sequence check, ramps, inbound" test in rampstream_in gets stuck, once the source is done with the migration, and passt on the destination just printed:
Accepted TCP_REPAIR helper, PID 13
accepted connection from PID 16

I'll get captures and logs next. It seems to fail most of the time: I had two failures in a row again.
Log from passt --debug attached. Likely highlight:
---
13.2853: ================ Vhost user message ================
13.2853: Request: VHOST_USER_SET_VRING_ADDR (9)
13.2853: Flags: 0x1
13.2853: Size: 40
13.2853: vhost_vring_addr:
13.2853: index: 0
13.2853: flags: 0
13.2853: desc_user_addr: 0x00007f0943f41000
13.2853: used_user_addr: 0x00007f0943f42240
13.2854: avail_user_addr: 0x00007f0943f42000
13.2854: log_guest_addr: 0x000000001ff43240
13.2854: Setting virtq addresses:
13.2854: vring_desc at 0x7f2e2e2ca000
13.2854: vring_used at 0x7f2e2e2cb240
13.2854: vring_avail at 0x7f2e2e2cb000
13.2854: Last avail index != used index: 2163 != 1936
13.2854: Got packet, but RX virtqueue not usable yet
---

The pcap file of that passt instance is empty: it didn't have a chance to send/receive packets yet.
...but I bisected 10/10 itself, and realised that reverting the iov_truncate() -> iov_skip_bytes() conversion in tcp_vu_sock_recv() like this:

---
diff --git a/tcp_vu.c b/tcp_vu.c
index f6ac76e..ccc031e 100644
--- a/tcp_vu.c
+++ b/tcp_vu.c
@@ -249,11 +249,7 @@ static ssize_t tcp_vu_sock_recv(const struct ctx *c, struct vu_virtq *vq,
 	if (!peek_offset_cap)
 		ret -= already_sent;
 
-	i = iov_skip_bytes(&iov_vu[DISCARD_IOV_NUM], iov_used,
-			   MAX(hdrlen + ret, VNET_HLEN + ETH_ZLEN),
-			   NULL);
-	if ((size_t)i < iov_used)
-		i++;
+	i = iov_truncate(&iov_vu[DISCARD_IOV_NUM], iov_used, ret);
 
 	/* adjust head count */
 	while (*head_cnt > 0 && head[*head_cnt - 1] >= i)
---

hides / fixes the issue.

I'm testing things on a kernel without SO_PEEK_OFF support for TCP, but it doesn't seem to matter ('ret' at this point is the same before and after your patch).

I don't see what's wrong with your change though. It's not even about replacing 'ret' with the padded version, because I can also reproduce the issue with:

	i = iov_skip_bytes(&iov_vu[DISCARD_IOV_NUM], iov_used, ret, NULL);

For convenience, this is how I'm selecting the test without bothering about variables in run():

---
diff --git a/test/run b/test/run
index f858e55..25d7002 100755
--- a/test/run
+++ b/test/run
@@ -71,6 +71,7 @@ run() {
 	perf_init
 	[ ${CI} -eq 1 ] && video_start ci
 
+dont() {
 	exeter smoke/smoke.sh
 	exeter build/build.py
 	exeter build/static_checkers.sh
@@ -162,6 +163,10 @@ run() {
 	setup migrate
 	test migrate/iperf3_many_out6
 	teardown migrate
+}
+	VHOST_USER=1
+	VALGRIND=0
+
 	setup migrate
 	test migrate/rampstream_in
 	teardown migrate
---

-- 
Stefano