[PATCH 00/10] vhost-user: Preparatory series for multiple iovec entries per virtqueue element
Currently, the vhost-user path assumes each virtqueue element contains
exactly one iovec entry covering the entire frame. This assumption
breaks as some virtio-net drivers (notably iPXE) provide descriptors
where the vnet header and the frame payload are in separate buffers,
resulting in two iovec entries per virtqueue element.

This series refactors the vhost-user data path so that frame lengths,
header sizes, and padding are tracked and passed explicitly rather than
being derived from iovec sizes. This decoupling is a prerequisite for
correctly handling padding of multi-buffer frames.

The changes in this series can be split into three groups:

- New iov helpers (patches 1-2): iov_memset() and iov_memcopy() operate
  across iovec boundaries. These are needed by the final patch to pad
  and copy frame data when a frame spans multiple iovec entries.

- Structural refactoring (patches 3-5): Move vnethdr setup into
  vu_flush(), separate virtqueue management from socket I/O in the UDP
  path, and pass iov arrays explicitly instead of using file-scoped
  state. These changes make it possible to pass explicit frame lengths
  through the stack, which is required to pad frames independently of
  iovec layout.

- Explicit length passing throughout the stack (patches 6-10): Thread
  explicit L4, L2, frame, and data lengths through checksum, pcap,
  vu_flush(), and tcp_fill_headers(), replacing lengths that were
  previously derived from iovec sizes. With lengths tracked explicitly,
  the final patch can centralise Ethernet frame padding into
  vu_collect() and a new vu_pad() helper that correctly pads frames
  spanning multiple iovec entries.
Laurent Vivier (10):
  iov: Introduce iov_memset()
  iov: Add iov_memcopy() to copy data between iovec arrays
  vu_common: Move vnethdr setup into vu_flush()
  udp_vu: Move virtqueue management from udp_vu_sock_recv() to its
    caller
  udp_vu: Pass iov explicitly to helpers instead of using file-scoped
    array
  checksum: Pass explicit L4 length to checksum functions
  pcap: Pass explicit L2 length to pcap_iov()
  vu_common: Pass explicit frame length to vu_flush()
  tcp: Pass explicit data length to tcp_fill_headers()
  vhost-user: Centralise Ethernet frame padding in vu_collect() and
    vu_pad()

 checksum.c     |  35 ++++++-----
 checksum.h     |   6 +-
 iov.c          |  78 ++++++++++++++++++++++++
 iov.h          |   5 ++
 pcap.c         |  37 ++++++++---
 pcap.h         |   2 +-
 tap.c          |   6 +-
 tcp.c          |  14 +++--
 tcp_buf.c      |   3 +-
 tcp_internal.h |   2 +-
 tcp_vu.c       |  62 +++++++++----------
 udp.c          |   5 +-
 udp_vu.c       | 162 ++++++++++++++++++++++++++-----------------------
 vu_common.c    |  58 ++++++++++--------
 vu_common.h    |   5 +-
 15 files changed, 304 insertions(+), 176 deletions(-)

-- 
2.53.0
Add a helper to set a range of bytes across an IO vector to a given
value, similar to memset() but operating over scatter-gather buffers.
It skips to the given offset and fills across iovec entries up to the
requested length.
Signed-off-by: Laurent Vivier
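A minimal sketch of the semantics described above (the signature and the
return-value convention are assumed from the patch description, not
taken from the actual iov.c): skip @offset bytes into the vector, then
fill up to @bytes bytes with @fillc, crossing entry boundaries as
needed, and return how many bytes were actually set.

```c
#include <string.h>
#include <sys/uio.h>

/* Hypothetical sketch of iov_memset(): memset() over a scatter-gather
 * buffer. Returns the number of bytes set, which is less than @bytes
 * if the vector is too short. */
static size_t iov_memset(const struct iovec *iov, size_t iov_cnt,
			 size_t offset, int fillc, size_t bytes)
{
	size_t done = 0;

	for (size_t i = 0; i < iov_cnt && done < bytes; i++) {
		size_t len;

		if (offset >= iov[i].iov_len) {
			offset -= iov[i].iov_len;	/* skip whole entry */
			continue;
		}

		len = iov[i].iov_len - offset;
		if (len > bytes - done)
			len = bytes - done;

		memset((char *)iov[i].iov_base + offset, fillc, len);
		done += len;
		offset = 0;		/* starting offset fully consumed */
	}

	return done;
}
```

A fill starting mid-entry and ending mid-entry touches only the bytes in
between, leaving the rest of both entries intact.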
Add a helper to copy data from a source iovec array to a destination
iovec array, each starting at an arbitrary byte offset, iterating
through both arrays simultaneously and copying in chunks matching the
smaller of the two current segments.
Signed-off-by: Laurent Vivier
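A sketch of the lockstep copy described above (signature assumed for
illustration, not taken from the actual iov.c): advance to the starting
offset in each array, then repeatedly copy the largest chunk that fits
in both current segments.

```c
#include <string.h>
#include <sys/uio.h>

/* Hypothetical sketch of iov_memcopy(): copy up to @bytes bytes from
 * @src (starting @src_off bytes in) to @dst (starting @dst_off bytes
 * in), walking both iovec arrays simultaneously. Returns the number
 * of bytes copied. */
static size_t iov_memcopy(const struct iovec *dst, size_t dst_cnt,
			  size_t dst_off,
			  const struct iovec *src, size_t src_cnt,
			  size_t src_off, size_t bytes)
{
	size_t done = 0, i = 0, j = 0;

	/* Skip whole entries covered by the starting offsets */
	while (i < dst_cnt && dst_off >= dst[i].iov_len)
		dst_off -= dst[i++].iov_len;
	while (j < src_cnt && src_off >= src[j].iov_len)
		src_off -= src[j++].iov_len;

	while (done < bytes && i < dst_cnt && j < src_cnt) {
		size_t d = dst[i].iov_len - dst_off;
		size_t s = src[j].iov_len - src_off;
		size_t chunk = d < s ? d : s;	/* smaller current segment */

		if (chunk > bytes - done)
			chunk = bytes - done;

		memcpy((char *)dst[i].iov_base + dst_off,
		       (char *)src[j].iov_base + src_off, chunk);
		done += chunk;
		dst_off += chunk;
		src_off += chunk;

		if (dst_off == dst[i].iov_len) { i++; dst_off = 0; }
		if (src_off == src[j].iov_len) { j++; src_off = 0; }
	}

	return done;
}
```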
Every caller of vu_flush() was calling vu_set_vnethdr() beforehand with
the same pattern. Move it into vu_flush().
Remove vu_queue_notify() from vu_flush() and let callers invoke it
explicitly. This allows paths that perform multiple flushes, such as
tcp_vu_send_flag() and tcp_vu_data_from_sock(), to issue a single guest
notification at the end.
Signed-off-by: Laurent Vivier
udp_vu_sock_recv() currently mixes two concerns: receiving data from the
socket and managing virtqueue buffers (collecting, rewinding, releasing).
This makes the function harder to reason about and couples socket I/O
with virtqueue state.
Move all virtqueue operations (vu_collect(), vu_init_elem(),
vu_queue_rewind(), vu_set_vnethdr(), and the queue-readiness check) into
udp_vu_sock_to_tap(), which is the only caller. This turns
udp_vu_sock_recv() into a pure socket receive function that simply reads
into the provided iov array and adjusts its length.
Signed-off-by: Laurent Vivier
udp_vu_sock_recv(), udp_vu_prepare(), and udp_vu_csum() all operated on
the file-scoped iov_vu[] array directly. Pass iov and count as explicit
parameters instead, and move iov_vu[] and elem[] to function-local
statics in udp_vu_sock_to_tap(), the only function that needs them.
Signed-off-by: Laurent Vivier
The iov_tail passed to csum_iov_tail() may contain padding or trailing
data beyond the actual L4 payload. Rather than relying on
iov_tail_size() to determine how many bytes to checksum, pass the
length explicitly so that only the relevant payload bytes are included
in the checksum computation.
Signed-off-by: Laurent Vivier
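The idea can be illustrated with a self-contained ones-complement
checksum over an iovec that stops at an explicit length (this is a
generic RFC 1071-style sketch, not passt's csum_iov_tail()): any
trailing entries or padding bytes beyond @l4len simply never enter the
sum.

```c
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical sketch: Internet checksum over the first @l4len bytes
 * of @iov, byte by byte so segment boundaries need no special casing.
 * Bytes past @l4len (e.g. padding) are ignored. */
static uint16_t csum_iov_len(const struct iovec *iov, size_t iov_cnt,
			     size_t l4len)
{
	uint32_t sum = 0;
	size_t pos = 0;

	for (size_t i = 0; i < iov_cnt && pos < l4len; i++) {
		const uint8_t *p = iov[i].iov_base;

		for (size_t k = 0; k < iov[i].iov_len && pos < l4len;
		     k++, pos++)
			sum += (pos & 1) ? p[k] : (uint32_t)p[k] << 8;
	}

	while (sum >> 16)			/* fold carries */
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```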
With vhost-user multibuffer frames, the iov can be larger than the
actual L2 frame. The previous approach of computing L2 length as
iov_size() - offset would overcount and write extra bytes into the
pcap file.
Pass the L2 frame length explicitly to pcap_frame() and pcap_iov(),
and write exactly that many bytes instead of the full iov remainder.
Signed-off-by: Laurent Vivier
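For context, the per-record header of the classic pcap file format is
shown below (a generic sketch, not passt's pcap.c); the point of the
patch is that incl_len/orig_len now come from the explicit L2 length
rather than from iov_size() - offset, so an oversized iov cannot inflate
the record.

```c
#include <stdint.h>
#include <stddef.h>

/* Classic pcap per-record header (little-endian file variant) */
struct pcap_rec_hdr {
	uint32_t ts_sec;	/* timestamp, seconds */
	uint32_t ts_usec;	/* timestamp, microseconds */
	uint32_t incl_len;	/* bytes of frame data saved in the file */
	uint32_t orig_len;	/* original on-the-wire frame length */
};

/* Hypothetical helper: fill the record header from an explicit L2
 * length instead of deriving it from the iov remainder. */
static void pcap_rec_init(struct pcap_rec_hdr *h, uint32_t sec,
			  uint32_t usec, size_t l2len)
{
	h->ts_sec = sec;
	h->ts_usec = usec;
	h->incl_len = (uint32_t)l2len;
	h->orig_len = (uint32_t)l2len;
}
```

The writer then emits exactly incl_len bytes of frame data after the
header, regardless of how much iov space follows.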
Currently vu_flush() derives the frame size from the iov, but in
preparation for iov arrays that may be larger than the actual frame,
pass the total length (including vnet header) explicitly so that only
the relevant portion is reported to the virtqueue.
Ensure a minimum frame size of ETH_ZLEN + VNET_HLEN to handle short
frames. All elements are still flushed to avoid descriptor leaks,
but trailing elements beyond frame_len will report a zero length.
Signed-off-by: Laurent Vivier
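The length-reporting rule described above can be sketched as follows
(a hypothetical helper, not passt's actual vu_flush() code): each
element reports min(its capacity, remaining frame bytes), so elements
wholly beyond frame_len report zero while still being returned to the
queue.

```c
#include <stddef.h>

/* Hypothetical sketch: distribute an explicit frame length across the
 * capacities of the flushed elements. All @n elements get an entry in
 * @out (none are leaked); trailing ones report zero length. */
static void report_lengths(const size_t *cap, size_t n, size_t frame_len,
			   size_t *out)
{
	size_t left = frame_len;

	for (size_t i = 0; i < n; i++) {
		out[i] = cap[i] < left ? cap[i] : left;
		left -= out[i];
	}
}
```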
tcp_fill_headers() computed the TCP payload length from iov_tail_size(),
but with vhost-user multibuffer frames, the iov_tail will be larger than
the actual data. Pass the data length explicitly so that IP total
length, pseudo-header, and checksum computations use the correct value.
Signed-off-by: Laurent Vivier
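The arithmetic involved is simple once the payload length is explicit
(illustrative helpers with assumed names, not tcp_fill_headers()
itself): with @dlen given, the IPv4 total length and the TCP
pseudo-header length field no longer depend on iov_tail_size(). A
20-byte IPv4 header without options is assumed; @doff is the TCP data
offset in 32-bit words.

```c
#include <stdint.h>
#include <stddef.h>

/* IPv4 total length: IP header + TCP header + explicit payload */
static uint16_t ip4_tot_len(unsigned int doff, size_t dlen)
{
	return (uint16_t)(20 + doff * 4 + dlen);
}

/* TCP length used in the pseudo-header for checksumming */
static uint16_t tcp_pseudo_len(unsigned int doff, size_t dlen)
{
	return (uint16_t)(doff * 4 + dlen);
}
```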
The previous per-protocol padding done by vu_pad() in tcp_vu.c and
udp_vu.c was only correct for single-buffer frames: it assumed the
padding area always fell within the first iov, so its plain memset()
would write past the end of that iov whenever it did not.
It also required each caller to compute MAX(..., ETH_ZLEN + VNET_HLEN)
for vu_collect() and to call vu_pad() at the right point, duplicating
the minimum-size logic across protocols.
Move the Ethernet minimum size enforcement into vu_collect() itself, so
that enough buffer space is always reserved for padding regardless of
the requested frame size.
Rewrite vu_pad() to take a full iovec array and use iov_memset(),
making it safe for multi-buffer (mergeable rx buffer) frames.
In tcp_vu_sock_recv(), replace iov_truncate() with iov_skip_bytes():
now that all consumers receive explicit data lengths, truncating the
iovecs is no longer needed. In tcp_vu_data_from_sock(), cap each
frame's data length against the remaining bytes actually received from
the socket, so that the last partial frame gets correct headers and
sequence number advancement.
Signed-off-by: Laurent Vivier
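A self-contained sketch of the rewritten vu_pad() semantics (names and
the VNET_HLEN value are assumptions for illustration, not passt's exact
code): zero-fill from the current frame length up to the Ethernet
minimum, walking the whole iovec array so the padding may span several
entries. The inner loop mirrors the iov_memset() helper introduced in
patch 1.

```c
#include <string.h>
#include <sys/uio.h>

#define ETH_ZLEN	60	/* minimum Ethernet frame length */
#define VNET_HLEN	12	/* assumed vnet header size (mrg_rxbuf) */

/* Hypothetical sketch: pad a frame of @frame_len bytes (vnet header
 * included) up to ETH_ZLEN + VNET_HLEN with zeroes, crossing iovec
 * boundaries. The old per-protocol version used a plain memset() on
 * iov[0] and could write past its end. */
static void vu_pad(const struct iovec *iov, size_t iov_cnt,
		   size_t frame_len)
{
	size_t off = frame_len, left;

	if (frame_len >= ETH_ZLEN + VNET_HLEN)
		return;
	left = ETH_ZLEN + VNET_HLEN - frame_len;

	for (size_t i = 0; i < iov_cnt && left; i++) {
		size_t n;

		if (off >= iov[i].iov_len) {
			off -= iov[i].iov_len;	/* skip filled entries */
			continue;
		}

		n = iov[i].iov_len - off;
		if (n > left)
			n = left;

		memset((char *)iov[i].iov_base + off, 0, n);
		left -= n;
		off = 0;
	}
}
```

With two 40-byte buffers and a 50-byte frame, the padding (bytes 50-71)
lands entirely inside the second buffer, which the old single-buffer
memset() could not have reached safely.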