[PATCH v3 0/8] vhost-user,udp: Handle multiple iovec entries per virtqueue element
Some virtio-net drivers (notably iPXE) provide descriptors where the
vnet header and the frame payload are in separate buffers, resulting in
two iovec entries per virtqueue element. Currently, the RX (host to
guest) path assumes a single iovec per element, which triggers:

  ASSERTION FAILED in virtqueue_map_desc (virtio.c:403): num_sg < max_num_sg

This series reworks the UDP vhost-user receive path to support multiple
iovec entries per element, fixing the iPXE crash. It only addresses the
UDP path; TCP vhost-user will be updated to use multi-iov elements in a
subsequent series.

v3:
  - include the series "Decouple iovec management from virtqueue elements"
  - because of this series, drop:
      "vu_common: Accept explicit iovec counts in vu_set_element()"
      "vu_common: Accept explicit iovec count per element in vu_init_elem()"
      "vu_common: Prepare to use multibuffer with guest RX"
      "vhost-user,udp: Use 2 iovec entries per element"
  - drop "vu_common: Pass iov_tail to vu_set_vnethdr()", as the spec
    ensures a buffer is big enough to contain the vnet header
  - introduce "with_header()" and merge "udp: Pass iov_tail to
    udp_update_hdr4()/udp_update_hdr6()" and "udp_vu: Use iov_tail in
    udp_vu_prepare()" to use it

v2:
  - add iov_truncate(), iov_memset()
  - remove iov_tail_truncate() and iov_tail_zero_end()
  - manage the 802.3 minimum frame size

Laurent Vivier (8):
  virtio: Pass iovec arrays as separate parameters to vu_queue_pop()
  vu_handle_tx: Pass actual remaining out_sg capacity to vu_queue_pop()
  vu_common: Move iovec management into vu_collect()
  vhost-user: Centralise Ethernet frame padding in vu_collect(), vu_pad()
    and vu_flush()
  udp_vu: Use iov_tail to manage virtqueue buffers
  udp_vu: Move virtqueue management from udp_vu_sock_recv() to its caller
  iov: Add IOV_PUT_HEADER() and with_header() to write header data back
    to iov_tail
  udp: Pass iov_tail to udp_update_hdr4()/udp_update_hdr6()

 iov.c          |  47 +++++++++++
 iov.h          |  27 ++-
 tcp_vu.c       |  46 +++++------
 udp.c          | 129 ++++++++++++++++++++--------------
 udp_internal.h |   6 +-
 udp_vu.c       | 207 +++++++++++++++++++++++++------------------------
 virtio.c       |  29 +++++--
 virtio.h       |   4 +-
 vu_common.c    | 149 ++++++++++++++++++++---------------
 vu_common.h    |  24 +-----
 10 files changed, 385 insertions(+), 283 deletions(-)

-- 
2.53.0
Currently vu_queue_pop() and vu_queue_map_desc() read the iovec arrays
(in_sg/out_sg) and their sizes (in_num/out_num) from the vu_virtq_element
struct. This couples the iovec storage to the element, requiring callers
like vu_handle_tx() to pre-initialize the element fields before calling
vu_queue_pop().
Pass the iovec arrays and their maximum sizes as separate parameters
instead. vu_queue_map_desc() now writes the actual descriptor count
and iovec pointers back into the element after mapping, rather than
using the element as both input and output.
This decouples the iovec storage from the element, which is a
prerequisite for multi-buffer support where a single frame can span
multiple virtqueue elements sharing a common iovec pool.
No functional change.
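The reshaped contract can be pictured with a minimal sketch. The struct
layout, names, and the simulated single-entry mapping below are
illustrative assumptions, not the actual virtio.h/virtio.c definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical, simplified shapes; the real virtio.h structs differ */
struct vu_virtq_element {
	unsigned int out_num, in_num;
	struct iovec *in_sg, *out_sg;
};

static char guest_buf[64];	/* stand-in for a mapped guest buffer */

/* Sketch of the new contract: the iovec arrays and their capacities come
 * in as parameters, and the pointers/counts discovered during mapping are
 * written back into the element, which no longer owns the storage. */
static int map_desc_sketch(struct vu_virtq_element *elem,
			   struct iovec *in_sg, unsigned int max_in_num,
			   struct iovec *out_sg, unsigned int max_out_num)
{
	unsigned int in_num = 0, out_num = 0;

	/* ...walking the descriptor chain would fill the arrays here,
	 * never exceeding max_in_num/max_out_num; one entry simulated... */
	if (in_num < max_in_num) {
		in_sg[in_num].iov_base = guest_buf;
		in_sg[in_num].iov_len = sizeof(guest_buf);
		in_num++;
	}
	(void)max_out_num;

	/* Output only: element fields need no pre-initialisation */
	elem->in_sg = in_sg;
	elem->in_num = in_num;
	elem->out_sg = out_sg;
	elem->out_num = out_num;
	return 0;
}
```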
Signed-off-by: Laurent Vivier
Previously, callers had to pre-initialize virtqueue elements with iovec
entries using vu_set_element() or vu_init_elem() before calling
vu_collect(). This meant each element owned a fixed, pre-assigned iovec
slot.
Move the iovec array into vu_collect() as explicit parameters (in_sg,
max_in_sg, and in_num), letting it pass the remaining iovec capacity
directly to vu_queue_pop(). A running current_iov counter tracks
consumed entries across elements, so multiple elements share a single
iovec pool. The optional in_num output parameter reports how many iovec
entries were consumed, allowing callers to track usage across multiple
vu_collect() calls.
This removes vu_set_element() and vu_init_elem() which are no longer
needed, and is a prerequisite for multi-buffer support where a single
virtqueue element can use more than one iovec entry. For now, callers
assert the current single-iovec-per-element invariant until they are
updated to handle multiple iovecs.
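The pooling idea can be sketched as follows. The pool size, the fake pop
helper, and the function shapes are assumptions for illustration; the
real vu_collect() in vu_common.c takes a device/queue and collects
buffers until a target size is reached:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define POOL_SIZE 8	/* illustrative shared iovec pool */

static struct iovec iov_pool[POOL_SIZE];

/* Stand-in for vu_queue_pop(): an element maps n entries starting at the
 * slot the caller hands over, capped by the remaining capacity. */
static unsigned int fake_pop(struct iovec *sg, unsigned int max_sg,
			     unsigned int n)
{
	unsigned int i;

	if (n > max_sg)
		n = max_sg;
	for (i = 0; i < n; i++)
		sg[i].iov_len = 64;	/* pretend-mapped buffer */
	return i;
}

/* Elements share one pool: current_iov tracks consumed entries, and each
 * pop is offered only what remains. The optional in_num output reports
 * total iovec usage to the caller, as described above. */
static unsigned int collect_sketch(unsigned int elem_cnt,
				   unsigned int iov_per_elem,
				   unsigned int *in_num)
{
	unsigned int current_iov = 0, elems = 0;

	while (elems < elem_cnt && current_iov < POOL_SIZE) {
		current_iov += fake_pop(&iov_pool[current_iov],
					POOL_SIZE - current_iov,
					iov_per_elem);
		elems++;
	}
	if (in_num)
		*in_num = current_iov;
	return elems;
}
```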
Signed-off-by: Laurent Vivier
In vu_handle_tx(), pass the actual remaining iovec capacity
(ARRAY_SIZE(out_sg) - out_sg_count) to vu_queue_pop() rather than a
fixed VU_MAX_TX_BUFFER_NB.
This enables dynamic allocation of iovec entries to each element rather
than reserving a fixed number of slots per descriptor.
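The shape of that loop might look like this sketch (array size and the
pop stand-in are illustrative; the real code pops from the virtqueue and
maps guest memory):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static struct iovec out_sg[8];	/* shared TX iovec array, size illustrative */

/* Stand-in for vu_queue_pop(): returns the entries a chain needs, or 0
 * if it does not fit in the capacity the caller offers. */
static size_t fake_pop(struct iovec *sg, size_t max_sg, size_t want)
{
	(void)sg;
	return want <= max_sg ? want : 0;
}

/* Each pop is offered the remaining capacity,
 * ARRAY_SIZE(out_sg) - out_sg_count, instead of a fixed per-element
 * quota, so a chain with more iovec entries simply consumes more slots. */
static size_t tx_loop_sketch(const size_t *entries_per_elem, size_t n)
{
	size_t out_sg_count = 0, i;

	for (i = 0; i < n; i++) {
		size_t got = fake_pop(&out_sg[out_sg_count],
				      ARRAY_SIZE(out_sg) - out_sg_count,
				      entries_per_elem[i]);
		if (!got)
			break;
		out_sg_count += got;
	}
	return out_sg_count;
}
```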
Signed-off-by: Laurent Vivier
The per-protocol padding done by vu_pad() in tcp_vu.c and udp_vu.c was
only correct for single-buffer frames, since it assumed the padding area
always fell within the first iov. It also relied on each caller computing
the right MAX(..., ETH_ZLEN + VNET_HLEN) size for vu_collect() and
calling vu_pad() at the right point.
Centralise padding logic into three shared vhost-user helpers instead:
- vu_collect() now ensures at least ETH_ZLEN + VNET_HLEN bytes of buffer
space are collected, so there is always room for a minimum-sized frame.
- vu_pad() replaces the old single-iov helper with a new implementation
that takes a full iovec array plus a 'skipped' byte count. It uses a
new iov_memset() helper in iov.c to zero-fill the padding area across
iovec boundaries, then calls iov_truncate() to set the logical frame
size.
- vu_flush() computes the actual frame length (accounting for
VIRTIO_NET_F_MRG_RXBUF multi-buffer frames) and passes the padded
length to vu_queue_fill().
Callers in tcp_vu.c, udp_vu.c and vu_send_single() now use the new
vu_pad() in place of the old pad-then-truncate sequences and the
MAX(..., ETH_ZLEN + VNET_HLEN) size calculations passed to vu_collect().
Centralising padding here will also ease the move to multi-iovec per
element support, since there will be a single place to update.
In vu_send_single(), fix padding, truncation and data copy to use the
requested frame size rather than the total available buffer space from
vu_collect(), which could be larger. Also add matching padding, truncation
and explicit size to vu_collect() for the DUP_ACK path in
tcp_vu_send_flag().
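The key new primitive is zero-filling a padding area that may straddle
iovec boundaries, e.g. when a short frame ends in one guest buffer and
the ETH_ZLEN minimum extends into the next. A sketch of that idea
follows; the real iov_memset() in iov.c may use a different signature:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Fill 'bytes' bytes with 'fillc', starting 'offset' bytes into the
 * logical stream described by the iovec array, crossing entry
 * boundaries as needed. Returns the number of bytes filled. */
static size_t iov_memset_sketch(const struct iovec *iov, size_t iov_cnt,
				size_t offset, int fillc, size_t bytes)
{
	size_t done = 0, i;

	for (i = 0; i < iov_cnt && done < bytes; i++) {
		size_t len = iov[i].iov_len, chunk;

		if (offset >= len) {	/* entirely before the fill area */
			offset -= len;
			continue;
		}
		chunk = len - offset;
		if (chunk > bytes - done)
			chunk = bytes - done;
		memset((char *)iov[i].iov_base + offset, fillc, chunk);
		done += chunk;
		offset = 0;	/* subsequent entries start at the fill area */
	}
	return done;
}
```

After the zero-fill, the series then truncates the logical frame to the
padded length (iov_truncate() in the real code) before handing it to
vu_flush().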
Signed-off-by: Laurent Vivier
Replace direct iovec pointer arithmetic in UDP vhost-user handling with
iov_tail operations.
udp_vu_sock_recv() now takes an iov/cnt pair instead of using the
file-scoped iov_vu array, and returns the data length rather than the
iov count. Internally it uses IOV_TAIL() to create a view past the
L2/L3/L4 headers, and iov_tail_clone() to build the recvmsg() iovec,
removing the manual pointer offset and restore pattern.
udp_vu_prepare() and udp_vu_csum() take a const struct iov_tail *
instead of referencing iov_vu directly, making data flow explicit.
udp_vu_csum() uses iov_drop_header() and IOV_REMOVE_HEADER() to locate
the UDP header and payload, replacing manual offset calculations via
vu_payloadv4()/vu_payloadv6().
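The view-plus-clone pattern can be sketched as below. The struct mirrors
the iov_tail idea (an iovec array viewed from an offset); names and
layout here are assumptions, not the iov.h definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical mirror of iov_tail: a view over an iovec array that
 * logically starts 'off' bytes in (e.g. past L2/L3/L4 headers). */
struct iov_tail_sketch {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Materialise the view into a plain iovec array usable by recvmsg() or
 * readv(), without modifying the original array, replacing the old
 * offset-then-restore pointer juggling. Returns the entry count. */
static size_t tail_clone_sketch(const struct iov_tail_sketch *tail,
				struct iovec *out, size_t max_out)
{
	size_t skip = tail->off, n = 0, i;

	for (i = 0; i < tail->cnt && n < max_out; i++) {
		size_t len = tail->iov[i].iov_len;

		if (skip >= len) {	/* entry entirely before the view */
			skip -= len;
			continue;
		}
		out[n].iov_base = (char *)tail->iov[i].iov_base + skip;
		out[n].iov_len = len - skip;
		skip = 0;
		n++;
	}
	return n;
}
```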
Signed-off-by: Laurent Vivier
Add iov_put_header_() and its wrapper macro IOV_PUT_HEADER() as a
counterpart to IOV_PEEK_HEADER(). This writes header data back to an
iov_tail after modification. If the header pointer matches the
original iov buffer location, the data was already modified in place
and no copy is needed. Otherwise, it copies the data back using
iov_from_buf().
Add with_header(), a for-loop macro that combines IOV_PEEK_HEADER()
and IOV_PUT_HEADER() to allow modifying a header in place within a
block scope.
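The peek/modify/put round trip might be sketched like this. The helper
names, the plain iov/cnt parameters, and the scratch-buffer convention
are illustrative; the real IOV_PEEK_HEADER()/IOV_PUT_HEADER() macros in
iov.h operate on an iov_tail and derive sizes from the header type:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

static void *peek_hdr(const struct iovec *iov, size_t cnt,
		      void *scratch, size_t len)
{
	size_t done = 0, i;

	if (cnt && iov[0].iov_len >= len)
		return iov[0].iov_base;		/* contiguous: no copy */

	/* header spans iov entries: gather it into the scratch buffer */
	for (i = 0; i < cnt && done < len; i++) {
		size_t chunk = iov[i].iov_len;

		if (chunk > len - done)
			chunk = len - done;
		memcpy((char *)scratch + done, iov[i].iov_base, chunk);
		done += chunk;
	}
	return done == len ? scratch : NULL;	/* NULL: not enough data */
}

static void put_hdr(const struct iovec *iov, size_t cnt,
		    const void *hdr, size_t len)
{
	size_t done = 0, i;

	if (cnt && hdr == iov[0].iov_base)
		return;				/* was modified in place */

	/* scatter the modified copy back over the iov entries */
	for (i = 0; i < cnt && done < len; i++) {
		size_t chunk = iov[i].iov_len;

		if (chunk > len - done)
			chunk = len - done;
		memcpy(iov[i].iov_base, (const char *)hdr + done, chunk);
		done += chunk;
	}
}

/* for-loop shape of with_header(): peek on entry, run the block once,
 * then put the header back and terminate */
#define with_header_sketch(p, iov, cnt, scratch)			\
	for ((p) = peek_hdr((iov), (cnt), (scratch), sizeof(scratch));	\
	     (p);							\
	     put_hdr((iov), (cnt), (p), sizeof(scratch)), (p) = NULL)
```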
Signed-off-by: Laurent Vivier
udp_vu_sock_recv() currently mixes two concerns: receiving data from the
socket and managing virtqueue buffers (collecting, rewinding, releasing).
This makes the function harder to reason about and couples socket I/O
with virtqueue state.
Move all virtqueue operations (vu_collect(), vu_init_elem(),
vu_queue_rewind(), vu_set_vnethdr(), and the queue-readiness check) into
udp_vu_sock_to_tap(), which is the only caller. This turns
udp_vu_sock_recv() into a pure socket receive function that simply reads
into the provided iov array and adjusts its length.
Signed-off-by: Laurent Vivier
Change udp_update_hdr4() and udp_update_hdr6() to take an iov_tail
covering the full L3 frame (IP header + UDP header + data), instead of
separate IP header, udp_payload_t, and data-length parameters. The
functions now use with_header() and IOV_DROP_HEADER() to access the IP
and UDP headers directly from the iov_tail, and derive sizes via
iov_tail_size() rather than an explicit length argument.
This decouples the header update functions from the udp_payload_t memory
layout, which assumes all headers and data sit in a single contiguous
buffer. The vhost-user path uses virtqueue-provided scatter-gather
buffers where this assumption does not hold; passing an iov_tail lets
both the tap path and the vhost-user path share the same functions
without layout-specific helpers.
On the vhost-user side, udp_vu_prepare() likewise switches to
with_header() for the Ethernet header, and its caller now drops the
vnet header before calling udp_vu_prepare() instead of having the
function deal with it internally.
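The size-derivation change can be sketched as follows. The struct and
names are illustrative stand-ins for iov_tail, iov_tail_size() and
IOV_DROP_HEADER() from the series, not their actual definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Illustrative mirror of iov_tail: a view starting 'off' bytes into an
 * iovec array. */
struct tail_sketch {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Bytes remaining in the view, the iov_tail_size() idea */
static size_t tail_size_sketch(const struct tail_sketch *t)
{
	size_t total = 0, i;

	for (i = 0; i < t->cnt; i++)
		total += t->iov[i].iov_len;
	return total - t->off;
}

/* With a view covering IP header + UDP header + data, dropping the IP
 * header leaves exactly the UDP datagram, so no explicit data-length
 * parameter is needed to fill the UDP length field. */
static size_t udp_len_sketch(struct tail_sketch *t, size_t ip_hdr_len)
{
	t->off += ip_hdr_len;		/* IOV_DROP_HEADER() equivalent */
	return tail_size_sketch(t);
}
```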
Signed-off-by: Laurent Vivier