[PATCH v5 0/8] vhost-user,udp: Handle multiple iovec entries per virtqueue element
Some virtio-net drivers (notably iPXE) provide descriptors where the
vnet header and the frame payload are in separate buffers, resulting in
two iovec entries per virtqueue element. Currently, the RX (host to
guest) path assumes a single iovec per element, which triggers:

  ASSERTION FAILED in virtqueue_map_desc (virtio.c:403): num_sg < max_num_sg

This series reworks the UDP vhost-user receive path to support multiple
iovec entries per element, fixing the iPXE crash. It only addresses the
UDP path; TCP vhost-user will be updated to use multi-iov elements in a
subsequent series.

v5:
  - keep the padding system unchanged from v4; reworking it is a
    complex task that will be addressed in a later version
  - reorder patches and add new patches
  - remove IOV_PUT_HEADER()/with_header() and introduce IOV_PUSH_HEADER()
  - don't use the iov_tail to provide the headers to the functions
  - move vu_set_vnethdr() to vu_flush(), extract vu_queue_notify()
  - move vu_flush() inside the loop in tcp_vu_data_from_sock() to flush
    data frame by frame rather than by full data length

v4:
  - rebase
  - replace ASSERT() by assert()

v3:
  - include the series "Decouple iovec management from virtqueues elements"
  - because of that series, drop:
      "vu_common: Accept explicit iovec counts in vu_set_element()"
      "vu_common: Accept explicit iovec count per element in vu_init_elem()"
      "vu_common: Prepare to use multibuffer with guest RX"
      "vhost-user,udp: Use 2 iovec entries per element"
  - drop "vu_common: Pass iov_tail to vu_set_vnethdr()" as the spec
    ensures a buffer is big enough to contain the vnet header
  - introduce with_header() and merge "udp: Pass iov_tail to
    udp_update_hdr4()/udp_update_hdr6()" and "udp_vu: Use iov_tail in
    udp_vu_prepare()" to use it

v2:
  - add iov_truncate(), iov_memset()
  - remove iov_tail_truncate() and iov_tail_zero_end()
  - manage 802.3 minimum frame size

Laurent Vivier (8):
  iov: Introduce iov_memset()
  vu_common: Move vnethdr setup into vu_flush()
  vhost-user: Centralise Ethernet frame padding in vu_collect(), vu_pad()
    and vu_flush()
  udp_vu: Move virtqueue management from udp_vu_sock_recv() to its caller
  udp_vu: Pass iov explicitly to helpers instead of using file-scoped array
  udp_vu: Allow virtqueue elements with multiple iovec entries
  iov: Introduce IOV_PUSH_HEADER() macro
  udp: Pass iov_tail to udp_update_hdr4()/udp_update_hdr6()

 iov.c          |  48 ++++++++++++
 iov.h          |  13 ++++
 tcp_vu.c       |  36 +++------
 udp.c          |  81 ++++++++++----------
 udp_internal.h |  10 +--
 udp_vu.c       | 201 +++++++++++++++++++++++++------------------------
 vu_common.c    |  64 ++++++++++------
 vu_common.h    |   3 +-
 8 files changed, 263 insertions(+), 193 deletions(-)

-- 
2.53.0
Add a helper to set a range of bytes across an IO vector to a given
value, similar to memset() but operating over scatter-gather buffers.
It skips to the given offset and fills across iovec entries up to the
requested length.
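The described behaviour could be sketched as follows; this is an illustrative approximation, and the actual signature and name in iov.c may differ:

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Sketch of an iov_memset()-style helper: skip 'offset' bytes into the
 * vector, then set up to 'len' bytes to 'fill', crossing iovec
 * boundaries as needed.  Returns the number of bytes actually set. */
static size_t iov_memset_sketch(const struct iovec *iov, size_t iov_cnt,
                                size_t offset, int fill, size_t len)
{
	size_t done = 0;
	size_t i;

	for (i = 0; i < iov_cnt && done < len; i++) {
		if (offset >= iov[i].iov_len) {
			/* This entry lies entirely within the skip area */
			offset -= iov[i].iov_len;
			continue;
		}

		size_t n = iov[i].iov_len - offset;

		if (n > len - done)
			n = len - done;

		memset((char *)iov[i].iov_base + offset, fill, n);
		done += n;
		offset = 0;	/* skip fully consumed */
	}
	return done;
}
```

For example, with two 4-byte buffers, offset 2 and length 4, the fill covers the last two bytes of the first buffer and the first two bytes of the second.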
Signed-off-by: Laurent Vivier
Every caller of vu_flush() was calling vu_set_vnethdr() beforehand with
the same pattern. Move it into vu_flush().
Remove vu_queue_notify() from vu_flush() and let callers invoke it
explicitly. This allows paths that perform multiple flushes, such as
tcp_vu_send_flag() and tcp_vu_data_from_sock(), to issue a single guest
notification at the end.
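The calling pattern this enables could be sketched with purely illustrative stubs (counters standing in for the real vhost-user calls, not the passt API):

```c
/* Stub counters standing in for the roles of vu_flush() and
 * vu_queue_notify() after the split. */
static int frames_flushed;
static int notifications;

static void flush_frame(void)  { frames_flushed++; }  /* vu_flush() role */
static void notify_guest(void) { notifications++; }   /* vu_queue_notify() role */

/* A caller such as tcp_vu_data_from_sock() can now flush per frame
 * inside its loop and issue a single guest notification at the end,
 * instead of one notification per flush. */
static void send_frames(int n)
{
	for (int i = 0; i < n; i++)
		flush_frame();
	notify_guest();
}
```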
Signed-off-by: Laurent Vivier
The per-protocol padding done by vu_pad() in tcp_vu.c and udp_vu.c was
only correct for single-buffer frames, and assumed the padding area always
fell within the first iov. It also relied on each caller computing the
right MAX(..., ETH_ZLEN + VNET_HLEN) size for vu_collect() and calling
vu_pad() at the right point.
Centralise padding logic into three shared vhost-user helpers instead:
- vu_collect() now ensures at least ETH_ZLEN + VNET_HLEN bytes of buffer
space are collected, so there is always room for a minimum-sized frame.
- vu_pad() replaces the old single-iov helper with a new implementation
that takes a full iovec array plus a 'skipped' byte count. It uses a
new iov_memset() helper in iov.c to zero-fill the padding area across
iovec boundaries, then calls iov_truncate() to set the logical frame
size.
- vu_flush() computes the actual frame length (accounting for
VIRTIO_NET_F_MRG_RXBUF multi-buffer frames) and passes the padded
length to vu_queue_fill().
Callers in tcp_vu.c, udp_vu.c and vu_send_single() now use the new
vu_pad() in place of the old pad-then-truncate sequences and the
MAX(..., ETH_ZLEN + VNET_HLEN) size calculations passed to vu_collect().
Centralising padding here will also ease the move to multi-iovec per
element support, since there will be a single place to update.
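The padding step described above could look roughly like this; the name, constants and signature are approximations of the new vu_pad(), not the actual passt code:

```c
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

#define ETH_ZLEN_  60	/* assumed minimum Ethernet frame size, no FCS */
#define VNET_HLEN_ 12	/* assumed virtio-net header size */

/* Sketch of centralised padding: 'skipped' bytes at the start of the
 * vector precede the frame, 'len' is the current frame length.  Zero-
 * fill across iovec boundaries up to the minimum frame size and return
 * the padded length. */
static size_t pad_frame(const struct iovec *iov, size_t iov_cnt,
			size_t skipped, size_t len)
{
	size_t min_len = ETH_ZLEN_ + VNET_HLEN_;

	if (len < min_len) {
		size_t off = skipped + len;	/* first byte to zero */
		size_t pad = min_len - len;
		size_t done = 0, i;

		for (i = 0; i < iov_cnt && done < pad; i++) {
			if (off >= iov[i].iov_len) {
				off -= iov[i].iov_len;
				continue;
			}

			size_t n = iov[i].iov_len - off;

			if (n > pad - done)
				n = pad - done;

			memset((char *)iov[i].iov_base + off, 0, n);
			done += n;
			off = 0;
		}
		len = min_len;
	}
	return len;
}
```

Note how the zero-fill can start in one iovec entry and finish in the next, which is exactly the case the old single-iov helper could not handle.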
In vu_send_single(), fix padding, truncation and data copy to use the
requested frame size rather than the total available buffer space from
vu_collect(), which could be larger. Also add matching padding, truncation
and explicit size to vu_collect() for the DUP_ACK path in
tcp_vu_send_flag().
Signed-off-by: Laurent Vivier
udp_vu_sock_recv() currently mixes two concerns: receiving data from the
socket and managing virtqueue buffers (collecting, rewinding, releasing).
This makes the function harder to reason about and couples socket I/O
with virtqueue state.
Move all the virtqueue operations (vu_collect(), vu_init_elem(),
vu_queue_rewind(), vu_set_vnethdr(), and the queue-readiness check)
into udp_vu_sock_to_tap(), which is the only caller. This turns
udp_vu_sock_recv() into a pure socket receive function that simply reads
into the provided iov array and adjusts its length.
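After the move, the receive step reduces to something like the sketch below (the name and exact flags are illustrative, not the actual udp_vu_sock_recv() code): one datagram read into a caller-provided vector, with no virtqueue knowledge.

```c
#include <sys/socket.h>
#include <sys/uio.h>

/* Pure socket receive: read one datagram into the caller's iovec
 * array and return its length, as reported by recvmsg(). */
static ssize_t sock_recv_iov(int s, struct iovec *iov, size_t iov_cnt)
{
	struct msghdr mh = {
		.msg_iov = iov,
		.msg_iovlen = iov_cnt,
	};

	return recvmsg(s, &mh, MSG_DONTWAIT);
}
```

A single datagram naturally scatters across the provided entries, which is what the multi-iov RX path relies on.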
Signed-off-by: Laurent Vivier
udp_vu_sock_recv(), udp_vu_prepare(), and udp_vu_csum() all operate on
the file-scoped iov_vu[] array directly. Pass the iov and its count as
explicit parameters instead, and move iov_vu[] and elem[] to
function-local statics in udp_vu_sock_to_tap(), the only function that
needs them.
Signed-off-by: Laurent Vivier
The previous code assumed a 1:1 mapping between virtqueue elements and
iovec entries (enforced by an assert). Drop that assumption to allow
elements that span multiple iovecs: track elem_used separately by
walking the element list against the iov count returned after padding.
This also fixes vu_queue_rewind() and vu_flush() to use the element
count rather than the iov count.
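The elem_used walk could be sketched as below, using a hypothetical minimal element descriptor (the real code uses the vhost-user element structure, where in_num counts the guest-writable iovec entries):

```c
#include <stddef.h>

/* Hypothetical minimal element descriptor: in_num is how many iovec
 * entries this element contributed to the RX vector. */
struct elem_sketch {
	unsigned int in_num;
};

/* Walk the element list to find how many elements cover the first
 * 'iov_used' iovec entries, instead of assuming one element per
 * iovec.  The result is what vu_queue_fill()/vu_queue_rewind() style
 * calls need. */
static size_t elems_used(const struct elem_sketch *elem, size_t n_elem,
			 size_t iov_used)
{
	size_t used = 0;

	while (iov_used > 0 && used < n_elem) {
		if (elem[used].in_num >= iov_used)
			iov_used = 0;
		else
			iov_used -= elem[used].in_num;
		used++;
	}
	return used;
}
```

With elements contributing 2, 1 and 2 iovecs, consuming 3 iovecs uses 2 elements, while consuming 2 iovecs uses only the first.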
Use iov_tail_clone() in udp_vu_sock_recv() to handle header offset,
replacing the manual base/len adjustment and restore pattern.
Signed-off-by: Laurent Vivier
Add iov_push_header_() and its typed wrapper IOV_PUSH_HEADER() to write
a header into an iov_tail at the current offset and advance past it.
This is the write counterpart to IOV_PEEK_HEADER() / IOV_REMOVE_HEADER(),
using iov_from_buf() to copy the header data across iovec boundaries.
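The described semantics could be sketched like this; the cursor layout, field names and macro name are assumptions standing in for the real struct iov_tail and IOV_PUSH_HEADER() in iov.h:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Minimal iov_tail-like cursor: an iovec array plus a byte offset
 * marking where the tail currently starts. */
struct tail_sketch {
	const struct iovec *iov;
	size_t cnt;
	size_t off;
};

/* Write 'len' header bytes at the current offset, crossing iovec
 * boundaries, then advance the tail past them.  Returns false if the
 * vector has no room for the full header. */
static bool push_header(struct tail_sketch *t, const void *buf, size_t len)
{
	size_t off = t->off, done = 0, i;

	for (i = 0; i < t->cnt && done < len; i++) {
		if (off >= t->iov[i].iov_len) {
			off -= t->iov[i].iov_len;
			continue;
		}

		size_t n = t->iov[i].iov_len - off;

		if (n > len - done)
			n = len - done;

		memcpy((char *)t->iov[i].iov_base + off,
		       (const char *)buf + done, n);
		done += n;
		off = 0;
	}
	if (done < len)
		return false;

	t->off += len;	/* advance past the header just written */
	return true;
}

/* Typed wrapper in the spirit of IOV_PUSH_HEADER(tail, hdr). */
#define PUSH_HEADER_SKETCH(t, hdr) push_header((t), &(hdr), sizeof(hdr))
```

The typed wrapper keeps call sites free of explicit sizeof arithmetic, mirroring how IOV_PEEK_HEADER()/IOV_REMOVE_HEADER() are used on the read side.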
Signed-off-by: Laurent Vivier
Change udp_update_hdr4() and udp_update_hdr6() to take an iov_tail
pointing at the UDP frame instead of a contiguous udp_payload_t buffer
and explicit data length. This lets vhost-user pass scatter-gather
virtqueue buffers directly without an intermediate copy.
The UDP header is built into a local struct udphdr and written back with
IOV_PUSH_HEADER(). On the tap side, udp_tap_prepare() wraps the
existing udp_payload_t in a two-element iov to match the new interface.
Signed-off-by: Laurent Vivier