[RFC PATCH 0/3] Fill exceeding size of vhost-user buffers explicitly
While looking for a way to ensure inbound frames are padded to 60 bytes in vhost-user mode, I spotted the issue described in 1/3, which looks a bit too obvious to be true, so I'm sending this as RFC. Am I missing something there, or do we actually need to fix that? Stefano Brivio (3): vu_common: Stick to size of input buffer in vu_send_single() iov: Fix coding style of basic (non-IOV_TAIL) parts iov, vu_common: Make iov_from_buf() fill destination iov entirely iov.c | 116 +++++++++++++++++++++++++++------------------------- iov.h | 4 +- vu_common.c | 22 +++++----- 3 files changed, 74 insertions(+), 68 deletions(-) -- 2.43.0
...instead of copying, from the input buffer, the amount of available
bytes in the buffer provided via vhost-user.
The existing behaviour should be harmless due to the fact that we
don't request overly large buffers from vhost-user, but it doesn't
look correct to me.
Signed-off-by: Stefano Brivio
As I'm going to touch those in the next commit.
Signed-off-by: Stefano Brivio
...and, for consistency, rename 'bytes' to 'copy' in iov_to_buf().
Two commits ago, I changed vhost-user functions to use iov_from_buf()
to copy only up to the size of source buffers, instead of using the
size of the destination vhost-user buffers.
This change pads the rest with zeroes, which is not strictly needed,
but looks definitely cleaner.
Signed-off-by: Stefano Brivio
On 11/3/25 11:16, Stefano Brivio wrote:
...instead of copying, from the input buffer, the amount of available bytes in the buffer provided via vhost-user.
The existing behaviour should be harmless due to the fact that we don't request overly large buffers from vhost-user, but it doesn't look correct to me.
I don't understand where is the problem. I think the existing behaviour is correct because the iov array is padded to size in vu_collect() by: if (iov->iov_len > size - current_size) iov->iov_len = size - current_size; total can be lesser than size but in this case we exit. So total should be equal to input size + sizeof(struct virtio_net_hdr_mrg_rxbuf). And in iov_from_buf() we copy the content of the buffer according the available room in the iovec and the size of the input buffer. Thanks, Laurent>
Signed-off-by: Stefano Brivio
--- vu_common.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/vu_common.c b/vu_common.c index b13b7c3..21cfb2a 100644 --- a/vu_common.c +++ b/vu_common.c @@ -238,7 +238,7 @@ void vu_kick_cb(struct vu_dev *vdev, union epoll_ref ref, * vu_send_single() - Send a buffer to the front-end using the RX virtqueue * @c: execution context * @buf: address of the buffer - * @size: size of the buffer + * @size: size of the input buffer * * Return: number of bytes sent, -1 if there is an error */ @@ -248,7 +248,7 @@ int vu_send_single(const struct ctx *c, const void *buf, size_t size) struct vu_virtq *vq = &vdev->vq[VHOST_USER_RX_QUEUE]; struct vu_virtq_element elem[VIRTQUEUE_MAX_SIZE]; struct iovec in_sg[VIRTQUEUE_MAX_SIZE]; - size_t total; + size_t vu_buf_size; int elem_cnt; int i;
@@ -261,21 +261,20 @@ int vu_send_single(const struct ctx *c, const void *buf, size_t size)
vu_init_elem(elem, in_sg, VIRTQUEUE_MAX_SIZE);
- size += sizeof(struct virtio_net_hdr_mrg_rxbuf); - elem_cnt = vu_collect(vdev, vq, elem, VIRTQUEUE_MAX_SIZE, size, &total); - if (total < size) { + elem_cnt = vu_collect(vdev, vq, elem, VIRTQUEUE_MAX_SIZE, + size + sizeof(struct virtio_net_hdr_mrg_rxbuf), + &vu_buf_size); + if (vu_buf_size < size + sizeof(struct virtio_net_hdr_mrg_rxbuf)) { debug("vu_send_single: no space to send the data " - "elem_cnt %d size %zd", elem_cnt, total); + "elem_cnt %d size %zd", elem_cnt, size); goto err; }
vu_set_vnethdr(vdev, in_sg[0].iov_base, elem_cnt);
- total -= sizeof(struct virtio_net_hdr_mrg_rxbuf); - /* copy data from the buffer to the iovec */ iov_from_buf(in_sg, elem_cnt, sizeof(struct virtio_net_hdr_mrg_rxbuf), - buf, total); + buf, size);
if (*c->pcap) { pcap_iov(in_sg, elem_cnt, @@ -284,9 +283,10 @@ int vu_send_single(const struct ctx *c, const void *buf, size_t size)
vu_flush(vdev, vq, elem, elem_cnt);
- trace("vhost-user sent %zu", total); + trace("vhost-user sent %zu", + vu_buf_size - sizeof(struct virtio_net_hdr_mrg_rxbuf));
- return total; + return size; err: for (i = 0; i < elem_cnt; i++) vu_queue_detach_element(vq);
On 11/3/25 11:16, Stefano Brivio wrote:
...and, for consistency, rename 'bytes' to 'copy' in iov_to_buf().
Two commits ago, I changed vhost-user functions to use iov_from_buf() to copy only up to the size of source buffers, instead of using the size of the destination vhost-user buffers.
This change pads the rest with zeroes, which is not strictly needed, but looks definitely cleaner.
Signed-off-by: Stefano Brivio
--- iov.c | 54 ++++++++++++++++++++++++++++++++++------------------- iov.h | 4 ++-- vu_common.c | 2 +- 3 files changed, 38 insertions(+), 22 deletions(-)
Reviewed-by: Laurent Vivier
On 11/3/25 11:16, Stefano Brivio wrote:
As I'm going to touch those in the next commit.
Signed-off-by: Stefano Brivio
--- iov.c | 68 +++++++++++++++++++++++++---------------------------------- 1 file changed, 29 insertions(+), 39 deletions(-)
Reviewed-by: Laurent Vivier
On Tue, 4 Nov 2025 16:01:07 +0100
Laurent Vivier
On 11/3/25 11:16, Stefano Brivio wrote:
...instead of copying, from the input buffer, the amount of available bytes in the buffer provided via vhost-user.
The existing behaviour should be harmless due to the fact that we don't request overly large buffers from vhost-user, but it doesn't look correct to me.
I don't understand where is the problem.
I think the existing behaviour is correct because the iov array is padded to size in vu_collect() by:
if (iov->iov_len > size - current_size) iov->iov_len = size - current_size;
Ah, thanks, this is the part I missed. So the buffers could be bigger, but @frame_size set on return is <= @size. I was misled by: * @frame_size: The total size of the buffers (output) because that made me think that 'total' would be... well, the total size of the (output) buffers, not limited to what we requested. I wonder if we should fix: - this comment (I would call this "Available buffer length, up to @size, set on return") and perhaps the parameter name ("@collected"?) - the prototype. I see why UDP needs it this way, but wouldn't it be more natural to return error if the requested size is not available, and set the available buffer count as optional argument on return? The way it currently is, we're mostly ignoring the value of @frame_size (except for what we need for a comparison) in the only two cases where we actually pass a pointer. Well, we use that value in tcp_vu_sock_recv(), but just as a sum of values we already have in the caller. - or the interface altogether like I'm doing here, so that vu_collect() actually returns the size of the buffers. I think it's more natural like that but also looks a bit more complicated so I'm not necessarily fond of this What do you think? I can just fix the comment instead (feel free to do so, of course).
total can be lesser than size but in this case we exit.
So total should be equal to input size + sizeof(struct virtio_net_hdr_mrg_rxbuf).
And in iov_from_buf() we copy the content of the buffer according the available room in the iovec and the size of the input buffer.
Right, I see now. I wonder if 3/3 is even desired in this case, after all it's guest memory that's given to us but we don't touch in any way. If buffers are consistently much larger than what we request (for example, TCP flows with low MSS), we probably introduce substantial overhead by zeroing those bytes. I thought they were filled with random stuff because of the issue I though we had, but we don't have it, so no need to zero them either. -- Stefano
On 11/4/25 17:09, Stefano Brivio wrote:
On Tue, 4 Nov 2025 16:01:07 +0100 Laurent Vivier
wrote: On 11/3/25 11:16, Stefano Brivio wrote:
...instead of copying, from the input buffer, the amount of available bytes in the buffer provided via vhost-user.
The existing behaviour should be harmless due to the fact that we don't request overly large buffers from vhost-user, but it doesn't look correct to me.
I don't understand where is the problem.
I think the existing behaviour is correct because the iov array is padded to size in vu_collect() by:
if (iov->iov_len > size - current_size) iov->iov_len = size - current_size;
Ah, thanks, this is the part I missed. So the buffers could be bigger, but @frame_size set on return is <= @size. I was misled by:
* @frame_size: The total size of the buffers (output)
because that made me think that 'total' would be... well, the total size of the (output) buffers, not limited to what we requested.
I wonder if we should fix:
- this comment (I would call this "Available buffer length, up to @size, set on return") and perhaps the parameter name ("@collected"?)
- the prototype. I see why UDP needs it this way, but wouldn't it be more natural to return error if the requested size is not available, and set the available buffer count as optional argument on return?
The way it currently is, we're mostly ignoring the value of @frame_size (except for what we need for a comparison) in the only two cases where we actually pass a pointer.
Well, we use that value in tcp_vu_sock_recv(), but just as a sum of values we already have in the caller.
- or the interface altogether like I'm doing here, so that vu_collect() actually returns the size of the buffers. I think it's more natural like that but also looks a bit more complicated so I'm not necessarily fond of this
What do you think? I can just fix the comment instead (feel free to do so, of course). There is a problem in the code as it's not obvious how it works (some part are a little bit touchy...). So I think it needs to be changed, not especially in the logic but in the clarity.
I would say that it would be more natural to return an error if the requested size is not available. Choose the solution you prefer to be clear for you but avoid to change the behaviour as it can introduce unexpected results... Thanks, Laurent
On Mon, Nov 03, 2025 at 11:16:28AM +0100, Stefano Brivio wrote:
As I'm going to touch those in the next commit.
Signed-off-by: Stefano Brivio
Reviewed-by: David Gibson
/** - * iov_to_buf() - Copy data from a scatter/gather I/O vector (struct iovec) to - * a buffer efficiently. - * - * @iov: Pointer to the array of struct iovec describing the scatter/gather - * I/O vector. - * @iov_cnt: Number of elements in the iov array. - * @offset: Offset within the first element of iov from where copying should start.
The old description is, I think, wrong...
- * @buf: Pointer to the destination buffer where data will be copied. - * @bytes: Total number of bytes to copy from iov to buf. + * iov_to_buf() - Copy from iovec to flat buffer + * @iov: Source iovec array + * @iov_cnt: Number of elements in iovec array + * @offset: Source offset altogether, counted in flattened iovec
...but the new one I can't parse.
+ * @buf: Destination buffer + * @bytes: Bytes to copy
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Mon, Nov 03, 2025 at 11:16:29AM +0100, Stefano Brivio wrote:
...and, for consistency, rename 'bytes' to 'copy' in iov_to_buf().
Two commits ago, I changed vhost-user functions to use iov_from_buf() to copy only up to the size of source buffers, instead of using the size of the destination vhost-user buffers.
This change pads the rest with zeroes, which is not strictly needed, but looks definitely cleaner.
This seems like a useful thing, but maybe it would be clearer as a separate iov_memset() / iov_from_zero() rather than built into iov_from_buf().
Signed-off-by: Stefano Brivio
--- iov.c | 54 ++++++++++++++++++++++++++++++++++------------------- iov.h | 4 ++-- vu_common.c | 2 +- 3 files changed, 38 insertions(+), 22 deletions(-) diff --git a/iov.c b/iov.c index dc1b6b1..557be55 100644 --- a/iov.c +++ b/iov.c @@ -59,35 +59,51 @@ size_t iov_skip_bytes(const struct iovec *iov, size_t n, * @iov_cnt: Number of elements in the iovec array * @offset: Destination offset in iovec array * @buf: Source buffer - * @bytes: Bytes to copy + * @copy: Bytes to copy + * @fill: Bytes to zero-fill after copied bytes * - * Return: number of bytes copied + * Return: number of bytes filled, with data or zeroes */ size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt, - size_t offset, const void *buf, size_t bytes) + size_t offset, const void *buf, size_t copy, size_t fill) { + size_t copied, filled; unsigned int i; - size_t copied;
- if (__builtin_constant_p(bytes) && iov_cnt && - offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) { - memcpy((char *)iov[0].iov_base + offset, buf, bytes); + if (__builtin_constant_p(copy) && iov_cnt && + offset <= iov[0].iov_len && + (copy + fill) <= iov[0].iov_len - offset) { + memcpy((char *)iov[0].iov_base + offset, buf, copy); + memset((char *)iov[0].iov_base + offset + copy, 0, fill);
- return bytes; + return copy + fill; }
i = iov_skip_bytes(iov, iov_cnt, offset, &offset);
- for (copied = 0; copied < bytes && i < iov_cnt; i++) { - size_t len = MIN(iov[i].iov_len - offset, bytes - copied); + for (copied = 0; copied < copy && i < iov_cnt; i++) { + size_t len = MIN(iov[i].iov_len - offset, copy - copied);
memcpy((char *)iov[i].iov_base + offset, (char *)buf + copied, len); copied += len; + + if (copied < copy) + offset = 0; /* More to copy in the next iteration */ + else + offset = len; /* Start of zero-filling, see below */ + } + + for (filled = 0; filled < fill && i < iov_cnt; i++) { + size_t len = MIN(iov[i].iov_len - offset, fill - filled); + + memcpy((char *)iov[i].iov_base + offset, + (char *)buf + copied + filled, len); + filled += len; offset = 0; }
- return copied; + return copied + filled; }
/** @@ -96,27 +112,27 @@ size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt, * @iov_cnt: Number of elements in iovec array * @offset: Source offset altogether, counted in flattened iovec * @buf: Destination buffer - * @bytes: Bytes to copy + * @copy: Bytes to copy * * Return: number of bytes copied */ size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt, - size_t offset, void *buf, size_t bytes) + size_t offset, void *buf, size_t copy) { unsigned int i; size_t copied;
- if (__builtin_constant_p(bytes) && iov_cnt && - offset <= iov[0].iov_len && bytes <= iov[0].iov_len - offset) { - memcpy(buf, (char *)iov[0].iov_base + offset, bytes); + if (__builtin_constant_p(copy) && iov_cnt && + offset <= iov[0].iov_len && copy <= iov[0].iov_len - offset) { + memcpy(buf, (char *)iov[0].iov_base + offset, copy);
- return bytes; + return copy; }
i = iov_skip_bytes(iov, iov_cnt, offset, &offset);
- for (copied = 0; copied < bytes && i < iov_cnt; i++) { - size_t len = MIN(iov[i].iov_len - offset, bytes - copied); + for (copied = 0; copied < copy && i < iov_cnt; i++) { + size_t len = MIN(iov[i].iov_len - offset, copy - copied);
ASSERT(iov[i].iov_base);
diff --git a/iov.h b/iov.h index ba1fda5..9b5d910 100644 --- a/iov.h +++ b/iov.h @@ -24,9 +24,9 @@ size_t iov_skip_bytes(const struct iovec *iov, size_t n, size_t skip, size_t *offset); size_t iov_from_buf(const struct iovec *iov, size_t iov_cnt, - size_t offset, const void *buf, size_t bytes); + size_t offset, const void *buf, size_t copy, size_t fill); size_t iov_to_buf(const struct iovec *iov, size_t iov_cnt, - size_t offset, void *buf, size_t bytes); + size_t offset, void *buf, size_t copy); size_t iov_size(const struct iovec *iov, size_t iov_cnt);
/* diff --git a/vu_common.c b/vu_common.c index 21cfb2a..d941b72 100644 --- a/vu_common.c +++ b/vu_common.c @@ -274,7 +274,7 @@ int vu_send_single(const struct ctx *c, const void *buf, size_t size)
/* copy data from the buffer to the iovec */ iov_from_buf(in_sg, elem_cnt, sizeof(struct virtio_net_hdr_mrg_rxbuf), - buf, size); + buf, size, vu_buf_size - size);
if (*c->pcap) { pcap_iov(in_sg, elem_cnt, -- 2.43.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
participants (3)
-
David Gibson
-
Laurent Vivier
-
Stefano Brivio