[PATCH 0/4] udp: Fix some confusion of IPv4 and IPv6 control structures
It turns out a couple of places on the IPv4 specific inbound path
accidentally use control structures that are supposed to be for IPv6.
That could lead to weird behaviour in a rather complex set of
circumstances.

Patch 1/4 here is the actual fix, the rest makes some cleanups to the
code that should make similar mistakes harder to commit in future.

This is based on my earlier cleanup of the UDP splicing code, although
I think it will rebase trivially.

David Gibson (4):
  udp: Fix incorrect use of IPv6 mh buffers in IPv4 path
  udp: Better factor IPv4 and IPv6 paths in udp_sock_handler()
  udp: Preadjust udp[46]_l2_iov_tap[].iov_base for pasta mode
  udp: Factor out control structure management from
    udp_sock_fill_data_v[46]

 udp.c | 184 ++++++++++++++++++++++++++--------------------------------
 1 file changed, 81 insertions(+), 103 deletions(-)

-- 
2.38.1
[PATCH 1/4] udp: Fix incorrect use of IPv6 mh buffers in IPv4 path

udp_sock_handler() incorrectly uses udp6_l2_mh_tap[] on the IPv4 path. In
fact this is harmless because this assignment is redundant (the 0th entry
msg_hdr will always point to the 0th iov entry for both IPv4 and IPv6 and
won't change).
There is also an incorrect usage of udp6_l2_mh_tap[] in
udp_sock_fill_data_v4. This one can cause real problems, because we'll
use stale iov_len values if we send multiple messages to the qemu socket.
Most of the time that will be relatively harmless - we're likely to either
drop UDP packets, or send duplicates. However, if the stale iov_len we
use ends up referencing an uninitialized buffer we could desynchronize the
qemu stream socket.
Correct both these bugs. The UDP6 path appears to be correct, but it does
have some comments that incorrectly reference the IPv4 versions, so fix
those as well.
Signed-off-by: David Gibson
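
To illustrate how updating the wrong array can leave stale state behind,
here is a minimal sketch. It is not the passt code: the arrays mh4[],
mh6[] and iov4[] and the helpers start_msg4() and msg4_points_at() are
hypothetical stand-ins for the per-version control structures, and the
'buggy' flag exists only to show both behaviours side by side.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define NBUF 4

/* Hypothetical stand-ins for the per-IP-version control arrays */
static struct iovec  iov4[NBUF];
static struct msghdr mh4[NBUF], mh6[NBUF];

/* Start IPv4 message 'msg' at buffer 'n'. With 'buggy' set, the IPv6
 * header array is updated instead, so mh4[msg] keeps its old pointer.
 */
static void start_msg4(int msg, int n, int buggy)
{
	assert(msg >= 0 && msg < NBUF && n >= 0 && n < NBUF);

	if (buggy)
		mh6[msg].msg_iov = &iov4[n];	/* wrong array */
	else
		mh4[msg].msg_iov = &iov4[n];
}

/* Return 1 if IPv4 message 'msg' starts at buffer 'n', 0 otherwise */
static int msg4_points_at(int msg, int n)
{
	return mh4[msg].msg_iov == &iov4[n];
}
```

In the buggy case the IPv4 header still refers to whatever buffer an
earlier batch left there, which is the kind of stale reference the
commit message describes.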
[PATCH 2/4] udp: Better factor IPv4 and IPv6 paths in udp_sock_handler()

Apart from which mh array they're operating on, the recvmmsg() calls in
udp_sock_handler() are identical between the IPv4 and IPv6 paths, as are
some of the control structure updates.
By using some local variables to refer to the IP version specific control
arrays, make some more logic common between the IPv4 and IPv6 paths. As
well as slightly reducing the code size, this makes it less likely that
we'll accidentally use the IPv4 arrays in the IPv6 path or vice versa as we
did in a recently fixed bug.
Signed-off-by: David Gibson
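
The idea of selecting the version-specific array once and sharing the
rest can be sketched as follows. This is an illustration under
assumptions, not the actual change: iov4[], iov6[], select_iov() and
set_lengths() are hypothetical names, and only the local variable name
tap_iov is borrowed from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define NBUF 4

/* Hypothetical per-IP-version control arrays */
static struct iovec iov4[NBUF], iov6[NBUF];

/* Pick the control array once, based on the IP version... */
static struct iovec *select_iov(int v6)
{
	return v6 ? iov6 : iov4;
}

/* ...so that the per-buffer logic is shared by both paths.
 * Returns the total length recorded.
 */
static size_t set_lengths(int v6, const size_t *lens, size_t n)
{
	struct iovec *tap_iov = select_iov(v6);
	size_t i, total = 0;

	assert(n <= NBUF);

	for (i = 0; i < n; i++) {
		tap_iov[i].iov_len = lens[i];
		total += lens[i];
	}

	return total;
}
```

Because the array name appears in exactly one place, there is no second
copy of the loop in which the wrong array could be typed by accident.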
[PATCH 3/4] udp: Preadjust udp[46]_l2_iov_tap[].iov_base for pasta mode

Currently, we always populate udp[46]_l2_iov_tap[].iov_base with the
very start of the header buffers, including space for the qemu vnet_len
tag suitable for passt mode. That's ok because we don't actually use these
iovecs for pasta mode.
However, we do know the mode in udp_sock[46]_iov_init() so adjust these
to the beginning of the headers we'll actually need for the mode: including
the vnet_len tag for passt, but excluding it for pasta.
This allows a slightly nicer way to locate the right buffer to send in the
pasta case, and will allow some additional cleanups later.
Signed-off-by: David Gibson
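
As a rough sketch of what pre-adjusting iov_base per mode means, consider
the simplified buffer below. The layout of struct l2_buf and the helper
iov_init() are assumptions made for illustration; the real buffers carry
full IP and UDP headers and are set up in udp_sock[46]_iov_init().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

enum mode { MODE_PASST, MODE_PASTA };

/* Hypothetical, simplified frame buffer */
struct l2_buf {
	uint32_t vnet_len;	/* qemu length tag, only sent in passt mode */
	uint8_t  eh[14];	/* Ethernet header */
	uint8_t  data[64];	/* IP and UDP headers, payload would follow */
};

/* Point the iovec at the first byte that will actually be sent */
static void iov_init(struct iovec *iov, struct l2_buf *b, enum mode mode)
{
	assert(iov && b);

	if (mode == MODE_PASST)
		iov->iov_base = &b->vnet_len;	/* include the length tag */
	else
		iov->iov_base = b->eh;		/* start at the frame itself */
}
```

With the base chosen at init time, the sending code can pass iov_base
straight to write() in pasta mode without recomputing the offset.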
[PATCH 4/4] udp: Factor out control structure management from udp_sock_fill_data_v[46]

The main purpose of udp_sock_fill_data_v[46]() is to construct the IP, UDP
and other headers we'll need to forward data onto the tap interface. In
addition they update the control structures (iovec and mmsghdr) we'll need
to send the messages, and in the case of pasta actually send them.

This leaves the control structure management and the send itself awkwardly
split between udp_sock_fill_data_v[46]() and their caller
udp_sock_handler(). In addition, this tail part of
udp_sock_fill_data_v[46]() is essentially common between the IPv4 and IPv6
versions, apart from which control array we're working on.

Clean this up by reducing these functions to just construct the headers
and renaming them to udp_update_hdr[46]() accordingly. The control
structure updates are now all in the caller, and common for IPv4 and IPv6.
---
 udp.c | 118 +++++++++++++++++++++++++---------------------------------
 1 file changed, 50 insertions(+), 68 deletions(-)

diff --git a/udp.c b/udp.c
index e2eb504..431d268 100644
--- a/udp.c
+++ b/udp.c
@@ -640,20 +640,17 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
 }
 
 /**
- * udp_sock_fill_data_v4() - Fill and queue one buffer. In pasta mode, write it
+ * udp_update_hdr4() - Update headers for one IPv4 datagram
  * @c:		Execution context
  * @n:		Index of buffer in udp4_l2_buf pool
- * @ref:	epoll reference from socket
- * @msg_idx:	Index within message being prepared (spans multiple buffers)
- * @msg_len:	Length of current message being prepared for sending
+ * @dstport:	Destination port number
  * @now:	Current timestamp
+ *
+ * Return: size of tap frame with headers
  */
-static void udp_sock_fill_data_v4(const struct ctx *c, int n,
-				  union epoll_ref ref,
-				  int *msg_idx, int *msg_bufs, ssize_t *msg_len,
-				  const struct timespec *now)
+static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
+			      const struct timespec *now)
 {
-	struct msghdr *mh = &udp4_l2_mh_tap[*msg_idx].msg_hdr;
 	struct udp4_l2_buf_t *b = &udp4_l2_buf[n];
 	size_t ip_len, buf_len;
 	in_port_t src_port;
@@ -687,51 +684,31 @@ static void udp_sock_fill_data_v4(const struct ctx *c, int n,
 
 	udp_update_check4(b);
 	b->uh.source = b->s_in.sin_port;
-	b->uh.dest = htons(ref.r.p.udp.udp.port);
+	b->uh.dest = htons(dstport);
 	b->uh.len = htons(udp4_l2_mh_sock[n].msg_len + sizeof(b->uh));
 
-	if (c->mode == MODE_PASTA) {
-		void *frame = udp4_l2_iov_tap[n].iov_base;
-
-		if (write(c->fd_tap, frame, sizeof(b->eh) + ip_len) < 0)
-			debug("tap write: %s", strerror(errno));
-		pcap(frame, sizeof(b->eh) + ip_len);
-
-		return;
-	}
-
-	b->vnet_len = htonl(ip_len + sizeof(struct ethhdr));
-	buf_len = sizeof(uint32_t) + sizeof(struct ethhdr) + ip_len;
-	udp4_l2_iov_tap[n].iov_len = buf_len;
-
-	/* With bigger messages, qemu closes the connection. */
-	if (*msg_bufs && *msg_len + buf_len > SHRT_MAX) {
-		mh->msg_iovlen = *msg_bufs;
+	buf_len = ip_len + sizeof(b->eh);
 
-		(*msg_idx)++;
-		udp4_l2_mh_tap[*msg_idx].msg_hdr.msg_iov = &udp4_l2_iov_tap[n];
-		*msg_len = *msg_bufs = 0;
+	if (c->mode == MODE_PASST) {
+		b->vnet_len = htonl(buf_len);
+		buf_len += sizeof(b->vnet_len);
 	}
 
-	*msg_len += buf_len;
-	(*msg_bufs)++;
+	return buf_len;
 }
 
 /**
- * udp_sock_fill_data_v6() - Fill and queue one buffer. In pasta mode, write it
+ * udp_update_hdr6() - Update headers for one IPv6 datagram
  * @c:		Execution context
  * @n:		Index of buffer in udp6_l2_buf pool
- * @ref:	epoll reference from socket
- * @msg_idx:	Index within message being prepared (spans multiple buffers)
- * @msg_len:	Length of current message being prepared for sending
+ * @dstport:	Destination port number
  * @now:	Current timestamp
+ *
+ * Return: size of tap frame with headers
  */
-static void udp_sock_fill_data_v6(const struct ctx *c, int n,
-				  union epoll_ref ref,
-				  int *msg_idx, int *msg_bufs, ssize_t *msg_len,
-				  const struct timespec *now)
+static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
+			      const struct timespec *now)
 {
-	struct msghdr *mh = &udp6_l2_mh_tap[*msg_idx].msg_hdr;
 	struct udp6_l2_buf_t *b = &udp6_l2_buf[n];
 	size_t ip_len, buf_len;
 	struct in6_addr *src;
@@ -782,7 +759,7 @@ static void udp_sock_fill_data_v6(const struct ctx *c, int n,
 	}
 
 	b->uh.source = b->s_in6.sin6_port;
-	b->uh.dest = htons(ref.r.p.udp.udp.port);
+	b->uh.dest = htons(dstport);
 	b->uh.len = b->ip6h.payload_len;
 
 	b->ip6h.hop_limit = IPPROTO_UDP;
@@ -792,31 +769,14 @@ static void udp_sock_fill_data_v6(const struct ctx *c, int n,
 	b->ip6h.nexthdr = IPPROTO_UDP;
 	b->ip6h.hop_limit = 255;
 
-	if (c->mode == MODE_PASTA) {
-		void *frame = udp6_l2_iov_tap[n].iov_base;
-
-		if (write(c->fd_tap, frame, sizeof(b->eh) + ip_len) < 0)
-			debug("tap write: %s", strerror(errno));
-		pcap(frame, sizeof(b->eh) + ip_len);
+	buf_len = ip_len + sizeof(b->eh);
 
-		return;
+	if (c->mode == MODE_PASST) {
+		b->vnet_len = htonl(buf_len);
+		buf_len += sizeof(b->vnet_len);
 	}
 
-	b->vnet_len = htonl(ip_len + sizeof(struct ethhdr));
-	buf_len = sizeof(uint32_t) + sizeof(struct ethhdr) + ip_len;
-	udp6_l2_iov_tap[n].iov_len = buf_len;
-
-	/* With bigger messages, qemu closes the connection. */
-	if (*msg_bufs && *msg_len + buf_len > SHRT_MAX) {
-		mh->msg_iovlen = *msg_bufs;
-
-		(*msg_idx)++;
-		udp6_l2_mh_tap[*msg_idx].msg_hdr.msg_iov = &udp6_l2_iov_tap[n];
-		*msg_len = *msg_bufs = 0;
-	}
-
-	*msg_len += buf_len;
-	(*msg_bufs)++;
+	return buf_len;
 }
 
 /**
@@ -832,6 +792,7 @@ static void udp_sock_fill_data_v6(const struct ctx *c, int n,
 void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
 		      const struct timespec *now)
 {
+	in_port_t dstport = ref.r.p.udp.udp.port;
 	ssize_t n, msg_len = 0, missing = 0;
 	struct mmsghdr *tap_mmh, *sock_mmh;
 	int msg_bufs = 0, msg_i = 0, ret;
@@ -863,12 +824,33 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
 
 	tap_mmh[0].msg_hdr.msg_iov = &tap_iov[0];
 	for (i = 0; i < (unsigned)n; i++) {
+		size_t buf_len;
+
 		if (ref.r.p.udp.udp.v6)
-			udp_sock_fill_data_v6(c, i, ref,
-					      &msg_i, &msg_bufs, &msg_len, now);
+			buf_len = udp_update_hdr6(c, i, dstport, now);
 		else
-			udp_sock_fill_data_v4(c, i, ref,
-					      &msg_i, &msg_bufs, &msg_len, now);
+			buf_len = udp_update_hdr4(c, i, dstport, now);
+
+		if (c->mode == MODE_PASTA) {
+			void *frame = tap_iov[i].iov_base;
+
+			if (write(c->fd_tap, frame, buf_len) < 0)
+				debug("tap write: %s", strerror(errno));
+			pcap(frame, buf_len);
+		} else {
+			tap_iov[i].iov_len = buf_len;
+
+			/* With bigger messages, qemu closes the connection. */
+			if (msg_bufs && msg_len + buf_len > SHRT_MAX) {
+				tap_mmh[msg_i].msg_hdr.msg_iovlen = msg_bufs;
+				msg_i++;
+				tap_mmh[msg_i].msg_hdr.msg_iov = &tap_iov[i];
+				msg_len = msg_bufs = 0;
+			}
+
+			msg_len += buf_len;
+			msg_bufs++;
+		}
 	}
 	tap_mmh[msg_i].msg_hdr.msg_iovlen = msg_bufs;
 
-- 
2.38.1
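
The batching rule in the passt branch of the loop (start a new message
whenever adding a buffer would push the current one past SHRT_MAX) can be
looked at in isolation. The sketch below is a standalone restatement under
assumptions, not part of the patch: batch_frames() and its iovlen[] output
are hypothetical, and only the arithmetic mirrors the loop above.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Split n buffers into messages. iovlen[] needs room for n entries and
 * receives the number of buffers in each message.
 * Returns the number of messages used.
 */
static size_t batch_frames(const size_t *buf_len, size_t n, size_t *iovlen)
{
	size_t msg_len = 0, msg_bufs = 0, msg_i = 0, i;

	assert(n > 0);

	for (i = 0; i < n; i++) {
		/* Close the current message if this buffer would overflow it */
		if (msg_bufs && msg_len + buf_len[i] > (size_t)SHRT_MAX) {
			iovlen[msg_i] = msg_bufs;
			msg_i++;
			msg_len = msg_bufs = 0;
		}

		msg_len += buf_len[i];
		msg_bufs++;
	}
	iovlen[msg_i] = msg_bufs;	/* close the last message */

	return msg_i + 1;
}
```

Note that a single buffer is never split: the msg_bufs check means a
message always holds at least one buffer, even an oversized one.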
On Thu, 24 Nov 2022 19:54:17 +1100
David Gibson wrote:

> It turns out a couple of places on the IPv4 specific inbound path
> accidentally use control structures that are supposed to be for IPv6.
> That could lead to weird behaviour in a rather complex set of
> circumstances.

Whoops, this is embarrassing.

> Patch 1/4 here is the actual fix, the rest makes some cleanups to the
> code that should make similar mistakes harder to commit in future.

The whole series looks good to me.

> This is based on my earlier cleanup of the UDP splicing code, although
> I think it will rebase trivially.

I tried, it does, but I wouldn't needlessly rebase that one on top of
this, I'd rather wait a bit and apply them in order.

-- 
Stefano
On Fri, Nov 25, 2022 at 03:01:21AM +0100, Stefano Brivio wrote:
> On Thu, 24 Nov 2022 19:54:17 +1100
> David Gibson wrote:
>
> > It turns out a couple of places on the IPv4 specific inbound path
> > accidentally use control structures that are supposed to be for IPv6.
> > That could lead to weird behaviour in a rather complex set of
> > circumstances.
>
> Whoops, this is embarrassing.

Heh, been there.

> > Patch 1/4 here is the actual fix, the rest makes some cleanups to the
> > code that should make similar mistakes harder to commit in future.
>
> The whole series looks good to me.
>
> > This is based on my earlier cleanup of the UDP splicing code, although
> > I think it will rebase trivially.
>
> I tried, it does, but I wouldn't needlessly rebase that one on top of
> this, I'd rather wait a bit and apply them in order.

Ok.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
On Thu, 24 Nov 2022 19:54:17 +1100
David Gibson wrote:

> It turns out a couple of places on the IPv4 specific inbound path
> accidentally use control structures that are supposed to be for IPv6.
> That could lead to weird behaviour in a rather complex set of
> circumstances.
>
> Patch 1/4 here is the actual fix, the rest makes some cleanups to the
> code that should make similar mistakes harder to commit in future.
>
> This is based on my earlier cleanup of the UDP splicing code, although
> I think it will rebase trivially.

Applied and pushed.

-- 
Stefano
participants (2)
- David Gibson
- Stefano Brivio