It turns out a couple of places on the IPv4-specific inbound path
accidentally use control structures that are supposed to be for IPv6.
That could lead to weird behaviour in a rather complex set of
circumstances.

Patch 1/4 here is the actual fix; the rest makes some cleanups to the
code that should make similar mistakes harder to commit in future.

This is based on my earlier cleanup of the UDP splicing code, although
I think it will rebase trivially.

David Gibson (4):
  udp: Fix inorrect use of IPv6 mh buffers in IPv4 path
  udp: Better factor IPv4 and IPv6 paths in udp_sock_handler()
  udp: Preadjust udp[46]_l2_iov_tap[].iov_base for pasta mode
  udp: Factor out control structure management from
    udp_sock_fill_data_v[46]

 udp.c | 184 ++++++++++++++++++++++++++-------------------------------
 1 file changed, 81 insertions(+), 103 deletions(-)

-- 
2.38.1
udp_sock_handler() incorrectly uses udp6_l2_mh_tap[] on the IPv4 path.
In fact this is harmless because this assignment is redundant (the 0th
entry msg_hdr will always point to the 0th iov entry for both IPv4 and
IPv6 and won't change).

There is also an incorrect usage of udp6_l2_mh_tap[] in
udp_sock_fill_data_v4. This one can cause real problems, because we'll
use stale iov_len values if we send multiple messages to the qemu
socket. Most of the time that will be relatively harmless - we're
likely to either drop UDP packets, or send duplicates. However, if the
stale iov_len we use ends up referencing an uninitialized buffer we
could desynchronize the qemu stream socket.

Correct both these bugs. The UDP6 path appears to be correct, but it
does have some comments that incorrectly reference the IPv4 versions,
so fix those as well.

Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 udp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/udp.c b/udp.c
index ee5c2c5..99f9374 100644
--- a/udp.c
+++ b/udp.c
@@ -643,7 +643,7 @@ static void udp_sock_fill_data_v4(const struct ctx *c, int n,
 				  int *msg_idx, int *msg_bufs, ssize_t *msg_len,
 				  const struct timespec *now)
 {
-	struct msghdr *mh = &udp6_l2_mh_tap[*msg_idx].msg_hdr;
+	struct msghdr *mh = &udp4_l2_mh_tap[*msg_idx].msg_hdr;
 	struct udp4_l2_buf_t *b = &udp4_l2_buf[n];
 	size_t ip_len, buf_len;
 	in_port_t src_port;
@@ -717,9 +717,9 @@ static void udp_sock_fill_data_v4(const struct ctx *c, int n,
 }
 
 /**
- * udp_sock_fill_data_v4() - Fill and queue one buffer. In pasta mode, write it
+ * udp_sock_fill_data_v6() - Fill and queue one buffer. In pasta mode, write it
  * @c:		Execution context
- * @n:		Index of buffer in udp4_l2_buf pool
+ * @n:		Index of buffer in udp6_l2_buf pool
  * @ref:	epoll reference from socket
  * @msg_idx:	Index within message being prepared (spans multiple buffers)
  * @msg_len:	Length of current message being prepared for sending
@@ -865,7 +865,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
 		if (n <= 0)
 			return;
 
-		udp6_l2_mh_tap[0].msg_hdr.msg_iov = &udp6_l2_iov_tap[0];
+		udp4_l2_mh_tap[0].msg_hdr.msg_iov = &udp4_l2_iov_tap[0];
 
 		for (i = 0; i < (unsigned)n; i++) {
 			udp_sock_fill_data_v4(c, i, ref,
-- 
2.38.1
Apart from which mh array they're operating on, the recvmmsg() calls in
udp_sock_handler() are identical between the IPv4 and IPv6 paths, as
are some of the control structure updates.

By using some local variables to refer to the IP version specific
control arrays, make some more logic common between the IPv4 and IPv6
paths. As well as slightly reducing the code size, this makes it less
likely that we'll accidentally use the IPv4 arrays in the IPv6 path or
vice versa as we did in a recently fixed bug.

Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 udp.c | 40 ++++++++++++++++++----------------------
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/udp.c b/udp.c
index 99f9374..6a34e85 100644
--- a/udp.c
+++ b/udp.c
@@ -833,9 +833,10 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
 		      const struct timespec *now)
 {
 	ssize_t n, msg_len = 0, missing = 0;
+	struct mmsghdr *tap_mmh, *sock_mmh;
 	int msg_bufs = 0, msg_i = 0, ret;
-	struct mmsghdr *tap_mmh;
 	struct msghdr *last_mh;
+	struct iovec *tap_iov;
 	unsigned int i;
 
 	if (events == EPOLLERR)
@@ -847,34 +848,29 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
 	}
 
 	if (ref.r.p.udp.udp.v6) {
-		n = recvmmsg(ref.r.s, udp6_l2_mh_sock, UDP_TAP_FRAMES, 0, NULL);
-		if (n <= 0)
-			return;
-
-		udp6_l2_mh_tap[0].msg_hdr.msg_iov = &udp6_l2_iov_tap[0];
-
-		for (i = 0; i < (unsigned)n; i++) {
-			udp_sock_fill_data_v6(c, i, ref,
-					      &msg_i, &msg_bufs, &msg_len, now);
-		}
-
-		udp6_l2_mh_tap[msg_i].msg_hdr.msg_iovlen = msg_bufs;
 		tap_mmh = udp6_l2_mh_tap;
+		sock_mmh = udp6_l2_mh_sock;
+		tap_iov = udp6_l2_iov_tap;
 	} else {
-		n = recvmmsg(ref.r.s, udp4_l2_mh_sock, UDP_TAP_FRAMES, 0, NULL);
-		if (n <= 0)
-			return;
+		tap_mmh = udp4_l2_mh_tap;
+		sock_mmh = udp4_l2_mh_sock;
+		tap_iov = udp4_l2_iov_tap;
+	}
 
-		udp4_l2_mh_tap[0].msg_hdr.msg_iov = &udp4_l2_iov_tap[0];
+	n = recvmmsg(ref.r.s, sock_mmh, UDP_TAP_FRAMES, 0, NULL);
+	if (n <= 0)
+		return;
 
-		for (i = 0; i < (unsigned)n; i++) {
+	tap_mmh[0].msg_hdr.msg_iov = &tap_iov[0];
+	for (i = 0; i < (unsigned)n; i++) {
+		if (ref.r.p.udp.udp.v6)
+			udp_sock_fill_data_v6(c, i, ref,
+					      &msg_i, &msg_bufs, &msg_len, now);
+		else
 			udp_sock_fill_data_v4(c, i, ref,
 					      &msg_i, &msg_bufs, &msg_len, now);
-		}
-
-		udp4_l2_mh_tap[msg_i].msg_hdr.msg_iovlen = msg_bufs;
-		tap_mmh = udp4_l2_mh_tap;
 	}
+	tap_mmh[msg_i].msg_hdr.msg_iovlen = msg_bufs;
 
 	if (c->mode == MODE_PASTA)
 		return;
-- 
2.38.1
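
The idea behind this patch (pick the version-specific array once, then
touch it only through a local pointer) can be shown in isolation. What
follows is a minimal sketch and not code from udp.c: struct frame_ctl,
ctl4[], ctl6[], FRAMES, ctl_for() and record_lens() are all
hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define FRAMES 4			/* hypothetical pool size */

struct frame_ctl {			/* hypothetical stand-in for iovec/mmsghdr */
	size_t len;
};

static struct frame_ctl ctl4[FRAMES];	/* IPv4 control array */
static struct frame_ctl ctl6[FRAMES];	/* IPv6 control array */

/* Select the control array for the IP version, in exactly one place */
static struct frame_ctl *ctl_for(int v6)
{
	return v6 ? ctl6 : ctl4;
}

/* Record n frame lengths; the body never names ctl4 or ctl6 directly */
static void record_lens(int v6, const size_t *lens, size_t n)
{
	struct frame_ctl *ctl = ctl_for(v6);
	size_t i;

	assert(n <= FRAMES);

	for (i = 0; i < n; i++)
		ctl[i].len = lens[i];
}
```

Calling record_lens(0, lens, 2) fills ctl4[] and leaves ctl6[] alone.
Since the shared code has no way to spell the other array's name, the
kind of mix-up fixed in patch 1/4 can only happen at the single
selection point.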
Currently, we always populate udp[46]_l2_iov_tap[].iov_base with the
very start of the header buffers, including space for the qemu vnet_len
tag suitable for passt mode. That's ok because we don't actually use
these iovecs for pasta mode.

However, we do know the mode in udp_sock[46]_iov_init() so adjust these
to the beginning of the headers we'll actually need for the mode:
including the vnet_len tag for passt, but excluding it for pasta.

This allows a slightly nicer way to locate the right buffer to send in
the pasta case, and will allow some additional cleanups later.

Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
---
 udp.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/udp.c b/udp.c
index 6a34e85..e2eb504 100644
--- a/udp.c
+++ b/udp.c
@@ -314,8 +314,9 @@ void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s,
 
 /**
  * udp_sock4_iov_init() - Initialise scatter-gather L2 buffers for IPv4 sockets
+ * @c:		Execution context
  */
-static void udp_sock4_iov_init(void)
+static void udp_sock4_iov_init(const struct ctx *c)
 {
 	struct mmsghdr *h;
 	int i;
@@ -343,7 +344,11 @@ static void udp_sock4_iov_init(void)
 	for (i = 0, h = udp4_l2_mh_tap; i < UDP_MAX_FRAMES; i++, h++) {
 		struct msghdr *mh = &h->msg_hdr;
 
-		udp4_l2_iov_tap[i].iov_base = &udp4_l2_buf[i].vnet_len;
+		if (c->mode == MODE_PASTA)
+			udp4_l2_iov_tap[i].iov_base = &udp4_l2_buf[i].eh;
+		else
+			udp4_l2_iov_tap[i].iov_base = &udp4_l2_buf[i].vnet_len;
+
 		mh->msg_iov = &udp4_l2_iov_tap[i];
 		mh->msg_iovlen = 1;
 	}
@@ -351,8 +356,9 @@ static void udp_sock4_iov_init(void)
 
 /**
  * udp_sock6_iov_init() - Initialise scatter-gather L2 buffers for IPv6 sockets
+ * @c:		Execution context
  */
-static void udp_sock6_iov_init(void)
+static void udp_sock6_iov_init(const struct ctx *c)
 {
 	struct mmsghdr *h;
 	int i;
@@ -383,7 +389,11 @@ static void udp_sock6_iov_init(void)
 	for (i = 0, h = udp6_l2_mh_tap; i < UDP_MAX_FRAMES; i++, h++) {
 		struct msghdr *mh = &h->msg_hdr;
 
-		udp6_l2_iov_tap[i].iov_base = &udp6_l2_buf[i].vnet_len;
+		if (c->mode == MODE_PASTA)
+			udp6_l2_iov_tap[i].iov_base = &udp6_l2_buf[i].eh;
+		else
+			udp6_l2_iov_tap[i].iov_base = &udp6_l2_buf[i].vnet_len;
+
 		mh->msg_iov = &udp6_l2_iov_tap[i];
 		mh->msg_iovlen = 1;
 	}
@@ -681,16 +691,7 @@ static void udp_sock_fill_data_v4(const struct ctx *c, int n,
 	b->uh.len = htons(udp4_l2_mh_sock[n].msg_len + sizeof(b->uh));
 
 	if (c->mode == MODE_PASTA) {
-		/* If we pass &b->eh directly to write(), starting from
-		 * gcc 12.1, at least on aarch64 and x86_64, we get a bogus
-		 * stringop-overread warning, due to:
-		 *   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103483
-		 *
-		 * but we can't disable it with a pragma, because it will be
-		 * ignored if LTO is enabled:
-		 *   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80922
-		 */
-		void *frame = (char *)b + offsetof(struct udp4_l2_buf_t, eh);
+		void *frame = udp4_l2_iov_tap[n].iov_base;
 
 		if (write(c->fd_tap, frame, sizeof(b->eh) + ip_len) < 0)
 			debug("tap write: %s", strerror(errno));
@@ -792,8 +793,7 @@ static void udp_sock_fill_data_v6(const struct ctx *c, int n,
 	b->ip6h.hop_limit = 255;
 
 	if (c->mode == MODE_PASTA) {
-		/* See udp_sock_fill_data_v4() for the reason behind 'frame' */
-		void *frame = (char *)b + offsetof(struct udp6_l2_buf_t, eh);
+		void *frame = udp6_l2_iov_tap[n].iov_base;
 
 		if (write(c->fd_tap, frame, sizeof(b->eh) + ip_len) < 0)
 			debug("tap write: %s", strerror(errno));
@@ -1227,10 +1227,10 @@ static void udp_splice_iov_init(void)
 int udp_init(struct ctx *c)
 {
 	if (c->ifi4)
-		udp_sock4_iov_init();
+		udp_sock4_iov_init(c);
 
 	if (c->ifi6)
-		udp_sock6_iov_init();
+		udp_sock6_iov_init(c);
 
 	udp_invert_portmap(&c->udp.fwd_in);
 	udp_invert_portmap(&c->udp.fwd_out);
-- 
2.38.1
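
To make the effect of the pre-adjustment concrete, here is a small
standalone sketch. It assumes a much simplified buffer layout: struct
l2_buf, its field sizes and iov_init() are hypothetical and only mimic
the "length tag first, Ethernet header next" ordering described above.
The mode names mirror the ones in the patch, but the enum itself is
local to the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

enum mode { MODE_PASST, MODE_PASTA };	/* local to this sketch */

struct l2_buf {				/* hypothetical, simplified layout */
	uint32_t vnet_len;		/* length tag, used by passt only */
	uint8_t eh[14];			/* Ethernet header */
	uint8_t payload[64];
};

/* Point each iovec at the first byte that will be sent in this mode */
static void iov_init(struct iovec *iov, struct l2_buf *bufs, size_t n,
		     enum mode mode)
{
	size_t i;

	assert(iov && bufs);

	for (i = 0; i < n; i++) {
		if (mode == MODE_PASTA)
			iov[i].iov_base = bufs[i].eh;		/* skip tag */
		else
			iov[i].iov_base = &bufs[i].vnet_len;	/* keep tag */
	}
}
```

With the base address fixed at init time, the send path can take
iov[i].iov_base as the frame start in either mode, with no offsetof()
arithmetic at the point of the write.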
The main purpose of udp_sock_fill_data_v[46]() is to construct the IP,
UDP and other headers we'll need to forward data onto the tap
interface. In addition they update the control structures (iovec and
mmsghdr) we'll need to send the messages, and in the case of pasta
actually send it.

This leaves the control structure management and the send itself
awkwardly split between udp_sock_fill_data_v[46]() and their caller
udp_sock_handler(). In addition, this tail part of
udp_sock_fill_data_v[46]() is essentially common between the IPv4 and
IPv6 versions, apart from which control array we're working on.

Clean this up by reducing these functions to just construct the headers
and renaming them to udp_update_hdr[46]() accordingly. The control
structure updates are now all in the caller, and common for IPv4 and
IPv6.
---
 udp.c | 118 +++++++++++++++++++++++++---------------------------------
 1 file changed, 50 insertions(+), 68 deletions(-)

diff --git a/udp.c b/udp.c
index e2eb504..431d268 100644
--- a/udp.c
+++ b/udp.c
@@ -640,20 +640,17 @@ static void udp_sock_handler_splice(const struct ctx *c, union epoll_ref ref,
 }
 
 /**
- * udp_sock_fill_data_v4() - Fill and queue one buffer. In pasta mode, write it
+ * udp_update_hdr4() - Update headers for one IPv4 datagram
  * @c:		Execution context
  * @n:		Index of buffer in udp4_l2_buf pool
- * @ref:	epoll reference from socket
- * @msg_idx:	Index within message being prepared (spans multiple buffers)
- * @msg_len:	Length of current message being prepared for sending
+ * @dstport:	Destination port number
  * @now:	Current timestamp
+ *
+ * Return: size of tap frame with headers
  */
-static void udp_sock_fill_data_v4(const struct ctx *c, int n,
-				  union epoll_ref ref,
-				  int *msg_idx, int *msg_bufs, ssize_t *msg_len,
-				  const struct timespec *now)
+static size_t udp_update_hdr4(const struct ctx *c, int n, in_port_t dstport,
+			      const struct timespec *now)
 {
-	struct msghdr *mh = &udp4_l2_mh_tap[*msg_idx].msg_hdr;
 	struct udp4_l2_buf_t *b = &udp4_l2_buf[n];
 	size_t ip_len, buf_len;
 	in_port_t src_port;
@@ -687,51 +684,31 @@ static void udp_sock_fill_data_v4(const struct ctx *c, int n,
 	udp_update_check4(b);
 
 	b->uh.source = b->s_in.sin_port;
-	b->uh.dest = htons(ref.r.p.udp.udp.port);
+	b->uh.dest = htons(dstport);
 	b->uh.len = htons(udp4_l2_mh_sock[n].msg_len + sizeof(b->uh));
 
-	if (c->mode == MODE_PASTA) {
-		void *frame = udp4_l2_iov_tap[n].iov_base;
-
-		if (write(c->fd_tap, frame, sizeof(b->eh) + ip_len) < 0)
-			debug("tap write: %s", strerror(errno));
-		pcap(frame, sizeof(b->eh) + ip_len);
-
-		return;
-	}
-
-	b->vnet_len = htonl(ip_len + sizeof(struct ethhdr));
-	buf_len = sizeof(uint32_t) + sizeof(struct ethhdr) + ip_len;
-	udp4_l2_iov_tap[n].iov_len = buf_len;
-
-	/* With bigger messages, qemu closes the connection. */
-	if (*msg_bufs && *msg_len + buf_len > SHRT_MAX) {
-		mh->msg_iovlen = *msg_bufs;
+	buf_len = ip_len + sizeof(b->eh);
 
-		(*msg_idx)++;
-		udp4_l2_mh_tap[*msg_idx].msg_hdr.msg_iov = &udp4_l2_iov_tap[n];
-		*msg_len = *msg_bufs = 0;
+	if (c->mode == MODE_PASST) {
+		b->vnet_len = htonl(buf_len);
+		buf_len += sizeof(b->vnet_len);
 	}
 
-	*msg_len += buf_len;
-	(*msg_bufs)++;
+	return buf_len;
 }
 
 /**
- * udp_sock_fill_data_v6() - Fill and queue one buffer. In pasta mode, write it
+ * udp_update_hdr6() - Update headers for one IPv6 datagram
  * @c:		Execution context
  * @n:		Index of buffer in udp6_l2_buf pool
- * @ref:	epoll reference from socket
- * @msg_idx:	Index within message being prepared (spans multiple buffers)
- * @msg_len:	Length of current message being prepared for sending
+ * @dstport:	Destination port number
  * @now:	Current timestamp
+ *
+ * Return: size of tap frame with headers
  */
-static void udp_sock_fill_data_v6(const struct ctx *c, int n,
-				  union epoll_ref ref,
-				  int *msg_idx, int *msg_bufs, ssize_t *msg_len,
-				  const struct timespec *now)
+static size_t udp_update_hdr6(const struct ctx *c, int n, in_port_t dstport,
+			      const struct timespec *now)
 {
-	struct msghdr *mh = &udp6_l2_mh_tap[*msg_idx].msg_hdr;
 	struct udp6_l2_buf_t *b = &udp6_l2_buf[n];
 	size_t ip_len, buf_len;
 	struct in6_addr *src;
@@ -782,7 +759,7 @@ static void udp_sock_fill_data_v6(const struct ctx *c, int n,
 	}
 
 	b->uh.source = b->s_in6.sin6_port;
-	b->uh.dest = htons(ref.r.p.udp.udp.port);
+	b->uh.dest = htons(dstport);
 	b->uh.len = b->ip6h.payload_len;
 
 	b->ip6h.hop_limit = IPPROTO_UDP;
@@ -792,31 +769,14 @@ static void udp_sock_fill_data_v6(const struct ctx *c, int n,
 	b->ip6h.nexthdr = IPPROTO_UDP;
 	b->ip6h.hop_limit = 255;
 
-	if (c->mode == MODE_PASTA) {
-		void *frame = udp6_l2_iov_tap[n].iov_base;
-
-		if (write(c->fd_tap, frame, sizeof(b->eh) + ip_len) < 0)
-			debug("tap write: %s", strerror(errno));
-		pcap(frame, sizeof(b->eh) + ip_len);
+	buf_len = ip_len + sizeof(b->eh);
 
-		return;
+	if (c->mode == MODE_PASST) {
+		b->vnet_len = htonl(buf_len);
+		buf_len += sizeof(b->vnet_len);
 	}
 
-	b->vnet_len = htonl(ip_len + sizeof(struct ethhdr));
-	buf_len = sizeof(uint32_t) + sizeof(struct ethhdr) + ip_len;
-	udp6_l2_iov_tap[n].iov_len = buf_len;
-
-	/* With bigger messages, qemu closes the connection. */
-	if (*msg_bufs && *msg_len + buf_len > SHRT_MAX) {
-		mh->msg_iovlen = *msg_bufs;
-
-		(*msg_idx)++;
-		udp6_l2_mh_tap[*msg_idx].msg_hdr.msg_iov = &udp6_l2_iov_tap[n];
-		*msg_len = *msg_bufs = 0;
-	}
-
-	*msg_len += buf_len;
-	(*msg_bufs)++;
+	return buf_len;
 }
 
 /**
@@ -832,6 +792,7 @@ static void udp_sock_fill_data_v6(const struct ctx *c, int n,
 void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
 		      const struct timespec *now)
 {
+	in_port_t dstport = ref.r.p.udp.udp.port;
 	ssize_t n, msg_len = 0, missing = 0;
 	struct mmsghdr *tap_mmh, *sock_mmh;
 	int msg_bufs = 0, msg_i = 0, ret;
@@ -863,12 +824,33 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events,
 
 	tap_mmh[0].msg_hdr.msg_iov = &tap_iov[0];
 	for (i = 0; i < (unsigned)n; i++) {
+		size_t buf_len;
+
 		if (ref.r.p.udp.udp.v6)
-			udp_sock_fill_data_v6(c, i, ref,
-					      &msg_i, &msg_bufs, &msg_len, now);
+			buf_len = udp_update_hdr6(c, i, dstport, now);
 		else
-			udp_sock_fill_data_v4(c, i, ref,
-					      &msg_i, &msg_bufs, &msg_len, now);
+			buf_len = udp_update_hdr4(c, i, dstport, now);
+
+		if (c->mode == MODE_PASTA) {
+			void *frame = tap_iov[i].iov_base;
+
+			if (write(c->fd_tap, frame, buf_len) < 0)
+				debug("tap write: %s", strerror(errno));
+			pcap(frame, buf_len);
+		} else {
+			tap_iov[i].iov_len = buf_len;
+
+			/* With bigger messages, qemu closes the connection. */
+			if (msg_bufs && msg_len + buf_len > SHRT_MAX) {
+				tap_mmh[msg_i].msg_hdr.msg_iovlen = msg_bufs;
+				msg_i++;
+				tap_mmh[msg_i].msg_hdr.msg_iov = &tap_iov[i];
+				msg_len = msg_bufs = 0;
+			}
+
+			msg_len += buf_len;
+			msg_bufs++;
+		}
 	}
 	tap_mmh[msg_i].msg_hdr.msg_iovlen = msg_bufs;
 
-- 
2.38.1
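
The grouping logic that this patch moves into udp_sock_handler() can be
looked at on its own. The sketch below keeps only the arithmetic: it
packs consecutive buffers into messages so that no message exceeds
SHRT_MAX bytes, unless a single buffer is already larger than that.
group_bufs() and its counts[] output are hypothetical names for this
illustration; the real code stores the counts in msg_iovlen and also
repoints msg_iov, which is left out here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Group n buffers into messages of at most SHRT_MAX bytes each.
 * counts[] must have room for n entries; counts[m] receives the number
 * of buffers in message m. Returns the number of messages.
 */
static size_t group_bufs(const size_t *buf_len, size_t n, size_t *counts)
{
	size_t msg_len = 0, msg_bufs = 0, msg_i = 0, i;

	assert(buf_len && counts);

	if (!n)
		return 0;

	for (i = 0; i < n; i++) {
		/* Close the current message if this buffer would overflow it */
		if (msg_bufs && msg_len + buf_len[i] > SHRT_MAX) {
			counts[msg_i++] = msg_bufs;
			msg_len = msg_bufs = 0;
		}

		msg_len += buf_len[i];
		msg_bufs++;
	}

	/* The last message is closed after the loop, as in the handler */
	counts[msg_i] = msg_bufs;

	return msg_i + 1;
}
```

For buffer lengths of 20000, 20000, 1000 and 30000 bytes this yields
three messages holding 1, 2 and 1 buffers. The msg_bufs check in the
condition means a message is never closed while empty, so an oversized
first buffer still gets a message of its own.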
On Thu, 24 Nov 2022 19:54:17 +1100
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> It turns out a couple of places on the IPv4 specific inbound path
> accidentally use control structures that are supposed to be for IPv6.
> That could lead to weird behaviour in a rather complex set of
> circumstances.

Whoops, this is embarrassing.

> Path 1/4 here is the actual fix, the rest makes some clean ups to the
> code that should make similar mistakes harder errors harder to commit
> in future.

The whole series looks good to me.

> This is based on my earlier cleanup of the UDP splicing code, although
> I think it will rebase trivially.

I tried, it does, but I wouldn't needlessly rebase that one on top of
this, I'd rather wait a bit and apply them in order.

-- 
Stefano
On Fri, Nov 25, 2022 at 03:01:21AM +0100, Stefano Brivio wrote:
> On Thu, 24 Nov 2022 19:54:17 +1100
> David Gibson <david(a)gibson.dropbear.id.au> wrote:
>
> > It turns out a couple of places on the IPv4 specific inbound path
> > accidentally use control structures that are supposed to be for IPv6.
> > That could lead to weird behaviour in a rather complex set of
> > circumstances.
>
> Whoops, this is embarrassing.

Heh, been there.

> > Path 1/4 here is the actual fix, the rest makes some clean ups to the
> > code that should make similar mistakes harder errors harder to commit
> > in future.
>
> The whole series looks good to me.
>
> > This is based on my earlier cleanup of the UDP splicing code, although
> > I think it will rebase trivially.
>
> I tried, it does, but I wouldn't needlessly rebase that one on top of
> this, I'd rather wait a bit and apply them in order.

Ok.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
On Thu, 24 Nov 2022 19:54:17 +1100
David Gibson <david(a)gibson.dropbear.id.au> wrote:

> It turns out a couple of places on the IPv4 specific inbound path
> accidentally use control structures that are supposed to be for IPv6.
> That could lead to weird behaviour in a rather complex set of
> circumstances.
>
> Path 1/4 here is the actual fix, the rest makes some clean ups to the
> code that should make similar mistakes harder errors harder to commit
> in future.
>
> This is based on my earlier cleanup of the UDP splicing code, although
> I think it will rebase trivially.

Applied and pushed.

-- 
Stefano