My recent changes to UDP socket management caused some unfortunate side effects. In particular, it has some bad interactions with UDP error handling. Fix several bugs here, along with some reworks to allow that. David Gibson (7): udp: Fix breakage of UDP error handling by PKTINFO support udp: Be quieter about errors on UDP receive udp: Pass socket & flow information direction to error handling functions udp: Deal with errors as we go in udp_sock_fwd() udp: Add udp_pktinfo() helper udp: Minor re-organisation of udp_sock_recverr() udp: Propagate errors on listening and brand new sockets udp.c | 215 +++++++++++++++++++++++++++++++++++++--------------------- 1 file changed, 138 insertions(+), 77 deletions(-) -- 2.49.0
We recently enabled the IP_PKTINFO / IPV6_RECVPKTINFO socket options on our UDP sockets. This lets us obtain and properly handle the specific local address used when we're "listening" with a socket on 0.0.0.0 or ::. However, the PKTINFO cmsgs this option generates appear on error queue messages as well as regular datagrams. udp_sock_recverr() doesn't expect this and so flags an unrecoverable error when it can't parse the control message. Correct this by adding space in udp_sock_recverr()s control buffer for the additional PKTINFO data, and scan through all cmsgs for the RECVERR, rather than only looking at the first one. Link: https://bugs.passt.top/show_bug.cgi?id=99 Fixes: f4b0dd8b06 ("udp: Use PKTINFO cmsgs to get destination address..") Reported-by: Stefano Brivio <sbrivio(a)redhat.com> Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- udp.c | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/udp.c b/udp.c index 40af7dfc..f5fb98c2 100644 --- a/udp.c +++ b/udp.c @@ -155,6 +155,10 @@ __attribute__ ((aligned(32))) #endif udp_meta[UDP_MAX_FRAMES]; +#define PKTINFO_SPACE \ + MAX(CMSG_SPACE(sizeof(struct in_pktinfo)), \ + CMSG_SPACE(sizeof(struct in6_pktinfo))) + /** * enum udp_iov_idx - Indices for the buffers making up a single UDP frame * @UDP_IOV_TAP tap specific header @@ -476,10 +480,10 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref) struct sock_extended_err ee; union sockaddr_inany saddr; }; - const struct errhdr *eh; - const struct cmsghdr *hdr; - char buf[CMSG_SPACE(sizeof(struct errhdr))]; + char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))]; char data[ICMP6_MAX_DLEN]; + const struct errhdr *eh; + struct cmsghdr *hdr; int s = ref.fd; struct iovec iov = { .iov_base = data, @@ -507,12 +511,16 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref) return -1; } - hdr = CMSG_FIRSTHDR(&mh); - if (!((hdr->cmsg_level == IPPROTO_IP && - hdr->cmsg_type == IP_RECVERR) || - (hdr->cmsg_level == IPPROTO_IPV6 && - hdr->cmsg_type == IPV6_RECVERR))) { - err("Unexpected cmsg reading error queue"); + for (hdr = CMSG_FIRSTHDR(&mh); hdr; hdr = CMSG_NXTHDR(&mh, hdr)) { + if ((hdr->cmsg_level == IPPROTO_IP && + hdr->cmsg_type == IP_RECVERR) || + (hdr->cmsg_level == IPPROTO_IPV6 && + hdr->cmsg_type == IPV6_RECVERR)) + break; + } + + if (!hdr) { + err("Missing RECVERR cmsg in error queue"); return -1; } @@ -587,10 +595,6 @@ static int udp_sock_errs(const struct ctx *c, union epoll_ref ref) return n_err; } -#define PKTINFO_SPACE \ - MAX(CMSG_SPACE(sizeof(struct in_pktinfo)), \ - CMSG_SPACE(sizeof(struct in6_pktinfo))) - /** * udp_peek_addr() - Get source address for next packet * @s: Socket to get information from -- 2.49.0
If we get an error on UDP receive, either in udp_peek_addr() or udp_sock_recv(), we'll print an error message. However, this could be a perfectly routine UDP error triggered by an ICMP, which need not go to the error log. This doesn't usually happen, because before receiving we typically clear the error queue from udp_sock_errs(). However, it's possible an error could be flagged after udp_sock_errs() but before we receive. So it's better to handle this error "silently" (trace level only). We'll bail out of the receive, return to the epoll loop, and get an EPOLLERR where we'll handle and report the error properly. In particular there's one situation that can trigger this case much more easily. If we start a new outbound UDP flow to a local destination with nothing listening, we'll get a more or less immediate connection refused error. So, we'll get that error on the very first receive after the connect(). That will occur in udp_flow_defer() -> udp_flush_flow() -> udp_sock_fwd() -> udp_peek_addr() -> recvmsg(). This path doesn't call udp_sock_errs() first, so isn't (imperfectly) protected the way we are most of the time. Fixes: 84ab1305faba ("udp: Polish udp_vu_sock_info() and remove from...") Fixes: 69e5393c3722 ("udp: Move some more of sock_handler tasks into...") Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- udp.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/udp.c b/udp.c index f5fb98c2..154f99b5 100644 --- a/udp.c +++ b/udp.c @@ -619,8 +619,8 @@ static int udp_peek_addr(int s, union sockaddr_inany *src, rc = recvmsg(s, &msg, MSG_PEEK | MSG_DONTWAIT); if (rc < 0) { - if (errno != EAGAIN && errno != EWOULDBLOCK) - warn_perror("Error peeking at socket address"); + trace("Error peeking at socket address: %s", strerror_(errno)); + /* Bail out and let the EPOLLERR handler deal with it */ return rc; } @@ -664,7 +664,8 @@ static int udp_sock_recv(const struct ctx *c, int s, struct mmsghdr *mmh, int n) n = recvmmsg(s, mmh, n, 0, NULL); if (n < 0) { - err_perror("Error receiving datagrams"); + trace("Error receiving datagrams: %s", strerror_(errno)); + /* Bail out and let the EPOLLERR handler deal with it */ return 0; } -- 2.49.0
udp_sock_recverr() and udp_sock_errs() take an epoll reference from which they obtain both the socket fd to receive errors from, and - for flow specific sockets - the flow and side the socket is associated with. We have some upcoming cases where we want to clear errors when we're not directly associated with receiving an epoll event, so it's not natural to have an epoll reference. Therefore, make these functions take the socket and flow from explicit parameters. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- udp.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/udp.c b/udp.c index 154f99b5..c51ac955 100644 --- a/udp.c +++ b/udp.c @@ -467,14 +467,15 @@ static void udp_send_tap_icmp6(const struct ctx *c, /** * udp_sock_recverr() - Receive and clear an error from a socket * @c: Execution context - * @ref: epoll reference + * @s: Socket to receive errors from + * @sidx: Flow and side of @s, or FLOW_SIDX_NONE if unknown * * Return: 1 if error received and processed, 0 if no more errors in queue, < 0 * if there was an error reading the queue * * #syscalls recvmsg */ -static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref) +static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) { struct errhdr { struct sock_extended_err ee; @@ -484,7 +485,6 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref) char data[ICMP6_MAX_DLEN]; const struct errhdr *eh; struct cmsghdr *hdr; - int s = ref.fd; struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) @@ -525,12 +525,12 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref) } eh = (const struct errhdr *)CMSG_DATA(hdr); - if (ref.type == EPOLL_TYPE_UDP) { - flow_sidx_t sidx = flow_sidx_opposite(ref.flowside); - const struct flowside *toside = flowside_at_sidx(sidx); + if (flow_sidx_valid(sidx)) { + flow_sidx_t tosidx = flow_sidx_opposite(sidx); + const struct flowside *toside = flowside_at_sidx(tosidx); size_t dlen = rc; - if (pif_is_socket(pif_at_sidx(sidx))) { + if (pif_is_socket(pif_at_sidx(tosidx))) { /* XXX Is there any way to propagate ICMPs from socket * to socket? */ } else if (hdr->cmsg_level == IPPROTO_IP) { @@ -554,21 +554,21 @@ static int udp_sock_recverr(const struct ctx *c, union epoll_ref ref) /** * udp_sock_errs() - Process errors on a socket * @c: Execution context - * @ref: epoll reference + * @s: Socket to receive errors from + * @sidx: Flow and side of @s, or FLOW_SIDX_NONE if unknown * * Return: Number of errors handled, or < 0 if we have an unrecoverable error */ -static int udp_sock_errs(const struct ctx *c, union epoll_ref ref) +static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx) { unsigned n_err = 0; socklen_t errlen; - int s = ref.fd; int rc, err; ASSERT(!c->no_udp); /* Empty the error queue */ - while ((rc = udp_sock_recverr(c, ref)) > 0) + while ((rc = udp_sock_recverr(c, s, sidx)) > 0) n_err += rc; if (rc < 0) @@ -777,7 +777,7 @@ void udp_listen_sock_handler(const struct ctx *c, const struct timespec *now) { if (events & EPOLLERR) { - if (udp_sock_errs(c, ref) < 0) { + if (udp_sock_errs(c, ref.fd, FLOW_SIDX_NONE) < 0) { err("UDP: Unrecoverable error on listening socket:" " (%s port %hu)", pif_name(ref.udp.pif), ref.udp.port); /* FIXME: what now? close/re-open socket? */ @@ -804,7 +804,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, ASSERT(!c->no_udp && uflow); if (events & EPOLLERR) { - if (udp_sock_errs(c, ref) < 0) { + if (udp_sock_errs(c, ref.fd, ref.flowside) < 0) { flow_err(uflow, "Unrecoverable error on flow socket"); goto fail; } -- 2.49.0
When we get an epoll event on a listening socket, we first deal with any errors (udp_sock_errs()), then with any received packets (udp_sock_fwd()). However, it's theoretically possible that new errors could get flagged on the socket after we call udp_sock_errs(), in which case we could get errors returned in in udp_sock_fwd() -> udp_peek_addr() -> recvmsg(). In fact, we do deal with this correctly, although the path is somewhat non-obvious. The recvmsg() error will cause us to bail out of udp_sock_fwd(), but the EPOLLERR event will now be flagged, so we'll come back here next epoll loop and call udp_sock_errs(). Except.. we call udp_sock_fwd() from udp_flush_flow() as well as from epoll events. This is to deal with any packets that arrived between bind() and connect(), and so might not be associated with the socket's intended flow. This expects udp_sock_fwd() to flush _all_ queued datagrams, so that anything received later must be for the correct flow. At the moment, udp_sock_errs() might fail to flush all datagrams if errors occur. In particular this can happen in practice for locally reported errors which occur immediately after connect() (e.g. connecting to a local port with nothing listening). We can deal with the problem case, and also make the flow a little more natural for the common case by having udp_sock_fwd() call udp_sock_errs() to handle errors as the occur, rather than trying to deal with all errors in advance. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- udp.c | 45 ++++++++++++++++++++++++++------------------- 1 file changed, 26 insertions(+), 19 deletions(-) diff --git a/udp.c b/udp.c index c51ac955..0bec499d 100644 --- a/udp.c +++ b/udp.c @@ -601,7 +601,7 @@ static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx) * @src: Socket address (output) * @dst: (Local) destination address (output) * - * Return: 0 on success, -1 otherwise + * Return: 0 if no more packets, 1 on success, -ve error code on error */ static int udp_peek_addr(int s, union sockaddr_inany *src, union inany_addr *dst) @@ -619,9 +619,9 @@ static int udp_peek_addr(int s, union sockaddr_inany *src, rc = recvmsg(s, &msg, MSG_PEEK | MSG_DONTWAIT); if (rc < 0) { - trace("Error peeking at socket address: %s", strerror_(errno)); - /* Bail out and let the EPOLLERR handler deal with it */ - return rc; + if (errno == EAGAIN || errno == EWOULDBLOCK) + return 0; + return -errno; } hdr = CMSG_FIRSTHDR(&msg); @@ -644,7 +644,7 @@ static int udp_peek_addr(int s, union sockaddr_inany *src, sockaddr_ntop(src, sastr, sizeof(sastr)), inany_ntop(dst, dstr, sizeof(dstr))); - return 0; + return 1; } /** @@ -740,11 +740,27 @@ void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif, { union sockaddr_inany src; union inany_addr dst; + int rc; - while (udp_peek_addr(s, &src, &dst) == 0) { - flow_sidx_t tosidx = udp_flow_from_sock(c, frompif, - &dst, port, &src, now); - uint8_t topif = pif_at_sidx(tosidx); + while ((rc = udp_peek_addr(s, &src, &dst)) != 0) { + flow_sidx_t tosidx; + uint8_t topif; + + if (rc < 0) { + trace("Error peeking at socket address: %s", + strerror_(-rc)); + /* Clear errors & carry on */ + if (udp_sock_errs(c, s, FLOW_SIDX_NONE) < 0) { + err( +"UDP: Unrecoverable error on listening socket: (%s port %hu)", + pif_name(frompif), port); + /* FIXME: what now? close/re-open socket? */ + } + continue; + } + + tosidx = udp_flow_from_sock(c, frompif, &dst, port, &src, now); + topif = pif_at_sidx(tosidx); if (pif_is_socket(topif)) { udp_sock_to_sock(c, s, 1, tosidx); @@ -776,16 +792,7 @@ void udp_listen_sock_handler(const struct ctx *c, union epoll_ref ref, uint32_t events, const struct timespec *now) { - if (events & EPOLLERR) { - if (udp_sock_errs(c, ref.fd, FLOW_SIDX_NONE) < 0) { - err("UDP: Unrecoverable error on listening socket:" - " (%s port %hu)", pif_name(ref.udp.pif), ref.udp.port); - /* FIXME: what now? close/re-open socket? */ - return; - } - } - - if (events & EPOLLIN) + if (events & (EPOLLERR | EPOLLIN)) udp_sock_fwd(c, ref.fd, ref.udp.pif, ref.udp.port, now); } -- 2.49.0
Currently we open code parsing the control message for IP_PKTINFO in udp_peek_addr(). We have an upcoming case where we want to parse PKTINFO in another place, so split this out into a helper function. While we're there, make the parsing a bit more robust: scan all cmsgs to look for the one we want, rather than assuming there's only one. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- udp.c | 52 ++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 36 insertions(+), 16 deletions(-) diff --git a/udp.c b/udp.c index 0bec499d..352ab83b 100644 --- a/udp.c +++ b/udp.c @@ -464,6 +464,41 @@ static void udp_send_tap_icmp6(const struct ctx *c, tap_icmp6_send(c, saddr, eaddr, &msg, msglen); } +/** + * udp_pktinfo() - Retreive packet destination address from cmsg + * @msg: msghdr into which message has been received + * @dst: (Local) destination address of message in @mh (output) + * + * Return: 0 on success, -1 if the information was missing (@dst is set to + * inany_any6). + */ +static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) +{ + struct cmsghdr *hdr; + + for (hdr = CMSG_FIRSTHDR(msg); hdr; hdr = CMSG_NXTHDR(msg, hdr)) { + if (hdr->cmsg_level == IPPROTO_IP && + hdr->cmsg_type == IP_PKTINFO) { + const struct in_pktinfo *i4 = (void *)CMSG_DATA(hdr); + + *dst = inany_from_v4(i4->ipi_addr); + return 0; + } + + if (hdr->cmsg_level == IPPROTO_IPV6 && + hdr->cmsg_type == IPV6_PKTINFO) { + const struct in6_pktinfo *i6 = (void *)CMSG_DATA(hdr); + + dst->a6 = i6->ipi6_addr; + return 0; + } + } + + err("Missing PKTINFO cmsg on datagram"); + *dst = inany_any6; + return -1; +} + /** * udp_sock_recverr() - Receive and clear an error from a socket * @c: Execution context @@ -607,7 +642,6 @@ static int udp_peek_addr(int s, union sockaddr_inany *src, union inany_addr *dst) { char sastr[SOCKADDR_STRLEN], dstr[INANY_ADDRSTRLEN]; - const struct cmsghdr *hdr; char cmsg[PKTINFO_SPACE]; struct msghdr msg = { .msg_name = src, @@ -624,21 +658,7 @@ static int udp_peek_addr(int s, union sockaddr_inany *src, return -errno; } - hdr = CMSG_FIRSTHDR(&msg); - if (hdr && hdr->cmsg_level == IPPROTO_IP && - hdr->cmsg_type == IP_PKTINFO) { - const struct in_pktinfo *info4 = (void *)CMSG_DATA(hdr); - - *dst = inany_from_v4(info4->ipi_addr); - } else if (hdr && hdr->cmsg_level == IPPROTO_IPV6 && - hdr->cmsg_type == IPV6_PKTINFO) { - const struct in6_pktinfo *info6 = (void *)CMSG_DATA(hdr); - - dst->a6 = info6->ipi6_addr; - } else { - debug("Unexpected cmsg on UDP datagram"); - *dst = inany_any6; - } + udp_pktinfo(&msg, dst); trace("Peeked UDP datagram: %s -> %s", sockaddr_ntop(src, sastr, sizeof(sastr)), -- 2.49.0
I took the liberty of applying this with two minor changes: On Tue, 15 Apr 2025 17:16:22 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:Currently we open code parsing the control message for IP_PKTINFO in udp_peek_addr(). We have an upcoming case where we want to parse PKTINFO in another place, so split this out into a helper function. While we're there, make the parsing a bit more robust: scan all cmsgs to look for the one we want, rather than assuming there's only one. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- udp.c | 52 ++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 36 insertions(+), 16 deletions(-) diff --git a/udp.c b/udp.c index 0bec499d..352ab83b 100644 --- a/udp.c +++ b/udp.c @@ -464,6 +464,41 @@ static void udp_send_tap_icmp6(const struct ctx *c, tap_icmp6_send(c, saddr, eaddr, &msg, msglen); } +/** + * udp_pktinfo() - Retreive packet destination address from cmsgNit: "Retrieve", and:+ * @msg: msghdr into which message has been received + * @dst: (Local) destination address of message in @mh (output) + * + * Return: 0 on success, -1 if the information was missing (@dst is set to + * inany_any6). + */ +static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) +{ + struct cmsghdr *hdr; + + for (hdr = CMSG_FIRSTHDR(msg); hdr; hdr = CMSG_NXTHDR(msg, hdr)) { + if (hdr->cmsg_level == IPPROTO_IP && + hdr->cmsg_type == IP_PKTINFO) { + const struct in_pktinfo *i4 = (void *)CMSG_DATA(hdr); + + *dst = inany_from_v4(i4->ipi_addr); + return 0; + } + + if (hdr->cmsg_level == IPPROTO_IPV6 && + hdr->cmsg_type == IPV6_PKTINFO) { + const struct in6_pktinfo *i6 = (void *)CMSG_DATA(hdr); + + dst->a6 = i6->ipi6_addr; + return 0; + } + } + + err("Missing PKTINFO cmsg on datagram");...from a quick glance at the matching kernel path, this looks a bit too likely for err(), so I got a bit scared, and turned it to debug(). I think warn() is actually more correct than debug() (I think it shouldn't really be err() though because we can keep using this flow), but I would feel a lot more confident about it once we have rate-limited versions of those. Of course, I'll change this back to err() or warn() if you have a compelling reason. Similar reasoning in 7/7 where this is called. By the way, two messages (here and in caller) look redundant to me, regardless of their severity, but I'm not sure if you have further changes in mind where these separate prints would make sense. -- Stefano
On Tue, Apr 15, 2025 at 08:54:00PM +0200, Stefano Brivio wrote:I took the liberty of applying this with two minor changes: On Tue, 15 Apr 2025 17:16:22 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:Oops, thanks.Currently we open code parsing the control message for IP_PKTINFO in udp_peek_addr(). We have an upcoming case where we want to parse PKTINFO in another place, so split this out into a helper function. While we're there, make the parsing a bit more robust: scan all cmsgs to look for the one we want, rather than assuming there's only one. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- udp.c | 52 ++++++++++++++++++++++++++++++++++++---------------- 1 file changed, 36 insertions(+), 16 deletions(-) diff --git a/udp.c b/udp.c index 0bec499d..352ab83b 100644 --- a/udp.c +++ b/udp.c @@ -464,6 +464,41 @@ static void udp_send_tap_icmp6(const struct ctx *c, tap_icmp6_send(c, saddr, eaddr, &msg, msglen); } +/** + * udp_pktinfo() - Retreive packet destination address from cmsgNit: "Retrieve", and:Ehh.. fair enough.+ * @msg: msghdr into which message has been received + * @dst: (Local) destination address of message in @mh (output) + * + * Return: 0 on success, -1 if the information was missing (@dst is set to + * inany_any6). + */ +static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) +{ + struct cmsghdr *hdr; + + for (hdr = CMSG_FIRSTHDR(msg); hdr; hdr = CMSG_NXTHDR(msg, hdr)) { + if (hdr->cmsg_level == IPPROTO_IP && + hdr->cmsg_type == IP_PKTINFO) { + const struct in_pktinfo *i4 = (void *)CMSG_DATA(hdr); + + *dst = inany_from_v4(i4->ipi_addr); + return 0; + } + + if (hdr->cmsg_level == IPPROTO_IPV6 && + hdr->cmsg_type == IPV6_PKTINFO) { + const struct in6_pktinfo *i6 = (void *)CMSG_DATA(hdr); + + dst->a6 = i6->ipi6_addr; + return 0; + } + } + + err("Missing PKTINFO cmsg on datagram");...from a quick glance at the matching kernel path, this looks a bit too likely for err(), so I got a bit scared, and turned it to debug().I think warn() is actually more correct than debug() (I think it shouldn't really be err() though because we can keep using this flow),So, technically we don't have a flow at this point: the point here is that if we don't get the pktinfo we don't have the information we need to determine the right flow. But, regardless, you're right, warn() would be the right level in theory.but I would feel a lot more confident about it once we have rate-limited versions of those.Ok.Of course, I'll change this back to err() or warn() if you have a compelling reason. Similar reasoning in 7/7 where this is called. By the way, two messages (here and in caller) look redundant to me, regardless of their severity, but I'm not sure if you have further changes in mind where these separate prints would make sense.In this patch there aren't two messages, it's only in this function. There are two when I add the other caller in 7/7 which was an oversight. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
Usually we work with the "exit early" flow style, where we return early on "error" conditions in functions. We don't currently do this in udp_sock_recverr() for the case where we don't have a flow to associate the error with. Reorganise to use the "exit early" style, which will make some subsequent changes less awkward. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- udp.c | 46 ++++++++++++++++++++++++++-------------------- 1 file changed, 26 insertions(+), 20 deletions(-) diff --git a/udp.c b/udp.c index 352ab83b..6db3accc 100644 --- a/udp.c +++ b/udp.c @@ -530,6 +530,9 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) .msg_control = buf, .msg_controllen = sizeof(buf), }; + const struct flowside *toside; + flow_sidx_t tosidx; + size_t dlen; ssize_t rc; rc = recvmsg(s, &mh, MSG_ERRQUEUE); @@ -560,29 +563,32 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) } eh = (const struct errhdr *)CMSG_DATA(hdr); - if (flow_sidx_valid(sidx)) { - flow_sidx_t tosidx = flow_sidx_opposite(sidx); - const struct flowside *toside = flowside_at_sidx(tosidx); - size_t dlen = rc; - - if (pif_is_socket(pif_at_sidx(tosidx))) { - /* XXX Is there any way to propagate ICMPs from socket - * to socket? */ - } else if (hdr->cmsg_level == IPPROTO_IP) { - dlen = MIN(dlen, ICMP4_MAX_DLEN); - udp_send_tap_icmp4(c, &eh->ee, toside, - eh->saddr.sa4.sin_addr, data, dlen); - } else if (hdr->cmsg_level == IPPROTO_IPV6) { - udp_send_tap_icmp6(c, &eh->ee, toside, - &eh->saddr.sa6.sin6_addr, data, - dlen, sidx.flowi); - } - } else { - trace("Ignoring received IP_RECVERR cmsg on listener socket"); - } + debug("%s error on UDP socket %i: %s", str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno)); + if (!flow_sidx_valid(sidx)) { + trace("Ignoring received IP_RECVERR cmsg on listener socket"); + return 1; + } + + tosidx = flow_sidx_opposite(sidx); + toside = flowside_at_sidx(tosidx); + dlen = rc; + + if (pif_is_socket(pif_at_sidx(tosidx))) { + /* XXX Is there any way to propagate ICMPs from socket to + * socket? */ + } else if (hdr->cmsg_level == IPPROTO_IP) { + dlen = MIN(dlen, ICMP4_MAX_DLEN); + udp_send_tap_icmp4(c, &eh->ee, toside, + eh->saddr.sa4.sin_addr, data, dlen); + } else if (hdr->cmsg_level == IPPROTO_IPV6) { + udp_send_tap_icmp6(c, &eh->ee, toside, + &eh->saddr.sa6.sin6_addr, data, + dlen, sidx.flowi); + } + return 1; } -- 2.49.0
udp_sock_recverr() processes errors on UDP sockets and attempts to propagate them as ICMP packets on the tap interface. To do this it currently requires the flow with which the error is associated as a parameter. If that's missing it will clear the error condition, but not propagate it. That means that we largely ignore errors on "listening" sockets. It also means we may discard some errors on flow specific sockets if they occur very shortly after the socket is created. In udp_flush_flow() we need to clear any datagrams received between bind() and connect() which might not be associated with the "final" flow for the socket. If we get errors before that point we'll ignore them in the same way because we don't know the flow they're associated with in advance. This can happen in practice if we have errors which occur almost immediately after connect(), such as ECONNREFUSED when we connect() to a local address where nothing is listening. Between the extended error message itself and the PKTINFO information we do actually have enough information to find the correct flow. So, rather than ignoring errors where we don't have a flow "hint", determine the flow the hard way in udp_sock_recverr(). Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- udp.c | 41 ++++++++++++++++++++++++++++++++--------- 1 file changed, 32 insertions(+), 9 deletions(-) diff --git a/udp.c b/udp.c index 6db3accc..c04a091b 100644 --- a/udp.c +++ b/udp.c @@ -504,27 +504,34 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) * @c: Execution context * @s: Socket to receive errors from * @sidx: Flow and side of @s, or FLOW_SIDX_NONE if unknown + * @pif: Interface on which the error occurred + * (only used if @sidx == FLOW_SIDX_NONE) + * @port: Local port number of @s (only used if @sidx == FLOW_SIDX_NONE) * * Return: 1 if error received and processed, 0 if no more errors in queue, < 0 * if there was an error reading the queue * * #syscalls recvmsg */ -static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) +static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, + uint8_t pif, in_port_t port) { struct errhdr { struct sock_extended_err ee; union sockaddr_inany saddr; }; char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))]; + const struct errhdr *eh = NULL; char data[ICMP6_MAX_DLEN]; - const struct errhdr *eh; struct cmsghdr *hdr; struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) }; + union sockaddr_inany src; struct msghdr mh = { + .msg_name = &src, + .msg_namelen = sizeof(src), .msg_iov = &iov, .msg_iovlen = 1, .msg_control = buf, @@ -554,7 +561,7 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) hdr->cmsg_type == IP_RECVERR) || (hdr->cmsg_level == IPPROTO_IPV6 && hdr->cmsg_type == IPV6_RECVERR)) - break; + break; } if (!hdr) { @@ -568,8 +575,19 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno)); if (!flow_sidx_valid(sidx)) { - trace("Ignoring received IP_RECVERR cmsg on listener socket"); - return 1; + /* No hint from the socket, determine flow from addresses */ + union inany_addr dst; + + if (udp_pktinfo(&mh, &dst) < 0) { + warn("Missing PKTINFO on UDP error"); + return 1; + } + + sidx = flow_lookup_sa(c, IPPROTO_UDP, pif, &src, &dst, port); + if (!flow_sidx_valid(sidx)) { + debug("Ignoring UDP error without flow"); + return 1; + } } tosidx = flow_sidx_opposite(sidx); @@ -597,10 +615,14 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) * @c: Execution context * @s: Socket to receive errors from * @sidx: Flow and side of @s, or FLOW_SIDX_NONE if unknown + * @pif: Interface on which the error occurred + * (only used if @sidx == FLOW_SIDX_NONE) + * @port: Local port number of @s (only used if @sidx == FLOW_SIDX_NONE) * * Return: Number of errors handled, or < 0 if we have an unrecoverable error */ -static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx) +static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx, + uint8_t pif, in_port_t port) { unsigned n_err = 0; socklen_t errlen; @@ -609,7 +631,7 @@ static int udp_sock_errs(const struct ctx *c, int s, flow_sidx_t sidx) ASSERT(!c->no_udp); /* Empty the error queue */ - while ((rc = udp_sock_recverr(c, s, sidx)) > 0) + while ((rc = udp_sock_recverr(c, s, sidx, pif, port)) > 0) n_err += rc; if (rc < 0) @@ -776,7 +798,8 @@ void udp_sock_fwd(const struct ctx *c, int s, uint8_t frompif, trace("Error peeking at socket address: %s", strerror_(-rc)); /* Clear errors & carry on */ - if (udp_sock_errs(c, s, FLOW_SIDX_NONE) < 0) { + if (udp_sock_errs(c, s, FLOW_SIDX_NONE, + frompif, port) < 0) { err( "UDP: Unrecoverable error on listening socket: (%s port %hu)", pif_name(frompif), port); @@ -837,7 +860,7 @@ void udp_sock_handler(const struct ctx *c, union epoll_ref ref, ASSERT(!c->no_udp && uflow); if (events & EPOLLERR) { - if (udp_sock_errs(c, ref.fd, ref.flowside) < 0) { + if (udp_sock_errs(c, ref.fd, ref.flowside, PIF_NONE, 0) < 0) { flow_err(uflow, "Unrecoverable error on flow socket"); goto fail; } -- 2.49.0
On Tue, 15 Apr 2025 17:16:24 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:udp_sock_recverr() processes errors on UDP sockets and attempts to propagate them as ICMP packets on the tap interface. To do this it currently requires the flow with which the error is associated as a parameter. If that's missing it will clear the error condition, but not propagate it. That means that we largely ignore errors on "listening" sockets. It also means we may discard some errors on flow specific sockets if they occur very shortly after the socket is created. In udp_flush_flow() we need to clear any datagrams received between bind() and connect() which might not be associated with the "final" flow for the socket. If we get errors before that point we'll ignore them in the same way because we don't know the flow they're associated with in advance. This can happen in practice if we have errors which occur almost immediately after connect(), such as ECONNREFUSED when we connect() to a local address where nothing is listening. Between the extended error message itself and the PKTINFO information we do actually have enough information to find the correct flow. So, rather than ignoring errors where we don't have a flow "hint", determine the flow the hard way in udp_sock_recverr(). Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- udp.c | 41 ++++++++++++++++++++++++++++++++--------- 1 file changed, 32 insertions(+), 9 deletions(-) diff --git a/udp.c b/udp.c index 6db3accc..c04a091b 100644 --- a/udp.c +++ b/udp.c @@ -504,27 +504,34 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) * @c: Execution context * @s: Socket to receive errors from * @sidx: Flow and side of @s, or FLOW_SIDX_NONE if unknown + * @pif: Interface on which the error occurred + * (only used if @sidx == FLOW_SIDX_NONE) + * @port: Local port number of @s (only used if @sidx == FLOW_SIDX_NONE) * * Return: 1 if error received and processed, 0 if no more errors in queue, < 0 * if there was an error reading the queue * * #syscalls recvmsg */ -static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) +static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, + uint8_t pif, in_port_t port) { struct errhdr { struct sock_extended_err ee; union sockaddr_inany saddr; }; char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))]; + const struct errhdr *eh = NULL; char data[ICMP6_MAX_DLEN]; - const struct errhdr *eh; struct cmsghdr *hdr; struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) }; + union sockaddr_inany src; struct msghdr mh = { + .msg_name = &src, + .msg_namelen = sizeof(src), .msg_iov = &iov, .msg_iovlen = 1, .msg_control = buf, @@ -554,7 +561,7 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) hdr->cmsg_type == IP_RECVERR) || (hdr->cmsg_level == IPPROTO_IPV6 && hdr->cmsg_type == IPV6_RECVERR)) - break; + break; } if (!hdr) { @@ -568,8 +575,19 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno)); if (!flow_sidx_valid(sidx)) { - trace("Ignoring received IP_RECVERR cmsg on listener socket"); - return 1; + /* No hint from the socket, determine flow from addresses */ + union inany_addr dst; + + if (udp_pktinfo(&mh, &dst) < 0) { + warn("Missing PKTINFO on UDP error");...changed this to debug(), see my comments to 5/7. -- Stefano
On Tue, Apr 15, 2025 at 08:54:17PM +0200, Stefano Brivio wrote:On Tue, 15 Apr 2025 17:16:24 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:Actually this one can go away entirely. As you pointed out it's redundant with the one in udp_pktinfo() itself. Oh well, the extra debug() should be harmless enough. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibsonudp_sock_recverr() processes errors on UDP sockets and attempts to propagate them as ICMP packets on the tap interface. To do this it currently requires the flow with which the error is associated as a parameter. If that's missing it will clear the error condition, but not propagate it. That means that we largely ignore errors on "listening" sockets. It also means we may discard some errors on flow specific sockets if they occur very shortly after the socket is created. In udp_flush_flow() we need to clear any datagrams received between bind() and connect() which might not be associated with the "final" flow for the socket. If we get errors before that point we'll ignore them in the same way because we don't know the flow they're associated with in advance. This can happen in practice if we have errors which occur almost immediately after connect(), such as ECONNREFUSED when we connect() to a local address where nothing is listening. Between the extended error message itself and the PKTINFO information we do actually have enough information to find the correct flow. So, rather than ignoring errors where we don't have a flow "hint", determine the flow the hard way in udp_sock_recverr(). Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- udp.c | 41 ++++++++++++++++++++++++++++++++--------- 1 file changed, 32 insertions(+), 9 deletions(-) diff --git a/udp.c b/udp.c index 6db3accc..c04a091b 100644 --- a/udp.c +++ b/udp.c @@ -504,27 +504,34 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) * @c: Execution context * @s: Socket to receive errors from * @sidx: Flow and side of @s, or FLOW_SIDX_NONE if unknown + * @pif: Interface on which the error occurred + * (only used if @sidx == FLOW_SIDX_NONE) + * @port: Local port number of @s (only used if @sidx == FLOW_SIDX_NONE) * * Return: 1 if error received and processed, 0 if no more errors in queue, < 0 * if there was an error reading the queue * * #syscalls recvmsg */ -static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) +static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, + uint8_t pif, in_port_t port) { struct errhdr { struct sock_extended_err ee; union sockaddr_inany saddr; }; char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))]; + const struct errhdr *eh = NULL; char data[ICMP6_MAX_DLEN]; - const struct errhdr *eh; struct cmsghdr *hdr; struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) }; + union sockaddr_inany src; struct msghdr mh = { + .msg_name = &src, + .msg_namelen = sizeof(src), .msg_iov = &iov, .msg_iovlen = 1, .msg_control = buf, @@ -554,7 +561,7 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) hdr->cmsg_type == IP_RECVERR) || (hdr->cmsg_level == IPPROTO_IPV6 && hdr->cmsg_type == IPV6_RECVERR)) - break; + break; } if (!hdr) { @@ -568,8 +575,19 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx) str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno)); if (!flow_sidx_valid(sidx)) { - trace("Ignoring received IP_RECVERR cmsg on listener socket"); - return 1; + /* No hint from the socket, determine flow from addresses */ + union inany_addr dst; + + if (udp_pktinfo(&mh, &dst) < 0) { + warn("Missing PKTINFO on UDP error");...changed this to debug(), see my comments to 5/7.
On Tue, 15 Apr 2025 17:16:17 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:My recent changes to UDP socket management caused some unfortunate side effects. In particular, it has some bad interactions with UDP error handling. Fix several bugs here, along with some reworks to allow that. David Gibson (7): udp: Fix breakage of UDP error handling by PKTINFO support udp: Be quieter about errors on UDP receive udp: Pass socket & flow information direction to error handling functions udp: Deal with errors as we go in udp_sock_fwd() udp: Add udp_pktinfo() helper udp: Minor re-organisation of udp_sock_recverr() udp: Propagate errors on listening and brand new socketsApplied with minor changes as described for 5/7 and 7/7. -- Stefano