We now propagate ICMP errors on UDP flows back into ICMP packets on the tap interface. However, we don't always get the source address right for the synthesized message. Because ICMPs can be generated by intermediate routers, that source address might not be one of the endpoints, so the address translation we already have isn't sufficient. Implement properly translating ICMP addresses when we need to. This ended up a bit messier than I hoped, but it seems to work. A simple case to test this is: pasta --config-net --map-host-loopback=172.16.1.1 -- \ sh -c "echo hello | socat STDIO UDP4:172.16.1.1:10001" where 10001 is a port where nothing is listening on the host. Without this series, this will just time out. pasta sends an ICMP Port Unreachable message, but it's sent with source address 127.0.0.1 and so discarded by the guest. With this series, the address is properly translated and we correctly get the error from socat: 2025/04/16 19:02:37 socat[3] E read(5, 0x555c3dbf2000, 8192): Connection refused David Gibson (4): fwd: Split out helpers for port-independent NAT treewide: Improve robustness against sockaddrs of unexpected family udp: Rework offender address handling in udp_sock_recverr() udp: Translate offender addresses for ICMP messages flow.c | 16 ++++++++-- fwd.c | 87 ++++++++++++++++++++++++++++++++++++++---------------- fwd.h | 3 ++ inany.h | 22 +++++++++----- tcp.c | 10 +++---- udp.c | 79 +++++++++++++++++++++++++++++++++++-------------- udp_flow.c | 6 ++-- 7 files changed, 157 insertions(+), 66 deletions(-) -- 2.49.0
Currently the functions fwd_nat_from_*() make some address translations based on both the IP address and protocol port numbers, and others based only on the address. We have some upcoming cases where it's useful to use the IP-address-only translations separately, so split them out into helper functions. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- fwd.c | 87 ++++++++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 62 insertions(+), 25 deletions(-) diff --git a/fwd.c b/fwd.c index 2829cd24..5c70e834 100644 --- a/fwd.c +++ b/fwd.c @@ -323,6 +323,30 @@ static bool fwd_guest_accessible(const struct ctx *c, return fwd_guest_accessible6(c, &addr->a6); } +/** + * nat_outbound() - Apply address translation for outbound (TAP to HOST) + * @c: Execution context + * @addr: Input address (as seen on TAP interface) + * @translated: Output address (as seen on HOST interface) + * + * Only handles translations that depend *only* on the address. Anything + * related to specific ports or flows is handled elsewhere. + */ +static void nat_outbound(const struct ctx *c, const union inany_addr *addr, + union inany_addr *translated) +{ + if (inany_equals4(addr, &c->ip4.map_host_loopback)) + *translated = inany_loopback4; + else if (inany_equals6(addr, &c->ip6.map_host_loopback)) + *translated = inany_loopback6; + else if (inany_equals4(addr, &c->ip4.map_guest_addr)) + *translated = inany_from_v4(c->ip4.addr); + else if (inany_equals6(addr, &c->ip6.map_guest_addr)) + translated->a6 = c->ip6.addr; + else + *translated = *addr; +} + /** * fwd_nat_from_tap() - Determine to forward a flow from the tap interface * @c: Execution context @@ -342,16 +366,8 @@ uint8_t fwd_nat_from_tap(const struct ctx *c, uint8_t proto, else if (is_dns_flow(proto, ini) && inany_equals6(&ini->oaddr, &c->ip6.dns_match)) tgt->eaddr.a6 = c->ip6.dns_host; - else if (inany_equals4(&ini->oaddr, &c->ip4.map_host_loopback)) - tgt->eaddr = inany_loopback4; - else if (inany_equals6(&ini->oaddr, &c->ip6.map_host_loopback)) - tgt->eaddr = inany_loopback6; - else if (inany_equals4(&ini->oaddr, &c->ip4.map_guest_addr)) - tgt->eaddr = inany_from_v4(c->ip4.addr); - else if (inany_equals6(&ini->oaddr, &c->ip6.map_guest_addr)) - tgt->eaddr.a6 = c->ip6.addr; else - tgt->eaddr = ini->oaddr; + nat_outbound(c, &ini->oaddr, &tgt->eaddr); tgt->eport = ini->oport; @@ -423,6 +439,42 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto, return PIF_HOST; } +/** + * nat_inbound() - Apply address translation for outbound (HOST to TAP) + * @c: Execution context + * @addr: Input address (as seen on HOST interface) + * @translated: Output address (as seen on TAP interface) + * + * Return: true on success, false if it couldn't translate the address + * + * Only handles translations that depend *only* on the address. Anything + * related to specific ports or flows is handled elsewhere. + */ +static bool nat_inbound(const struct ctx *c, const union inany_addr *addr, + union inany_addr *translated) +{ + if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.map_host_loopback) && + inany_equals4(addr, &in4addr_loopback)) { + /* Specifically 127.0.0.1, not 127.0.0.0/8 */ + *translated = inany_from_v4(c->ip4.map_host_loopback); + } else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_host_loopback) && + inany_equals6(addr, &in6addr_loopback)) { + translated->a6 = c->ip6.map_host_loopback; + } else if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.map_guest_addr) && + inany_equals4(addr, &c->ip4.addr)) { + *translated = inany_from_v4(c->ip4.map_guest_addr); + } else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_guest_addr) && + inany_equals6(addr, &c->ip6.addr)) { + translated->a6 = c->ip6.map_guest_addr; + } else if (fwd_guest_accessible(c, addr)) { + *translated = *addr; + } else { + return false; + } + + return true; +} + /** * fwd_nat_from_host() - Determine to forward a flow from the host interface * @c: Execution context @@ -479,20 +531,7 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, return PIF_SPLICE; } - if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.map_host_loopback) && - inany_equals4(&ini->eaddr, &in4addr_loopback)) { - /* Specifically 127.0.0.1, not 127.0.0.0/8 */ - tgt->oaddr = inany_from_v4(c->ip4.map_host_loopback); - } else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_host_loopback) && - inany_equals6(&ini->eaddr, &in6addr_loopback)) { - tgt->oaddr.a6 = c->ip6.map_host_loopback; - } else if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.map_guest_addr) && - inany_equals4(&ini->eaddr, &c->ip4.addr)) { - tgt->oaddr = inany_from_v4(c->ip4.map_guest_addr); - } else if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_guest_addr) && - inany_equals6(&ini->eaddr, &c->ip6.addr)) { - tgt->oaddr.a6 = c->ip6.map_guest_addr; - } else if (!fwd_guest_accessible(c, &ini->eaddr)) { + if (!nat_inbound(c, &ini->eaddr, &tgt->oaddr)) { if (inany_v4(&ini->eaddr)) { if (IN4_IS_ADDR_UNSPECIFIED(&c->ip4.our_tap_addr)) /* No source address we can use */ @@ -501,8 +540,6 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, } else { tgt->oaddr.a6 = c->ip6.our_tap_ll; } - } else { - tgt->oaddr = ini->eaddr; } tgt->oport = ini->eport; -- 2.49.0
inany_from_sockaddr() expects a socket address of family AF_INET or AF_INET6 and ASSERT()s if it gets anything else. In many of the callers we can handle an unexpected family more gracefully, though, e.g. by failing a single flow rather than killing passt. Change inany_from_sockaddr() to return an error instead of ASSERT()ing, and handle those errors in the callers. Improve the reporting of any such errors while we're at it. With this greater robustness, allow inany_from_sockaddr() to take a void * rather than specifically a union sockaddr_inany *. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- flow.c | 16 ++++++++++++++-- inany.h | 22 ++++++++++++++-------- tcp.c | 10 ++++------ udp_flow.c | 6 +++--- 4 files changed, 35 insertions(+), 19 deletions(-) diff --git a/flow.c b/flow.c index 3c81cb42..447c0211 100644 --- a/flow.c +++ b/flow.c @@ -408,7 +408,12 @@ struct flowside *flow_initiate_sa(union flow *flow, uint8_t pif, { struct flowside *ini = &flow->f.side[INISIDE]; - inany_from_sockaddr(&ini->eaddr, &ini->eport, ssa); + if (inany_from_sockaddr(&ini->eaddr, &ini->eport, ssa) < 0) { + char str[SOCKADDR_STRLEN]; + + ASSERT_WITH_MSG(0, "Bad socket address %s", + sockaddr_ntop(ssa, str, sizeof(str))); + } if (daddr) ini->oaddr = *daddr; else if (inany_v4(&ini->eaddr)) @@ -768,7 +773,14 @@ flow_sidx_t flow_lookup_sa(const struct ctx *c, uint8_t proto, uint8_t pif, .oport = oport, }; - inany_from_sockaddr(&side.eaddr, &side.eport, esa); + if (inany_from_sockaddr(&side.eaddr, &side.eport, esa) < 0) { + char str[SOCKADDR_STRLEN]; + + warn("Flow lookup on bad socket address %s", + sockaddr_ntop(esa, str, sizeof(str))); + return FLOW_SIDX_NONE; + } + if (oaddr) side.oaddr = *oaddr; else if (inany_v4(&side.eaddr)) diff --git a/inany.h b/inany.h index 1c247e1e..f5c618bf 100644 --- a/inany.h +++ b/inany.h @@ -239,22 +239,28 @@ static inline void inany_from_af(union inany_addr *aa, /** inany_from_sockaddr - Extract IPv[46] address and port number from sockaddr * @aa: Pointer to store IPv[46] address * @port: Pointer to store port number, host order - * @addr: AF_INET or AF_INET6 socket address + * @addr: Socket address + * + * Return: 0 on success, -1 on error (bad address family) */ -static inline void inany_from_sockaddr(union inany_addr *aa, in_port_t *port, - const union sockaddr_inany *sa) +static inline int inany_from_sockaddr(union inany_addr *aa, in_port_t *port, + const void *sa_) { + const union sockaddr_inany *sa = (const union sockaddr_inany *)sa_; + if (sa->sa_family == AF_INET6) { inany_from_af(aa, AF_INET6, &sa->sa6.sin6_addr); *port = ntohs(sa->sa6.sin6_port); - } else if (sa->sa_family == AF_INET) { + return 0; + } + + if (sa->sa_family == AF_INET) { inany_from_af(aa, AF_INET, &sa->sa4.sin_addr); *port = ntohs(sa->sa4.sin_port); - } else { - /* Not valid to call with other address families */ - ASSERT_WITH_MSG(0, "Unexpected sockaddr family: %u", - sa->sa_family); + return 0; } + + return -1; } /** inany_siphash_feed- Fold IPv[46] address into an in-progress siphash diff --git a/tcp.c b/tcp.c index 9c6bc529..0ac298a7 100644 --- a/tcp.c +++ b/tcp.c @@ -1546,9 +1546,8 @@ static void tcp_conn_from_tap(const struct ctx *c, sa_family_t af, if (c->mode == MODE_VU) { /* To rebind to same oport after migration */ sl = sizeof(sa); - if (!getsockname(s, &sa.sa, &sl)) - inany_from_sockaddr(&tgt->oaddr, &tgt->oport, &sa); - else + if (getsockname(s, &sa.sa, &sl) || + inany_from_sockaddr(&tgt->oaddr, &tgt->oport, &sa) < 0) err_perror("Can't get local address for socket %i", s); } @@ -2204,9 +2203,8 @@ void tcp_listen_handler(const struct ctx *c, union epoll_ref ref, NULL, ref.tcp_listen.port); if (c->mode == MODE_VU) { /* Rebind to same address after migration */ - if (!getsockname(s, &sa.sa, &sl)) - inany_from_sockaddr(&ini->oaddr, &ini->oport, &sa); - else + if (getsockname(s, &sa.sa, &sl) || + inany_from_sockaddr(&ini->oaddr, &ini->oport, &sa) < 0) err_perror("Can't get local address for socket %i", s); } diff --git a/udp_flow.c b/udp_flow.c index ef2cbb06..fea1cf3c 100644 --- a/udp_flow.c +++ b/udp_flow.c @@ -158,12 +158,12 @@ static flow_sidx_t udp_flow_new(const struct ctx *c, union flow *flow, socklen_t sl = sizeof(sa); in_port_t port; - if (getsockname(uflow->s[TGTSIDE], &sa.sa, &sl) < 0) { + if (getsockname(uflow->s[TGTSIDE], &sa.sa, &sl) < 0 || + inany_from_sockaddr(&uflow->f.side[TGTSIDE].oaddr, + &port, &sa) < 0) { flow_perror(uflow, "Unable to determine local address"); goto cancel; } - inany_from_sockaddr(&uflow->f.side[TGTSIDE].oaddr, - &port, &sa); if (port != tgt->oport) { flow_err(uflow, "Unexpected local port"); goto cancel; -- 2.49.0
On Wed, 16 Apr 2025 19:07:05 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:@@ -239,22 +239,28 @@ static inline void inany_from_af(union inany_addr *aa, /** inany_from_sockaddr - Extract IPv[46] address and port number from sockaddr * @aa: Pointer to store IPv[46] address * @port: Pointer to store port number, host order - * @addr: AF_INET or AF_INET6 socket address + * @addr: Socket addressThis is actually sa_ now but... can we do something for argument names in general, here? What about dst, port, sa, or dst, port, addr?+ * + * Return: 0 on success, -1 on error (bad address family) */ -static inline void inany_from_sockaddr(union inany_addr *aa, in_port_t *port, - const union sockaddr_inany *sa) +static inline int inany_from_sockaddr(union inany_addr *aa, in_port_t *port, + const void *sa_) { + const union sockaddr_inany *sa = (const union sockaddr_inany *)sa_;-- Stefano
On Wed, Apr 16, 2025 at 11:41:31AM +0200, Stefano Brivio wrote:On Wed, 16 Apr 2025 19:07:05 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:Sure, done.@@ -239,22 +239,28 @@ static inline void inany_from_af(union inany_addr *aa, /** inany_from_sockaddr - Extract IPv[46] address and port number from sockaddr * @aa: Pointer to store IPv[46] address * @port: Pointer to store port number, host order - * @addr: AF_INET or AF_INET6 socket address + * @addr: Socket addressThis is actually sa_ now but... can we do something for argument names in general, here? What about dst, port, sa, or dst, port, addr?-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson+ * + * Return: 0 on success, -1 on error (bad address family) */ -static inline void inany_from_sockaddr(union inany_addr *aa, in_port_t *port, - const union sockaddr_inany *sa) +static inline int inany_from_sockaddr(union inany_addr *aa, in_port_t *port, + const void *sa_) { + const union sockaddr_inany *sa = (const union sockaddr_inany *)sa_;
Make a number of changes to udp_sock_recverr() to improve the robustness of how we handle addresses. * Get the "offender" address (source of the ICMP packet) using the SO_EE_OFFENDER() macro, reducing assumptions about structure layout. * Parse the offender sockaddr using inany_from_sockaddr() * Check explicitly that the source and destination pifs are what we expect. Previously we checked something that was probably equivalent in practice, but isn't strictly speaking what we require for the rest of the code. * Verify that for an ICMPv4 error we also have an IPv4 source/offender and destination/endpoint address * Verify that for an ICMPv6 error we have an IPv6 endpoint * Improve debug reporting of any failures Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- udp.c | 67 ++++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 46 insertions(+), 21 deletions(-) diff --git a/udp.c b/udp.c index 57769d06..4352520e 100644 --- a/udp.c +++ b/udp.c @@ -159,6 +159,12 @@ udp_meta[UDP_MAX_FRAMES]; MAX(CMSG_SPACE(sizeof(struct in_pktinfo)), \ CMSG_SPACE(sizeof(struct in6_pktinfo))) +#define RECVERR_SPACE \ + MAX(CMSG_SPACE(sizeof(struct sock_extended_err) + \ + sizeof(struct sockaddr_in)), \ + CMSG_SPACE(sizeof(struct sock_extended_err) + \ + sizeof(struct sockaddr_in6))) + /** * enum udp_iov_idx - Indices for the buffers making up a single UDP frame * @UDP_IOV_TAP tap specific header @@ -516,12 +522,8 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, uint8_t pif, in_port_t port) { - struct errhdr { - struct sock_extended_err ee; - union sockaddr_inany saddr; - }; - char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))]; - const struct errhdr *eh = NULL; + char buf[PKTINFO_SPACE + RECVERR_SPACE]; + const struct sock_extended_err *ee; char data[ICMP6_MAX_DLEN]; struct cmsghdr *hdr; struct iovec iov = { @@ -538,7 +540,12 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, .msg_controllen = sizeof(buf), }; const struct flowside *toside; - flow_sidx_t tosidx; + char astr[INANY_ADDRSTRLEN]; + char sastr[SOCKADDR_STRLEN]; + union inany_addr offender; + const struct in_addr *o4; + in_port_t offender_port; + uint8_t topif; size_t dlen; ssize_t rc; @@ -569,10 +576,10 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, return -1; } - eh = (const struct errhdr *)CMSG_DATA(hdr); + ee = (const struct sock_extended_err *)CMSG_DATA(hdr); debug("%s error on UDP socket %i: %s", - str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno)); + str_ee_origin(ee), s, strerror_(ee->ee_errno)); if (!flow_sidx_valid(sidx)) { /* No hint from the socket, determine flow from addresses */ @@ -588,25 +595,43 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, debug("Ignoring UDP error without flow"); return 1; } + } else { + pif = pif_at_sidx(sidx); } - tosidx = flow_sidx_opposite(sidx); - toside = flowside_at_sidx(tosidx); + toside = flowside_at_sidx(flow_sidx_opposite(sidx)); + topif = pif_at_sidx(flow_sidx_opposite(sidx)); dlen = rc; - if (pif_is_socket(pif_at_sidx(tosidx))) { - /* XXX Is there any way to propagate ICMPs from socket to - * socket? */ - } else if (hdr->cmsg_level == IPPROTO_IP) { + if (inany_from_sockaddr(&offender, &offender_port, + SO_EE_OFFENDER(ee)) < 0) + goto fail; + + if (pif != PIF_HOST || topif != PIF_TAP) + /* XXX Can we support any other cases? */ + goto fail; + + if (hdr->cmsg_level == IPPROTO_IP && + (o4 = inany_v4(&offender)) && inany_v4(&toside->eaddr)) { dlen = MIN(dlen, ICMP4_MAX_DLEN); - udp_send_tap_icmp4(c, &eh->ee, toside, - eh->saddr.sa4.sin_addr, data, dlen); - } else if (hdr->cmsg_level == IPPROTO_IPV6) { - udp_send_tap_icmp6(c, &eh->ee, toside, - &eh->saddr.sa6.sin6_addr, data, - dlen, sidx.flowi); + udp_send_tap_icmp4(c, ee, toside, *o4, data, dlen); + return 1; + } + + if (hdr->cmsg_level == IPPROTO_IPV6 && !inany_v4(&toside->eaddr)) { + udp_send_tap_icmp6(c, ee, toside, &offender.a6, data, dlen, + sidx.flowi); + return 1; } +fail: + flow_dbg(flow_at_sidx(sidx), + "Can't propagate %s error from %s %s to %s %s", + str_ee_origin(ee), + pif_name(pif), + sockaddr_ntop(SO_EE_OFFENDER(ee), sastr, sizeof(sastr)), + pif_name(topif), + inany_ntop(&toside->eaddr, astr, sizeof(astr))); return 1; } -- 2.49.0
On Wed, 16 Apr 2025 19:07:06 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:Make a number of changes to udp_sock_recverr() to improve the robustness of how we handle addresses. * Get the "offender" address (source of the ICMP packet) using the SO_EE_OFFENDER() macro, reducing assumptions about structure layout. * Parse the offender sockaddr using inany_from_sockaddr() * Check explicitly that the source and destination pifs are what we expect. Previously we checked something that was probably equivalent in practice, but isn't strictly speaking what we require for the rest of the code. * Verify that for an ICMPv4 error we also have an IPv4 source/offender and destination/endpoint address * Verify that for an ICMPv6 error we have an IPv6 endpoint * Improve debug reporting of any failures Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- udp.c | 67 ++++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 46 insertions(+), 21 deletions(-) diff --git a/udp.c b/udp.c index 57769d06..4352520e 100644 --- a/udp.c +++ b/udp.c @@ -159,6 +159,12 @@ udp_meta[UDP_MAX_FRAMES]; MAX(CMSG_SPACE(sizeof(struct in_pktinfo)), \ CMSG_SPACE(sizeof(struct in6_pktinfo))) +#define RECVERR_SPACE \ + MAX(CMSG_SPACE(sizeof(struct sock_extended_err) + \ + sizeof(struct sockaddr_in)), \ + CMSG_SPACE(sizeof(struct sock_extended_err) + \ + sizeof(struct sockaddr_in6))) + /** * enum udp_iov_idx - Indices for the buffers making up a single UDP frame * @UDP_IOV_TAP tap specific header @@ -516,12 +522,8 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, uint8_t pif, in_port_t port) { - struct errhdr { - struct sock_extended_err ee; - union sockaddr_inany saddr; - }; - char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))]; - const struct errhdr *eh = NULL; + char buf[PKTINFO_SPACE + RECVERR_SPACE]; + const struct sock_extended_err *ee; char data[ICMP6_MAX_DLEN]; struct cmsghdr *hdr; struct iovec iov = { @@ -538,7 +540,12 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, .msg_controllen = sizeof(buf), }; const struct flowside *toside; - flow_sidx_t tosidx; + char astr[INANY_ADDRSTRLEN]; + char sastr[SOCKADDR_STRLEN]; + union inany_addr offender; + const struct in_addr *o4; + in_port_t offender_port; + uint8_t topif; size_t dlen; ssize_t rc; @@ -569,10 +576,10 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, return -1; } - eh = (const struct errhdr *)CMSG_DATA(hdr); + ee = (const struct sock_extended_err *)CMSG_DATA(hdr); debug("%s error on UDP socket %i: %s", - str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno)); + str_ee_origin(ee), s, strerror_(ee->ee_errno)); if (!flow_sidx_valid(sidx)) { /* No hint from the socket, determine flow from addresses */ @@ -588,25 +595,43 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, debug("Ignoring UDP error without flow"); return 1; } + } else { + pif = pif_at_sidx(sidx);Two stray trailing tabs here.} - tosidx = flow_sidx_opposite(sidx); - toside = flowside_at_sidx(tosidx); + toside = flowside_at_sidx(flow_sidx_opposite(sidx)); + topif = pif_at_sidx(flow_sidx_opposite(sidx)); dlen = rc; - if (pif_is_socket(pif_at_sidx(tosidx))) { - /* XXX Is there any way to propagate ICMPs from socket to - * socket? */ - } else if (hdr->cmsg_level == IPPROTO_IP) { + if (inany_from_sockaddr(&offender, &offender_port, + SO_EE_OFFENDER(ee)) < 0) + goto fail; + + if (pif != PIF_HOST || topif != PIF_TAP) + /* XXX Can we support any other cases? */ + goto fail; + + if (hdr->cmsg_level == IPPROTO_IP && + (o4 = inany_v4(&offender)) && inany_v4(&toside->eaddr)) { dlen = MIN(dlen, ICMP4_MAX_DLEN); - udp_send_tap_icmp4(c, &eh->ee, toside, - eh->saddr.sa4.sin_addr, data, dlen); - } else if (hdr->cmsg_level == IPPROTO_IPV6) { - udp_send_tap_icmp6(c, &eh->ee, toside, - &eh->saddr.sa6.sin6_addr, data, - dlen, sidx.flowi); + udp_send_tap_icmp4(c, ee, toside, *o4, data, dlen); + return 1; + } + + if (hdr->cmsg_level == IPPROTO_IPV6 && !inany_v4(&toside->eaddr)) { + udp_send_tap_icmp6(c, ee, toside, &offender.a6, data, dlen, + sidx.flowi); + return 1; } +fail: + flow_dbg(flow_at_sidx(sidx),Coverity Scan seems to hallucinate here and says that flow_at_sidx() could return NULL, with its return value later dereferenced by flow_log(), even if you're explicitly checking flow_sidx_valid() in all the paths reaching to this point. Calling this conditionally only if flow_sidx_valid() doesn't mask the false positive either (I guess that's the part that goes wrong somehow), we really need to check if (flow_at_sidx(sidx)) flow_dbg(...). Would it be possible to add the useless check just for my own sanity?+ "Can't propagate %s error from %s %s to %s %s", + str_ee_origin(ee), + pif_name(pif), + sockaddr_ntop(SO_EE_OFFENDER(ee), sastr, sizeof(sastr)), + pif_name(topif), + inany_ntop(&toside->eaddr, astr, sizeof(astr))); return 1; }-- Stefano
On Wed, Apr 16, 2025 at 04:27:36PM +0200, Stefano Brivio wrote:On Wed, 16 Apr 2025 19:07:06 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:Oops, fixed.Make a number of changes to udp_sock_recverr() to improve the robustness of how we handle addresses. * Get the "offender" address (source of the ICMP packet) using the SO_EE_OFFENDER() macro, reducing assumptions about structure layout. * Parse the offender sockaddr using inany_from_sockaddr() * Check explicitly that the source and destination pifs are what we expect. Previously we checked something that was probably equivalent in practice, but isn't strictly speaking what we require for the rest of the code. * Verify that for an ICMPv4 error we also have an IPv4 source/offender and destination/endpoint address * Verify that for an ICMPv6 error we have an IPv6 endpoint * Improve debug reporting of any failures Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- udp.c | 67 ++++++++++++++++++++++++++++++++++++++++------------------- 1 file changed, 46 insertions(+), 21 deletions(-) diff --git a/udp.c b/udp.c index 57769d06..4352520e 100644 --- a/udp.c +++ b/udp.c @@ -159,6 +159,12 @@ udp_meta[UDP_MAX_FRAMES]; MAX(CMSG_SPACE(sizeof(struct in_pktinfo)), \ CMSG_SPACE(sizeof(struct in6_pktinfo))) +#define RECVERR_SPACE \ + MAX(CMSG_SPACE(sizeof(struct sock_extended_err) + \ + sizeof(struct sockaddr_in)), \ + CMSG_SPACE(sizeof(struct sock_extended_err) + \ + sizeof(struct sockaddr_in6))) + /** * enum udp_iov_idx - Indices for the buffers making up a single UDP frame * @UDP_IOV_TAP tap specific header @@ -516,12 +522,8 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst) static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, uint8_t pif, in_port_t port) { - struct errhdr { - struct sock_extended_err ee; - union sockaddr_inany saddr; - }; - char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))]; - const struct errhdr *eh = NULL; + char buf[PKTINFO_SPACE + RECVERR_SPACE]; + const struct sock_extended_err *ee; char data[ICMP6_MAX_DLEN]; struct cmsghdr *hdr; struct iovec iov = { @@ -538,7 +540,12 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, .msg_controllen = sizeof(buf), }; const struct flowside *toside; - flow_sidx_t tosidx; + char astr[INANY_ADDRSTRLEN]; + char sastr[SOCKADDR_STRLEN]; + union inany_addr offender; + const struct in_addr *o4; + in_port_t offender_port; + uint8_t topif; size_t dlen; ssize_t rc; @@ -569,10 +576,10 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, return -1; } - eh = (const struct errhdr *)CMSG_DATA(hdr); + ee = (const struct sock_extended_err *)CMSG_DATA(hdr); debug("%s error on UDP socket %i: %s", - str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno)); + str_ee_origin(ee), s, strerror_(ee->ee_errno)); if (!flow_sidx_valid(sidx)) { /* No hint from the socket, determine flow from addresses */ @@ -588,25 +595,43 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, debug("Ignoring UDP error without flow"); return 1; } + } else { + pif = pif_at_sidx(sidx);Two stray trailing tabs here.Sure. I was already borderline on whether it was clearer to introduce an explicit uflow variable, so I've done that now, and asserted it's non-NULL. I've checked that removes the coverity whinge, at least running locally.} - tosidx = flow_sidx_opposite(sidx); - toside = flowside_at_sidx(tosidx); + toside = flowside_at_sidx(flow_sidx_opposite(sidx)); + topif = pif_at_sidx(flow_sidx_opposite(sidx)); dlen = rc; - if (pif_is_socket(pif_at_sidx(tosidx))) { - /* XXX Is there any way to propagate ICMPs from socket to - * socket? */ - } else if (hdr->cmsg_level == IPPROTO_IP) { + if (inany_from_sockaddr(&offender, &offender_port, + SO_EE_OFFENDER(ee)) < 0) + goto fail; + + if (pif != PIF_HOST || topif != PIF_TAP) + /* XXX Can we support any other cases? */ + goto fail; + + if (hdr->cmsg_level == IPPROTO_IP && + (o4 = inany_v4(&offender)) && inany_v4(&toside->eaddr)) { dlen = MIN(dlen, ICMP4_MAX_DLEN); - udp_send_tap_icmp4(c, &eh->ee, toside, - eh->saddr.sa4.sin_addr, data, dlen); - } else if (hdr->cmsg_level == IPPROTO_IPV6) { - udp_send_tap_icmp6(c, &eh->ee, toside, - &eh->saddr.sa6.sin6_addr, data, - dlen, sidx.flowi); + udp_send_tap_icmp4(c, ee, toside, *o4, data, dlen); + return 1; + } + + if (hdr->cmsg_level == IPPROTO_IPV6 && !inany_v4(&toside->eaddr)) { + udp_send_tap_icmp6(c, ee, toside, &offender.a6, data, dlen, + sidx.flowi); + return 1; } +fail: + flow_dbg(flow_at_sidx(sidx),Coverity Scan seems to hallucinate here and says that flow_at_sidx() could return NULL, with its return value later dereferenced by flow_log(), even if you're explicitly checking flow_sidx_valid() in all the paths reaching to this point. Calling this conditionally only if flow_sidx_valid() doesn't mask the false positive either (I guess that's the part that goes wrong somehow), we really need to check if (flow_at_sidx(sidx)) flow_dbg(...). Would it be possible to add the useless check just for my own sanity?-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson+ "Can't propagate %s error from %s %s to %s %s", + str_ee_origin(ee), + pif_name(pif), + sockaddr_ntop(SO_EE_OFFENDER(ee), sastr, sizeof(sastr)), + pif_name(topif), + inany_ntop(&toside->eaddr, astr, sizeof(astr))); return 1; }
We've recently added support for propagating ICMP errors related to a UDP flow from the host to the guest, by handling the extended UDP error on the socket and synthesizing a suitable ICMP on the tap interface. Currently we create that ICMP with a source address of the "offender" from the extended error information - the source of the ICMP error received on the host. However, we don't translate this address for cases where we NAT between host and guest. This means (amongst other things) that we won't get a "Connection refused" error as expected if send data from the guest to the --map-host-loopback address. The error comes from 127.0.0.1 on the host, which doesn't make sense on the tap interface and will be discarded by the guest. Because ICMP errors can be sent by an intermediate host, not just by the endpoints of the flow, we can't handle this translation purely with the information in the flow table entry. We need to explicitly translate this address by our NAT rules, which we can do with the nat_inbound() helper. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- fwd.c | 4 ++-- fwd.h | 3 +++ udp.c | 18 ++++++++++++++---- 3 files changed, 19 insertions(+), 6 deletions(-) diff --git a/fwd.c b/fwd.c index 5c70e834..b73c2c8d 100644 --- a/fwd.c +++ b/fwd.c @@ -450,8 +450,8 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto, * Only handles translations that depend *only* on the address. Anything * related to specific ports or flows is handled elsewhere. */ -static bool nat_inbound(const struct ctx *c, const union inany_addr *addr, - union inany_addr *translated) +bool nat_inbound(const struct ctx *c, const union inany_addr *addr, + union inany_addr *translated) { if (!IN4_IS_ADDR_UNSPECIFIED(&c->ip4.map_host_loopback) && inany_equals4(addr, &in4addr_loopback)) { diff --git a/fwd.h b/fwd.h index 3562f3ca..0458a3c5 100644 --- a/fwd.h +++ b/fwd.h @@ -7,6 +7,7 @@ #ifndef FWD_H #define FWD_H +union inany_addr; struct flowside; /* Number of ports for both TCP and UDP */ @@ -47,6 +48,8 @@ void fwd_scan_ports_udp(struct fwd_ports *fwd, const struct fwd_ports *rev, const struct fwd_ports *tcp_rev); void fwd_scan_ports_init(struct ctx *c); +bool nat_inbound(const struct ctx *c, const union inany_addr *addr, + union inany_addr *translated); uint8_t fwd_nat_from_tap(const struct ctx *c, uint8_t proto, const struct flowside *ini, struct flowside *tgt); uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto, diff --git a/udp.c b/udp.c index 4352520e..ebee1d14 100644 --- a/udp.c +++ b/udp.c @@ -539,10 +539,10 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, .msg_control = buf, .msg_controllen = sizeof(buf), }; - const struct flowside *toside; + const struct flowside *fromside, *toside; + union inany_addr offender, otap; char astr[INANY_ADDRSTRLEN]; char sastr[SOCKADDR_STRLEN]; - union inany_addr offender; const struct in_addr *o4; in_port_t offender_port; uint8_t topif; @@ -599,6 +599,7 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, pif = pif_at_sidx(sidx); } + fromside = flowside_at_sidx(sidx); toside = flowside_at_sidx(flow_sidx_opposite(sidx)); topif = pif_at_sidx(flow_sidx_opposite(sidx)); dlen = rc; @@ -611,15 +612,24 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx, /* XXX Can we support any other cases? */ goto fail; + /* If the offender *is* the endpoint, make sure our translation is + * consistent with the flow's translation. This matters if the flow + * endpoint has a port specific translation (like --dns-match). + */ + if (inany_equals(&offender, &fromside->eaddr)) + otap = toside->oaddr; + else if (!nat_inbound(c, &offender, &otap)) + goto fail; + if (hdr->cmsg_level == IPPROTO_IP && - (o4 = inany_v4(&offender)) && inany_v4(&toside->eaddr)) { + (o4 = inany_v4(&otap)) && inany_v4(&toside->eaddr)) { dlen = MIN(dlen, ICMP4_MAX_DLEN); udp_send_tap_icmp4(c, ee, toside, *o4, data, dlen); return 1; } if (hdr->cmsg_level == IPPROTO_IPV6 && !inany_v4(&toside->eaddr)) { - udp_send_tap_icmp6(c, ee, toside, &offender.a6, data, dlen, + udp_send_tap_icmp6(c, ee, toside, &otap.a6, data, dlen, sidx.flowi); return 1; } -- 2.49.0
On Wed, 16 Apr 2025 19:07:03 +1000 David Gibson <david(a)gibson.dropbear.id.au> wrote:We now propagate ICMP errors on UDP flows back into ICMP packets on the tap interface. However, we don't always get the source address right for the synthesized message. Because ICMPs can be generated by intermediate routers, that source address might not be one of the endpoints, so the address translation we already have isn't sufficient. Implement properly translating ICMP addresses when we need to. This ended up a bit messier than I hoped, but it seems to work. A simple case to test this is: pasta --config-net --map-host-loopback=172.16.1.1 -- \ sh -c "echo hello | socat STDIO UDP4:172.16.1.1:10001" where 10001 is a port where nothing is listening on the host.Oh, that's convenient. I also checked this against the "bad resolver address" case I reported previously, everything "works": # nslookup passt.top 169.254.1.1 ;; communications error to 169.254.1.1#53: connection refused ;; communications error to 169.254.1.1#53: connection refused ;; communications error to 169.254.1.1#53: connection refused ;; no servers could be reached Except for those few comments to 2/4 and 3/4, everything else looks good to me. -- Stefano