[PATCH 0/4] Translate source addresses for ICMP errors
We now propagate ICMP errors on UDP flows back into ICMP packets on the
tap interface. However, we don't always get the source address right for
the synthesized message. Because ICMPs can be generated by intermediate
routers, that source address might not be one of the endpoints, so the
address translation we already have isn't sufficient.

Implement properly translating ICMP addresses when we need to. This
ended up a bit messier than I hoped, but it seems to work. A simple case
to test this is:

    pasta --config-net --map-host-loopback=172.16.1.1 -- \
        sh -c "echo hello | socat STDIO UDP4:172.16.1.1:10001"

where 10001 is a port where nothing is listening on the host. Without
this series, this will just time out. pasta sends an ICMP Port
Unreachable message, but it's sent with source address 127.0.0.1 and so
discarded by the guest. With this series, the address is properly
translated and we correctly get the error from socat:

    2025/04/16 19:02:37 socat[3] E read(5, 0x555c3dbf2000, 8192): Connection refused

David Gibson (4):
  fwd: Split out helpers for port-independent NAT
  treewide: Improve robustness against sockaddrs of unexpected family
  udp: Rework offender address handling in udp_sock_recverr()
  udp: Translate offender addresses for ICMP messages

 flow.c     | 16 ++++++++--
 fwd.c      | 87 ++++++++++++++++++++++++++++++++++++++----------------
 fwd.h      |  3 ++
 inany.h    | 22 +++++++++-----
 tcp.c      | 10 +++----
 udp.c      | 79 +++++++++++++++++++++++++++++++++++--------------
 udp_flow.c |  6 ++--
 7 files changed, 157 insertions(+), 66 deletions(-)

--
2.49.0
[PATCH 1/4] fwd: Split out helpers for port-independent NAT
Currently the functions fwd_nat_from_*() make some address translations
based on both the IP address and protocol port numbers, and others based
only on the address. We have some upcoming cases where it's useful to use
the IP-address-only translations separately, so split them out into helper
functions.
Signed-off-by: David Gibson
[PATCH 2/4] treewide: Improve robustness against sockaddrs of unexpected family
inany_from_sockaddr() expects a socket address of family AF_INET or
AF_INET6 and ASSERT()s if it gets anything else. In many of the callers we
can handle an unexpected family more gracefully, though, e.g. by failing
a single flow rather than killing passt.
Change inany_from_sockaddr() to return an error instead of ASSERT()ing,
and handle those errors in the callers. Improve the reporting of any such
errors while we're at it.
With this greater robustness, allow inany_from_sockaddr() to take a void *
rather than specifically a union sockaddr_inany *.
Signed-off-by: David Gibson
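As a rough illustration of the change described above, here is a minimal sketch of a sockaddr parser that reports an unexpected family to its caller instead of asserting. It is not the code from inany.h: struct any_addr and any_from_sockaddr() are hypothetical stand-ins for union inany_addr and inany_from_sockaddr(), and storing IPv4 as an IPv4-mapped IPv6 address is an assumption made for the example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical stand-in for union inany_addr: IPv4 addresses are kept as
 * IPv4-mapped IPv6 addresses (::ffff:a.b.c.d). This is an assumption for
 * the sketch, not a copy of the passt type.
 */
struct any_addr {
	struct in6_addr a6;
};

/* Extract address and port (host order) from a socket address of unknown
 * family. Returns 0 on success, -1 if the family is neither AF_INET nor
 * AF_INET6, so the caller can fail one flow rather than the whole process.
 */
int any_from_sockaddr(struct any_addr *dst, in_port_t *port, const void *addr)
{
	const struct sockaddr *sa = addr;

	if (sa->sa_family == AF_INET6) {
		const struct sockaddr_in6 *sa6 = addr;

		dst->a6 = sa6->sin6_addr;
		*port = ntohs(sa6->sin6_port);
		return 0;
	}

	if (sa->sa_family == AF_INET) {
		const struct sockaddr_in *sa4 = addr;

		/* Build ::ffff:a.b.c.d from the 4-byte IPv4 address */
		memset(&dst->a6, 0, sizeof(dst->a6));
		dst->a6.s6_addr[10] = 0xff;
		dst->a6.s6_addr[11] = 0xff;
		memcpy(&dst->a6.s6_addr[12], &sa4->sin_addr,
		       sizeof(sa4->sin_addr));
		*port = ntohs(sa4->sin_port);
		return 0;
	}

	return -1;	/* unexpected family: report it, don't abort */
}
```

Taking a const void * means callers can pass a struct sockaddr_in, sockaddr_in6 or sockaddr_storage without a cast. That is only safe because the function checks sa_family before it touches anything family-specific.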
On Wed, 16 Apr 2025 19:07:05 +1000 David Gibson wrote:

@@ -239,22 +239,28 @@ static inline void inany_from_af(union inany_addr *aa,
 /** inany_from_sockaddr - Extract IPv[46] address and port number from sockaddr
  * @aa: Pointer to store IPv[46] address
  * @port: Pointer to store port number, host order
- * @addr: AF_INET or AF_INET6 socket address
+ * @addr: Socket address
This is actually sa_ now but... can we do something for argument names in general, here? What about dst, port, sa, or dst, port, addr?
+ *
+ * Return: 0 on success, -1 on error (bad address family)
  */
-static inline void inany_from_sockaddr(union inany_addr *aa, in_port_t *port,
-				       const union sockaddr_inany *sa)
+static inline int inany_from_sockaddr(union inany_addr *aa, in_port_t *port,
+				      const void *sa_)
 {
+	const union sockaddr_inany *sa = (const union sockaddr_inany *)sa_;
-- Stefano
On Wed, Apr 16, 2025 at 11:41:31AM +0200, Stefano Brivio wrote:
On Wed, 16 Apr 2025 19:07:05 +1000 David Gibson wrote:

@@ -239,22 +239,28 @@ static inline void inany_from_af(union inany_addr *aa,
 /** inany_from_sockaddr - Extract IPv[46] address and port number from sockaddr
  * @aa: Pointer to store IPv[46] address
  * @port: Pointer to store port number, host order
- * @addr: AF_INET or AF_INET6 socket address
+ * @addr: Socket address
This is actually sa_ now but... can we do something for argument names in general, here? What about dst, port, sa, or dst, port, addr?
Sure, done.
+ *
+ * Return: 0 on success, -1 on error (bad address family)
  */
-static inline void inany_from_sockaddr(union inany_addr *aa, in_port_t *port,
-				       const union sockaddr_inany *sa)
+static inline int inany_from_sockaddr(union inany_addr *aa, in_port_t *port,
+				      const void *sa_)
 {
+	const union sockaddr_inany *sa = (const union sockaddr_inany *)sa_;
--
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
[PATCH 3/4] udp: Rework offender address handling in udp_sock_recverr()
Make a number of changes to udp_sock_recverr() to improve the robustness
of how we handle addresses.
* Get the "offender" address (source of the ICMP packet) using the
SO_EE_OFFENDER() macro, reducing assumptions about structure layout.
* Parse the offender sockaddr using inany_from_sockaddr()
* Check explicitly that the source and destination pifs are what we
expect. Previously we checked something that was probably equivalent
in practice, but isn't strictly speaking what we require for the rest
of the code.
* Verify that for an ICMPv4 error we also have an IPv4 source/offender
and destination/endpoint address
* Verify that for an ICMPv6 error we have an IPv6 endpoint
* Improve debug reporting of any failures
Signed-off-by: David Gibson
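To make the first two points concrete, here is a minimal sketch of locating the extended error in the control messages of a message read with recvmsg(..., MSG_ERRQUEUE) and taking the offender address through SO_EE_OFFENDER(), rather than through a hand-rolled struct that assumes the sockaddr sits at a fixed offset. The function name recverr_offender() is hypothetical and the code is Linux-specific; it is not the udp.c implementation.

```c
#include <linux/errqueue.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Walk the control messages of @mh looking for an IP_RECVERR or
 * IPV6_RECVERR entry. On success, store the extended error in @eep and
 * return the offender's socket address; return NULL if none is present.
 *
 * The returned sockaddr can have an unexpected family (ip(7) documents
 * AF_UNSPEC for an unknown offender), so callers still need to check
 * sa_family before using it.
 */
const struct sockaddr *recverr_offender(struct msghdr *mh,
					const struct sock_extended_err **eep)
{
	struct cmsghdr *hdr;

	for (hdr = CMSG_FIRSTHDR(mh); hdr; hdr = CMSG_NXTHDR(mh, hdr)) {
		const struct sock_extended_err *ee;
		int v4 = hdr->cmsg_level == IPPROTO_IP &&
			 hdr->cmsg_type == IP_RECVERR;
		int v6 = hdr->cmsg_level == IPPROTO_IPV6 &&
			 hdr->cmsg_type == IPV6_RECVERR;

		if (!v4 && !v6)
			continue;

		ee = (const struct sock_extended_err *)CMSG_DATA(hdr);
		*eep = ee;

		/* The macro knows where the sockaddr lives relative to the
		 * error structure, so we don't have to.
		 */
		return SO_EE_OFFENDER(ee);
	}

	return NULL;
}
```

The result can then go through a family-checking parser such as inany_from_sockaddr(), which is what lets the caller verify that an ICMPv4 error really carries an IPv4 offender.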
On Wed, 16 Apr 2025 19:07:06 +1000 David Gibson wrote:
---
 udp.c | 67 ++++++++++++++++++++++++++++++++++++++++-------------------
 1 file changed, 46 insertions(+), 21 deletions(-)

diff --git a/udp.c b/udp.c
index 57769d06..4352520e 100644
--- a/udp.c
+++ b/udp.c
@@ -159,6 +159,12 @@ udp_meta[UDP_MAX_FRAMES];
 	MAX(CMSG_SPACE(sizeof(struct in_pktinfo)),			\
 	    CMSG_SPACE(sizeof(struct in6_pktinfo)))

+#define RECVERR_SPACE							\
+	MAX(CMSG_SPACE(sizeof(struct sock_extended_err) +		\
+		       sizeof(struct sockaddr_in)),			\
+	    CMSG_SPACE(sizeof(struct sock_extended_err) +		\
+		       sizeof(struct sockaddr_in6)))
+
 /**
  * enum udp_iov_idx - Indices for the buffers making up a single UDP frame
  * @UDP_IOV_TAP tap specific header
@@ -516,12 +522,8 @@ static int udp_pktinfo(struct msghdr *msg, union inany_addr *dst)
 static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx,
 			    uint8_t pif, in_port_t port)
 {
-	struct errhdr {
-		struct sock_extended_err ee;
-		union sockaddr_inany saddr;
-	};
-	char buf[PKTINFO_SPACE + CMSG_SPACE(sizeof(struct errhdr))];
-	const struct errhdr *eh = NULL;
+	char buf[PKTINFO_SPACE + RECVERR_SPACE];
+	const struct sock_extended_err *ee;
 	char data[ICMP6_MAX_DLEN];
 	struct cmsghdr *hdr;
 	struct iovec iov = {
@@ -538,7 +540,12 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx,
 		.msg_controllen = sizeof(buf),
 	};
 	const struct flowside *toside;
-	flow_sidx_t tosidx;
+	char astr[INANY_ADDRSTRLEN];
+	char sastr[SOCKADDR_STRLEN];
+	union inany_addr offender;
+	const struct in_addr *o4;
+	in_port_t offender_port;
+	uint8_t topif;
 	size_t dlen;
 	ssize_t rc;

@@ -569,10 +576,10 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx,
 		return -1;
 	}

-	eh = (const struct errhdr *)CMSG_DATA(hdr);
+	ee = (const struct sock_extended_err *)CMSG_DATA(hdr);

 	debug("%s error on UDP socket %i: %s",
-	      str_ee_origin(&eh->ee), s, strerror_(eh->ee.ee_errno));
+	      str_ee_origin(ee), s, strerror_(ee->ee_errno));

 	if (!flow_sidx_valid(sidx)) {
 		/* No hint from the socket, determine flow from addresses */
@@ -588,25 +595,43 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx,
 			debug("Ignoring UDP error without flow");
 			return 1;
 		}
+	} else {
+		pif = pif_at_sidx(sidx);
Two stray trailing tabs here.
 	}

-	tosidx = flow_sidx_opposite(sidx);
-	toside = flowside_at_sidx(tosidx);
+	toside = flowside_at_sidx(flow_sidx_opposite(sidx));
+	topif = pif_at_sidx(flow_sidx_opposite(sidx));
 	dlen = rc;

-	if (pif_is_socket(pif_at_sidx(tosidx))) {
-		/* XXX Is there any way to propagate ICMPs from socket to
-		 * socket? */
-	} else if (hdr->cmsg_level == IPPROTO_IP) {
+	if (inany_from_sockaddr(&offender, &offender_port,
+				SO_EE_OFFENDER(ee)) < 0)
+		goto fail;
+
+	if (pif != PIF_HOST || topif != PIF_TAP)
+		/* XXX Can we support any other cases? */
+		goto fail;
+
+	if (hdr->cmsg_level == IPPROTO_IP &&
+	    (o4 = inany_v4(&offender)) && inany_v4(&toside->eaddr)) {
 		dlen = MIN(dlen, ICMP4_MAX_DLEN);
-		udp_send_tap_icmp4(c, &eh->ee, toside,
-				   eh->saddr.sa4.sin_addr, data, dlen);
-	} else if (hdr->cmsg_level == IPPROTO_IPV6) {
-		udp_send_tap_icmp6(c, &eh->ee, toside,
-				   &eh->saddr.sa6.sin6_addr, data,
-				   dlen, sidx.flowi);
+		udp_send_tap_icmp4(c, ee, toside, *o4, data, dlen);
+		return 1;
+	}
+
+	if (hdr->cmsg_level == IPPROTO_IPV6 && !inany_v4(&toside->eaddr)) {
+		udp_send_tap_icmp6(c, ee, toside, &offender.a6, data, dlen,
+				   sidx.flowi);
+		return 1;
 	}

+fail:
+	flow_dbg(flow_at_sidx(sidx),
Coverity Scan seems to hallucinate here and says that flow_at_sidx() could return NULL, with its return value later dereferenced by flow_log(), even if you're explicitly checking flow_sidx_valid() in all the paths reaching to this point.

Calling this conditionally only if flow_sidx_valid() doesn't mask the false positive either (I guess that's the part that goes wrong somehow), we really need to check if (flow_at_sidx(sidx)) flow_dbg(...).

Would it be possible to add the useless check just for my own sanity?
+		 "Can't propagate %s error from %s %s to %s %s",
+		 str_ee_origin(ee),
+		 pif_name(pif),
+		 sockaddr_ntop(SO_EE_OFFENDER(ee), sastr, sizeof(sastr)),
+		 pif_name(topif),
+		 inany_ntop(&toside->eaddr, astr, sizeof(astr)));
 	return 1;
 }
-- Stefano
On Wed, Apr 16, 2025 at 04:27:36PM +0200, Stefano Brivio wrote:
On Wed, 16 Apr 2025 19:07:06 +1000 David Gibson wrote:
 	if (!flow_sidx_valid(sidx)) {
 		/* No hint from the socket, determine flow from addresses */
@@ -588,25 +595,43 @@ static int udp_sock_recverr(const struct ctx *c, int s, flow_sidx_t sidx,
 			debug("Ignoring UDP error without flow");
 			return 1;
 		}
+	} else {
+		pif = pif_at_sidx(sidx);
Two stray trailing tabs here.
Oops, fixed.
+fail:
+	flow_dbg(flow_at_sidx(sidx),
Coverity Scan seems to hallucinate here and says that flow_at_sidx() could return NULL, with its return value later dereferenced by flow_log(), even if you're explicitly checking flow_sidx_valid() in all the paths reaching to this point.
Calling this conditionally only if flow_sidx_valid() doesn't mask the false positive either (I guess that's the part that goes wrong somehow), we really need to check if (flow_at_sidx(sidx)) flow_dbg(...).
Would it be possible to add the useless check just for my own sanity?
Sure. I was already borderline on whether it was clearer to introduce an explicit uflow variable, so I've done that now, and asserted it's non-NULL. I've checked that removes the coverity whinge, at least running locally.
+		 "Can't propagate %s error from %s %s to %s %s",
+		 str_ee_origin(ee),
+		 pif_name(pif),
+		 sockaddr_ntop(SO_EE_OFFENDER(ee), sastr, sizeof(sastr)),
+		 pif_name(topif),
+		 inany_ntop(&toside->eaddr, astr, sizeof(astr)));
 	return 1;
 }
--
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
[PATCH 4/4] udp: Translate offender addresses for ICMP messages
We've recently added support for propagating ICMP errors related to a UDP
flow from the host to the guest, by handling the extended UDP error on the
socket and synthesizing a suitable ICMP on the tap interface.
Currently we create that ICMP with a source address of the "offender" from
the extended error information - the source of the ICMP error received on
the host. However, we don't translate this address for cases where we NAT
between host and guest. This means (amongst other things) that we won't
get a "Connection refused" error as expected if we send data from the guest to
the --map-host-loopback address. The error comes from 127.0.0.1 on the
host, which doesn't make sense on the tap interface and will be discarded
by the guest.
Because ICMP errors can be sent by an intermediate host, not just by the
endpoints of the flow, we can't handle this translation purely with the
information in the flow table entry. We need to explicitly translate this
address by our NAT rules, which we can do with the nat_inbound() helper.
Signed-off-by: David Gibson
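The following sketch shows the idea for the IPv4 test case from the cover letter. It assumes a single address-only rule (host loopback maps to the --map-host-loopback address) and passes every other offender through unchanged, which covers an intermediate router that never appears in the flow table. The function icmp_src_for_guest() and its parameters are hypothetical; passt's nat_inbound() helper and the full set of NAT rules are not reproduced here.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Choose the source address for an ICMP error synthesized towards the guest.
 * @offender:		Address the host saw the ICMP error come from
 * @loopback_alias:	Guest-visible address standing in for host loopback
 *			(INADDR_ANY if no such mapping is configured)
 * @src:		Filled with the address to use on the tap side
 *
 * Returns false if the offender can't be represented on the guest side, in
 * which case the error shouldn't be forwarded with a bogus source.
 */
bool icmp_src_for_guest(struct in_addr offender, struct in_addr loopback_alias,
			struct in_addr *src)
{
	bool is_loopback = (ntohl(offender.s_addr) >> 24) == 127;

	if (!is_loopback) {
		/* An endpoint or intermediate router with a routable
		 * address: usable as-is in this simplified model.
		 */
		*src = offender;
		return true;
	}

	if (loopback_alias.s_addr == htonl(INADDR_ANY))
		return false;	/* 127.0.0.0/8 is meaningless to the guest */

	*src = loopback_alias;
	return true;
}
```

With the alias set to 172.16.1.1, an offender of 127.0.0.1 becomes 172.16.1.1, matching the address the guest sent to, so the guest's stack can associate the error with its socket.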
On Wed, 16 Apr 2025 19:07:03 +1000 David Gibson wrote:
We now propagate ICMP errors on UDP flows back into ICMP packets on the tap interface. However, we don't always get the source address right for the synthesized message. Because ICMPs can be generated by intermediate routers, that source address might not be one of the endpoints, so the address translation we already have isn't sufficient.
Implement properly translating ICMP addresses when we need to. This ended up a bit messier than I hoped, but it seems to work. A simple case to test this is:
    pasta --config-net --map-host-loopback=172.16.1.1 -- \
        sh -c "echo hello | socat STDIO UDP4:172.16.1.1:10001"
where 10001 is a port where nothing is listening on the host.
Oh, that's convenient. I also checked this against the "bad resolver
address" case I reported previously, everything "works":

    # nslookup passt.top 169.254.1.1
    ;; communications error to 169.254.1.1#53: connection refused
    ;; communications error to 169.254.1.1#53: connection refused
    ;; communications error to 169.254.1.1#53: connection refused
    ;; no servers could be reached

Except for those few comments to 2/4 and 3/4, everything else looks good
to me.

-- Stefano