On Fri, 17 Oct 2025 11:34:47 +1100
David Gibson
Currently, outbound forwards (-T, -U) are handled by sockets bound to the loopback address. Typically we create two sockets, one for 127.0.0.1 and one for ::1.
This has some disadvantages: * The guest can't connect to these services using its global IP address, it must explicitly use 127.0.0.1 or ::1 (bug 100) * The guest can't even connect via 127.0.0.0/8 addresses other than 127.0.0.1 * We can't use dual-stack sockets, we have to have separate sockets for IPv4 and IPv6.
The restriction exist for a reason though. If the guest has any interfaces other than pasta (e.g. a VPN tunnel) external hosts could reach the host via the forwards. Especially combined with -T auto / -U auto this would make it very easy to make a mistake with nasty security implications.
We can achieve both goals, however, if we don't bind the outbound listening sockets to a particular address, but _do_ use SO_BINDTODEVICE to restrict them to the "lo" interface.
Nice trick, I didn't think of it. I wonder if doing the same host-side might help solving a part of https://bugs.passt.top/show_bug.cgi?id=113 as well.
Link: https://bugs.passt.top/show_bug.cgi?id=100
Signed-off-by: David Gibson
--- pif.c | 6 ------ tcp.c | 18 ++---------------- udp.c | 27 ++++++++++----------------- 3 files changed, 12 insertions(+), 39 deletions(-) diff --git a/pif.c b/pif.c index 592fafaa..84e3ceae 100644 --- a/pif.c +++ b/pif.c @@ -87,12 +87,6 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
ASSERT(pif_is_socket(pif));
- if (pif == PIF_SPLICE) { - /* Sanity checks */ - ASSERT(!ifname); - ASSERT(addr && inany_is_loopback(addr)); - } - if (!addr) return sock_l4_sa(c, type, &sa, sizeof(sa.sa6), ifname, false, data); diff --git a/tcp.c b/tcp.c index 15c012d7..982c9190 100644 --- a/tcp.c +++ b/tcp.c @@ -2592,20 +2592,6 @@ int tcp_sock_init(const struct ctx *c, uint8_t pif,
return r4 < 0 ? r4 : r6; } -/** - * tcp_ns_sock_init() - Init socket to listen for spliced outbound connections - * @c: Execution context - * @port: Port, host order - */ -static void tcp_ns_sock_init(const struct ctx *c, in_port_t port) -{ - ASSERT(!c->no_tcp); - - if (c->ifi4) - tcp_sock_init_one(c, PIF_SPLICE, &inany_loopback4, NULL, port); - if (c->ifi6) - tcp_sock_init_one(c, PIF_SPLICE, &inany_loopback6, NULL, port); -}
/** * tcp_ns_socks_init() - Bind sockets in namespace for outbound connections @@ -2625,7 +2611,7 @@ static int tcp_ns_socks_init(void *arg) if (!bitmap_isset(c->tcp.fwd_out.map, port)) continue;
- tcp_ns_sock_init(c, port); + tcp_sock_init(c, PIF_SPLICE, NULL, "lo", port);
I thought the "lo" string would be part of the Linux UAPI, but that's not the case, and loopback_net_init() just calls: alloc_netdev(0, "lo", NET_NAME_PREDICTABLE, loopback_setup); so I think it's relatively unproblematic to hardcode that as well, and it looks like we can't create a second loopback interface, even though: $ pasta -- sh -c 'ip link set dev lo down; ip link change dev lo name lol; ip link show lol' 1: lol: <LOOPBACK> mtu 65536 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 I don't have any quick solution and I don't think we care enough as to write a function in netlink.c fetching links with loopback type, so I'm totally fine with this as it is. By the way, if we fail to use SO_BINDTODEVICE, we already defensively close the socket. The only possible flaw that occurs to me is that somebody could rename 'lo' and then create a link called 'lo' of a different type. But that needs CAP_NET_ADMIN in the container anyway.
}
return 0; @@ -2805,7 +2791,7 @@ static void tcp_port_rebind(struct ctx *c, bool outbound) if ((c->ifi4 && socks[port][V4] == -1) || (c->ifi6 && socks[port][V6] == -1)) { if (outbound) - tcp_ns_sock_init(c, port); + tcp_sock_init(c, PIF_SPLICE, NULL, "lo", port);
Should we have/keep a fallback for pre-5.7 / pre-c427bfec18f2 kernels?
else tcp_sock_init(c, PIF_HOST, NULL, NULL, port); } diff --git a/udp.c b/udp.c index 49dd0144..e38114eb 100644 --- a/udp.c +++ b/udp.c @@ -1127,26 +1127,16 @@ int udp_sock_init(const struct ctx *c, uint8_t pif, }
if ((!addr || inany_v4(addr)) && c->ifi4) { - const union inany_addr *a = addr ? - addr : &inany_any4; - - if (pif == PIF_SPLICE) - a = &inany_loopback4; - - r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname, + r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, + addr ? addr : &inany_any4, ifname, port, uref.u32);
socks[V4][port] = r4 < 0 ? -1 : r4; }
if ((!addr || !inany_v4(addr)) && c->ifi6) { - const union inany_addr *a = addr ? - addr : &inany_any6; - - if (pif == PIF_SPLICE) - a = &inany_loopback6; - - r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname, + r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, + addr ? addr : &inany_any6, ifname, port, uref.u32);
socks[V6][port] = r6 < 0 ? -1 : r6; @@ -1214,9 +1204,12 @@ static void udp_port_rebind(struct ctx *c, bool outbound) continue;
if ((c->ifi4 && socks[V4][port] == -1) || - (c->ifi6 && socks[V6][port] == -1)) - udp_sock_init(c, outbound ? PIF_SPLICE : PIF_HOST, - NULL, NULL, port); + (c->ifi6 && socks[V6][port] == -1)) { + if (outbound) + udp_sock_init(c, PIF_SPLICE, NULL, "lo", port); + else + udp_sock_init(c, PIF_HOST, NULL, NULL, port);
Same here, should we add a fallback case? The rest of the series looks good to me.
+ } } }
-- Stefano