[PATCH v2 0/3] Reduce differences between inbound and outbound socket binding
The fact that outbound forwarding sockets are bound to the loopback address, whereas inbound forwarding sockets are (by default) bound to the unspecified address leads to some unexpected differences between the paths setting up each of them. Happily there's an approach to tackling bug 100 which also reduces those differences, allowing more code to be shared between the two paths. Amongst other things, this will make the next steps of flexible forwarding configuration easier. Link: https://bugs.passt.top/show_bug.cgi?id=100 v2: - Some rearrangements and rewordings for clarity David Gibson (3): tcp: Merge tcp_ns_sock_init[46]() into tcp_sock_init_one() udp: Unify some more inbound/outbound parts of udp_sock_init() tcp, udp: Bind outbound listening sockets by interface instead of address conf.c | 4 +- pif.c | 6 --- tcp.c | 114 +++++++++++++++++---------------------------------------- tcp.h | 5 ++- udp.c | 60 +++++++++++++++--------------- udp.h | 5 ++- 6 files changed, 70 insertions(+), 124 deletions(-) -- 2.51.0
udp_sock_init() takes an 'ns' parameter determining if it creates a socket
in the guest namespace or host namespace. Alter it to take a pif
parameter instead, like tcp_sock_init(), and use that change to slightly
reduce code duplication between the HOST and SPLICE cases.
Signed-off-by: David Gibson
Surprisingly little logic is shared between the path for creating a
listen()ing socket in the guest namespace versus in the host namespace.
Improve this, by extending tcp_sock_init_one() to take a pif parameter
indicating where it should open the socket. This allows
tcp_ns_sock_init[46]() to be removed entirely.
We generalise tcp_sock_init() in the same way, although we don't use it
yet, due to some subtle differences in how we bind for -t versus -T.
Signed-off-by: David Gibson
Currently, outbound forwards (-T, -U) are handled by sockets bound to the
loopback address. Typically we create two sockets, one for 127.0.0.1 and
one for ::1.
This has some disadvantages:
* The guest can't connect to these services using its global IP address,
it must explicitly use 127.0.0.1 or ::1 (bug 100)
* The guest can't even connect via 127.0.0.0/8 addresses other than
127.0.0.1
* We can't use dual-stack sockets, we have to have separate sockets for
IPv4 and IPv6.
The restriction exist for a reason though. If the guest has any interfaces
other than pasta (e.g. a VPN tunnel) external hosts could reach the host
via the forwards. Especially combined with -T auto / -U auto this would
make it very easy to make a mistake with nasty security implications.
We can achieve both goals, however, if we don't bind the outbound listening
sockets to a particular address, but _do_ use SO_BINDTODEVICE to restrict
them to the "lo" interface.
Link: https://bugs.passt.top/show_bug.cgi?id=100
Signed-off-by: David Gibson
Sorry for the delay but...
On Wed, 22 Oct 2025 14:15:26 +1100
David Gibson
udp_sock_init() takes an 'ns' parameter determining if it creates a socket in the guest namespace or host namespace. Alter it to take a pif parameter instead, like tcp_sock_init(), and use that change to slightly reduce code duplication between the HOST and SPLICE cases.
Signed-off-by: David Gibson
--- conf.c | 2 +- udp.c | 65 +++++++++++++++++++++++++++++++--------------------------- udp.h | 5 +++-- 3 files changed, 39 insertions(+), 33 deletions(-) diff --git a/conf.c b/conf.c index 26f1bcc0..08cb50aa 100644 --- a/conf.c +++ b/conf.c @@ -171,7 +171,7 @@ static void conf_ports_range_except(const struct ctx *c, char optname, if (optname == 't') ret = tcp_sock_init(c, PIF_HOST, addr, ifname, i); else if (optname == 'u') - ret = udp_sock_init(c, 0, addr, ifname, i); + ret = udp_sock_init(c, PIF_HOST, addr, ifname, i); else /* No way to check in advance for -T and -U */ ret = 0; diff --git a/udp.c b/udp.c index 86585b7e..7f5faf20 100644 --- a/udp.c +++ b/udp.c @@ -1093,64 +1093,68 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif, /** * udp_sock_init() - Initialise listening sockets for a given port * @c: Execution context - * @ns: In pasta mode, if set, bind with loopback address in namespace + * @pif: Interface to open the socket for (PIF_HOST or PIF_SPLICE) * @addr: Pointer to address for binding, NULL if not configured * @ifname: Name of interface to bind to, NULL if not configured * @port: Port, host order * * Return: 0 on (partial) success, negative error code on (complete) failure */ -int udp_sock_init(const struct ctx *c, int ns, const union inany_addr *addr, - const char *ifname, in_port_t port) +int udp_sock_init(const struct ctx *c, uint8_t pif, + const union inany_addr *addr, const char *ifname, + in_port_t port) { union udp_listen_epoll_ref uref = { - .pif = ns ? PIF_SPLICE : PIF_HOST, + .pif = pif, .port = port, }; int r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1; + int (*socks)[NUM_PORTS];
ASSERT(!c->no_udp); + ASSERT(pif_is_socket(pif))
...this doesn't build. If I add the missing semicolon everything is fine. Should I? Worth double checking on your side? The series looks otherwise good to me.
- if (!addr && c->ifi4 && c->ifi6 && !ns) { + if (pif == PIF_HOST) + socks = udp_splice_init; + else + socks = udp_splice_ns; + + if (!addr && c->ifi4 && c->ifi6 && pif == PIF_HOST) { int s;
/* Attempt to get a dual stack socket */ s = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, PIF_HOST, NULL, ifname, port, uref.u32); - udp_splice_init[V4][port] = s < 0 ? -1 : s; - udp_splice_init[V6][port] = s < 0 ? -1 : s; + socks[V4][port] = s < 0 ? -1 : s; + socks[V6][port] = s < 0 ? -1 : s; if (IN_INTERVAL(0, FD_REF_MAX, s)) return 0; }
if ((!addr || inany_v4(addr)) && c->ifi4) { - if (!ns) { - r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, PIF_HOST, - addr ? addr : &inany_any4, ifname, - port, uref.u32); + const union inany_addr *a = addr ? + addr : &inany_any4;
- udp_splice_init[V4][port] = r4 < 0 ? -1 : r4; - } else { - r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, PIF_SPLICE, - &inany_loopback4, ifname, - port, uref.u32); - udp_splice_ns[V4][port] = r4 < 0 ? -1 : r4; - } + if (pif == PIF_SPLICE) + a = &inany_loopback4; + + r4 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname, + port, uref.u32); + + socks[V4][port] = r4 < 0 ? -1 : r4; }
if ((!addr || !inany_v4(addr)) && c->ifi6) { - if (!ns) { - r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, PIF_HOST, - addr ? addr : &inany_any6, ifname, - port, uref.u32); + const union inany_addr *a = addr ? + addr : &inany_any6;
- udp_splice_init[V6][port] = r6 < 0 ? -1 : r6; - } else { - r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, PIF_SPLICE, - &inany_loopback6, ifname, - port, uref.u32); - udp_splice_ns[V6][port] = r6 < 0 ? -1 : r6; - } + if (pif == PIF_SPLICE) + a = &inany_loopback6; + + r6 = pif_sock_l4(c, EPOLL_TYPE_UDP_LISTEN, pif, a, ifname, + port, uref.u32); + + socks[V6][port] = r6 < 0 ? -1 : r6; }
if (IN_INTERVAL(0, FD_REF_MAX, r4) || IN_INTERVAL(0, FD_REF_MAX, r6)) @@ -1216,7 +1220,8 @@ static void udp_port_rebind(struct ctx *c, bool outbound)
if ((c->ifi4 && socks[V4][port] == -1) || (c->ifi6 && socks[V6][port] == -1)) - udp_sock_init(c, outbound, NULL, NULL, port); + udp_sock_init(c, outbound ? PIF_SPLICE : PIF_HOST, + NULL, NULL, port); } }
diff --git a/udp.h b/udp.h index 8f8531ad..f78dc528 100644 --- a/udp.h +++ b/udp.h @@ -17,8 +17,9 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif, sa_family_t af, const void *saddr, const void *daddr, uint8_t ttl, const struct pool *p, int idx, const struct timespec *now); -int udp_sock_init(const struct ctx *c, int ns, const union inany_addr *addr, - const char *ifname, in_port_t port); +int udp_sock_init(const struct ctx *c, uint8_t pif, + const union inany_addr *addr, const char *ifname, + in_port_t port); int udp_init(struct ctx *c); void udp_timer(struct ctx *c, const struct timespec *now); void udp_update_l2_buf(const unsigned char *eth_d, const unsigned char *eth_s);
-- Stefano
On Wed, Oct 29, 2025 at 12:13:38AM +0100, Stefano Brivio wrote:
Sorry for the delay but...
On Wed, 22 Oct 2025 14:15:26 +1100 David Gibson
wrote: udp_sock_init() takes an 'ns' parameter determining if it creates a socket in the guest namespace or host namespace. Alter it to take a pif parameter instead, like tcp_sock_init(), and use that change to slightly reduce code duplication between the HOST and SPLICE cases.
Signed-off-by: David Gibson
--- conf.c | 2 +- udp.c | 65 +++++++++++++++++++++++++++++++--------------------------- udp.h | 5 +++-- 3 files changed, 39 insertions(+), 33 deletions(-) diff --git a/conf.c b/conf.c index 26f1bcc0..08cb50aa 100644 --- a/conf.c +++ b/conf.c @@ -171,7 +171,7 @@ static void conf_ports_range_except(const struct ctx *c, char optname, if (optname == 't') ret = tcp_sock_init(c, PIF_HOST, addr, ifname, i); else if (optname == 'u') - ret = udp_sock_init(c, 0, addr, ifname, i); + ret = udp_sock_init(c, PIF_HOST, addr, ifname, i); else /* No way to check in advance for -T and -U */ ret = 0; diff --git a/udp.c b/udp.c index 86585b7e..7f5faf20 100644 --- a/udp.c +++ b/udp.c @@ -1093,64 +1093,68 @@ int udp_tap_handler(const struct ctx *c, uint8_t pif, /** * udp_sock_init() - Initialise listening sockets for a given port * @c: Execution context - * @ns: In pasta mode, if set, bind with loopback address in namespace + * @pif: Interface to open the socket for (PIF_HOST or PIF_SPLICE) * @addr: Pointer to address for binding, NULL if not configured * @ifname: Name of interface to bind to, NULL if not configured * @port: Port, host order * * Return: 0 on (partial) success, negative error code on (complete) failure */ -int udp_sock_init(const struct ctx *c, int ns, const union inany_addr *addr, - const char *ifname, in_port_t port) +int udp_sock_init(const struct ctx *c, uint8_t pif, + const union inany_addr *addr, const char *ifname, + in_port_t port) { union udp_listen_epoll_ref uref = { - .pif = ns ? PIF_SPLICE : PIF_HOST, + .pif = pif, .port = port, }; int r4 = FD_REF_MAX + 1, r6 = FD_REF_MAX + 1; + int (*socks)[NUM_PORTS];
ASSERT(!c->no_udp); + ASSERT(pif_is_socket(pif))
...this doesn't build. If I add the missing semicolon everything is fine. Should I? Worth double checking on your side?
Oh. Wow. Don't know how I missed that. I'll fix it. I need to respin to add the fallback for kernels without BINDTODEVICE anyway. -- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
participants (2)
-
David Gibson
-
Stefano Brivio