[PATCH v5 00/15] Reduce differences between inbound and outbound socket binding

David Gibson

2 Dec 2025

5:02 a.m.

This series is based on my series fixing bug 176 (regression in auto forwarding).

The fact that outbound forwarding sockets are bound to the loopback address, whereas inbound forwarding sockets are (by default) bound to the unspecified address leads to some unexpected differences between the paths setting up each of them.

An idea for tackling bug 100 suggested a different approach which will also reduce some of those differences and allow more code to be shared between the two paths. I've since discovered that this approach doesn't help for bug 100, but I think it's still worthwhile for other reasons.

v5:
 - Combine with SO_BINDTODEVICE and bug 113 patch series
 - Add fallback handling for kernels without SO_BINDTODEVICE
 - Add missing struct field documentation for no_bindtodevice
v4:
 - Add cleanup patch removing unused structure field
 - Rebase, fixing conflicts with Laurent's changes
 - A bunch of spelling and other cosmetic fixes
 - Clarify relation to bug 100 and bug 113
v3:
 - A number of additional fixes covering the handling of IPV6_V6ONLY sockopt
 - Assorted trivial changes
v2:
 - Some rearrangements and rewordings for clarity

David Gibson (15):
  util: Correct error message on SO_BINDTODEVICE failure
  util: Extend sock_probe_mem() to sock_probe_features()
  conf: More useful errors for kernels without SO_BINDTODEVICE
  flow: Remove bogus @path field from flowside_sock_args
  inany: Let length of sockaddr_inany be implicit from the family
  util, flow, pif: Simplify sock_l4_sa() interface
  tcp: Merge tcp_ns_sock_init[46]() into tcp_sock_init_one()
  udp: Unify some more inbound/outbound parts of udp_sock_init()
  udp: Move udp_sock_init() special case to its caller
  util: Fix setting of IPV6_V6ONLY socket option
  tcp, udp: Remove fallback if creating dual stack socket fails
  tcp, udp: Bind outbound listening sockets by interface instead of address
  util: Rename sock_l4_dualstack() to sock_l4_dualstack_any()
  tcp: Always populate oaddr field for socket initiated flows
  fwd: Preserve non-standard loopback address when splice forwarding

 conf.c       |  16 ++++-
 flow.c       |  22 +++----
 fwd.c        |   4 +-
 icmp.c       |   3 +-
 inany.h      |  17 ++++++
 passt.c      |   2 +-
 passt.h      |   2 +
 pif.c        |  27 ++-------
 pif.h        |   2 +-
 tcp.c        | 162 +++++++++++++++++++++------------------------------
 tcp.h        |   5 +-
 tcp_splice.c |   5 +-
 udp.c        | 122 +++++++++++++++++++++-----------------
 udp.h        |   5 +-
 util.c       | 102 +++++++++++++++++++++++++++-----
 util.h       |  10 ++--
 16 files changed, 289 insertions(+), 217 deletions(-)

--
2.52.0

David Gibson

Sorry, I realised the warning I add here isn't correct in a couple of ways. I'll respin shortly, but patches 1..11 should still be fine to review.

On Tue, Dec 02, 2025 at 03:02:12PM +1100, David Gibson wrote:

...

Currently, outbound forwards (-T, -U) are handled by sockets bound to the loopback address. Typically we create two sockets, one for 127.0.0.1 and one for ::1.

This has some disadvantages:
 * The guest can't connect via 127.0.0.0/8 addresses other than 127.0.0.1
 * We can't use dual-stack sockets, we have to have separate sockets for IPv4 and IPv6.
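The first limitation can be reproduced outside of pasta with a small, self-contained sketch. This is only an illustration, not code from the series: loopback_reachable() is a hypothetical helper that listens on one IPv4 address, on a port chosen by the kernel, and then tries to connect to the same port on a second address.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not part of passt.
 *
 * Listen on bind_addr (IPv4, dotted quad) with a kernel-chosen port, then try
 * to connect to the same port on connect_addr.
 *
 * Returns 1 if the connection succeeds, 0 if it is refused or otherwise
 * fails, -1 if the addresses are invalid or the listener can't be set up.
 */
static int loopback_reachable(const char *bind_addr, const char *connect_addr)
{
	struct sockaddr_in sa;
	socklen_t sl = sizeof(sa);
	int l, s, ret;

	assert(bind_addr && connect_addr);

	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_port = 0;	/* let the kernel pick a free port */
	if (inet_pton(AF_INET, bind_addr, &sa.sin_addr) != 1)
		return -1;

	l = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (l < 0)
		return -1;

	if (bind(l, (struct sockaddr *)&sa, sizeof(sa)) ||
	    listen(l, 1) ||
	    getsockname(l, (struct sockaddr *)&sa, &sl)) {
		close(l);
		return -1;
	}

	/* Keep the port we were given, change only the destination address */
	if (inet_pton(AF_INET, connect_addr, &sa.sin_addr) != 1) {
		close(l);
		return -1;
	}

	s = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
	if (s < 0) {
		close(l);
		return -1;
	}

	ret = connect(s, (struct sockaddr *)&sa, sizeof(sa)) == 0;

	close(s);
	close(l);
	return ret;
}
```

On a typical Linux host, a listener bound to "127.0.0.1" would be expected to accept a connection to "127.0.0.1" but refuse one to "127.0.0.2", whereas a listener bound to "0.0.0.0" should accept both. Results depend on the local network configuration, so treat this as a sketch rather than a guaranteed outcome.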

The restriction exists for a reason though. If the guest has any interfaces other than pasta (e.g. a VPN tunnel) external hosts could reach the host via the forwards. Especially combined with -T auto / -U auto this would make it very easy to make a mistake with nasty security implications.

We can achieve this a different way, however. Don't bind to a specific address, but _do_ use SO_BINDTODEVICE to restrict the sockets to the "lo" interface. We fall back to the old behaviour for older kernels where SO_BINDTODEVICE is not available unprivileged.

Note that although traffic to a local but non-loopback address is passed over the 'lo' interface (as seen by netfilter and dumpcap), it doesn't count as attached to that interface for the purposes of SO_BINDTODEVICE (information from the routing table overrides the "physical" interface). So, this change doesn't help for bug 100.

It's also not a complete fix for bug 113. It does, however:
 * Get us a step closer to fixing bug 113
 * Slightly simplify the code
 * Make things a bit easier to allow more flexible binding on the guest in future

Link: https://bugs.passt.top/show_bug.cgi?id=113

Signed-off-by: David Gibson
---
 conf.c |  6 ++++++
 pif.c  |  6 ------
 tcp.c  |  5 +++++
 udp.c  | 30 +++++++++++++++++++++++-------
 4 files changed, 34 insertions(+), 13 deletions(-)

diff --git a/conf.c b/conf.c
index 6bd9717b..02a4b65a 100644
--- a/conf.c
+++ b/conf.c
@@ -235,6 +235,12 @@ static void conf_ports(const struct ctx *c, char optname, const char *optarg,
 		if (c->mode != MODE_PASTA)
 			die("'auto' port forwarding is only allowed for pasta");
 
+		if ((optname == 'T' || optname == 'U') && c->no_bindtodevice) {
+			warn(
+"'-%c auto' enabled without unprivileged SO_BINDTODEVICE", optname);
+			warn(
+"Forwarding from addresses other than 127.0.0.1 will not work");
+		}
 		fwd->mode = FWD_AUTO;
 		return;
 	}
diff --git a/pif.c b/pif.c
index 85904f35..db447b4f 100644
--- a/pif.c
+++ b/pif.c
@@ -81,12 +81,6 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
 
 	ASSERT(pif_is_socket(pif));
 
-	if (pif == PIF_SPLICE) {
-		/* Sanity checks */
-		ASSERT(!ifname);
-		ASSERT(addr && inany_is_loopback(addr));
-	}
-
 	if (!addr) {
 		ref.fd = sock_l4_dualstack(c, type, port, ifname);
 	} else {
diff --git a/tcp.c b/tcp.c
index 2abb8be4..aacc5b20 100644
--- a/tcp.c
+++ b/tcp.c
@@ -2627,6 +2627,11 @@ static void tcp_ns_sock_init(const struct ctx *c, in_port_t port)
 {
 	ASSERT(!c->no_tcp);
 
+	if (!c->no_bindtodevice) {
+		tcp_sock_init(c, PIF_SPLICE, NULL, "lo", port);
+		return;
+	}
+
 	if (c->ifi4)
 		tcp_sock_init_one(c, PIF_SPLICE, &inany_loopback4, NULL, port);
 	if (c->ifi6)
diff --git a/udp.c b/udp.c
index 3d097fbb..4b625b78 100644
--- a/udp.c
+++ b/udp.c
@@ -1182,6 +1182,26 @@ static void udp_splice_iov_init(void)
 	}
 }
 
+/**
+ * udp_ns_sock_init() - Init socket to listen for spliced outbound connections
+ * @c:		Execution context
+ * @port:	Port, host order
+ */
+static void udp_ns_sock_init(const struct ctx *c, in_port_t port)
+{
+	ASSERT(!c->no_udp);
+
+	if (!c->no_bindtodevice) {
+		udp_sock_init(c, PIF_SPLICE, NULL, "lo", port);
+		return;
+	}
+
+	if (c->ifi4)
+		udp_sock_init(c, PIF_SPLICE, &inany_loopback4, NULL, port);
+	if (c->ifi6)
+		udp_sock_init(c, PIF_SPLICE, &inany_loopback6, NULL, port);
+}
+
 /**
  * udp_port_rebind() - Rebind ports to match forward maps
  * @c:		Execution context
@@ -1213,14 +1233,10 @@ static void udp_port_rebind(struct ctx *c, bool outbound)
 
 		if ((c->ifi4 && socks[V4][port] == -1) ||
 		    (c->ifi6 && socks[V6][port] == -1)) {
-			if (outbound) {
-				udp_sock_init(c, PIF_SPLICE,
-					      &inany_loopback4, NULL, port);
-				udp_sock_init(c, PIF_SPLICE,
-					      &inany_loopback6, NULL, port);
-			} else {
+			if (outbound)
+				udp_ns_sock_init(c, port);
+			else
 				udp_sock_init(c, PIF_HOST, NULL, NULL, port);
-			}
 		}
 	}
 }
--
2.52.0

--
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson

Stefano Brivio

7:34 a.m.

New subject: [PATCH v5 02/15] util: Extend sock_probe_mem() to sock_probe_features()

On Tue, 2 Dec 2025 15:02:02 +1100 David Gibson wrote:

...

sock_probe_mem() currently checks whether we're able to allocate large socket buffers. Extend it to also check whether the SO_BINDTODEVICE socket option is available. Rename to sock_probe_features() to reflect the new functionality.

Signed-off-by: David Gibson
---
 passt.c |  2 +-
 passt.h |  2 ++
 util.c  | 19 +++++++++++++++++--
 util.h  |  2 +-
 4 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/passt.c b/passt.c
index 4964427d..0b84ac6c 100644
--- a/passt.c
+++ b/passt.c
@@ -381,7 +381,7 @@ int main(int argc, char **argv)
 	if (setrlimit(RLIMIT_NOFILE, &limit))
 		die_perror("Failed to set current limit for open files");
 
-	sock_probe_mem(&c);
+	sock_probe_features(&c);
 
 	conf(&c, argc, argv);
 	trace_init(c.trace);
diff --git a/passt.h b/passt.h
index 15801b44..79d01ddb 100644
--- a/passt.h
+++ b/passt.h
@@ -204,6 +204,7 @@ struct ip6_ctx {
  * @freebind:		Allow binding of non-local addresses for forwarding
  * @low_wmem:		Low probed net.core.wmem_max
  * @low_rmem:		Low probed net.core.rmem_max
+ * @no_bindtodevice:	Unprivileged SO_BINDTODEVICE not available
  * @vdev:		vhost-user device
  * @device_state_fd:	Device state migration channel
  * @device_state_result: Device state migration result
@@ -281,6 +282,7 @@ struct ctx {
 
 	int low_wmem;
 	int low_rmem;
+	int no_bindtodevice;
 
 	struct vu_dev *vdev;
 
diff --git a/util.c b/util.c
index 347f34f5..bad38129 100644
--- a/util.c
+++ b/util.c
@@ -233,12 +233,13 @@ int sock_unix(char *sock_path)
 }
 
 /**
- * sock_probe_mem() - Check if setting high SO_SNDBUF and SO_RCVBUF is allowed
+ * sock_probe_features() - Probe for socket features we might use
  * @c:		Execution context
  */
-void sock_probe_mem(struct ctx *c)
+void sock_probe_features(struct ctx *c)
 {
 	int v = INT_MAX / 2, s;
+	const char lo[] = "lo";
 	socklen_t sl;
 
 	s = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, IPPROTO_TCP);
@@ -247,6 +248,7 @@ void sock_probe_mem(struct ctx *c)
 		return;
 	}
 
+	/* Check if setting high SO_SNDBUF and SO_RCVBUF is allowed */
 	sl = sizeof(v);
 	if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &v, sizeof(v)) ||
 	    getsockopt(s, SOL_SOCKET, SO_SNDBUF, &v, &sl) ||
@@ -259,6 +261,19 @@ void sock_probe_mem(struct ctx *c)
 	    (size_t)v < RCVBUF_BIG)
 		c->low_rmem = 1;
 
+	/* Check if SO_BINDTODEVICE is available
+	 *
+	 * Supported since kernel version 5.7, commit c427bfec18f2 ("net: core:
+	 * enable SO_BINDTODEVICE for non-root users"). Some distro kernels may
+	 * have backports, of course. Record whether we can use it so that we
+	 * can give more useful diagnostics.
+	 */
+	if (setsockopt(s, SOL_SOCKET, SO_BINDTODEVICE, lo, sizeof(lo)-1)) {

Nit: our coding style uses spaces around arithmetic operators... I took the liberty of fixing this up on merge, though.

--
Stefano

Stefano Brivio

7:34 a.m.

On Tue, 2 Dec 2025 15:02:00 +1100 David Gibson wrote:

...

This series is based on my series fixing bug 176 (regression in auto forwarding).

The fact that outbound forwarding sockets are bound to the loopback address, whereas inbound forwarding sockets are (by default) bound to the unspecified address leads to some unexpected differences between the paths setting up each of them.

An idea for tackling bug 100 suggested a different approach which will also reduce some of those differences and allow more code to be shared between the two paths. I've since discovered that this approach doesn't help for bug 100, but I think it's still worthwhile for other reasons.

v5:
 - Combine with SO_BINDTODEVICE and bug 113 patch series
 - Add fallback handling for kernels without SO_BINDTODEVICE
 - Add missing struct field documentation for no_bindtodevice

Applied (with nit from 2/15 fixed).

--
Stefano

Stefano Brivio

7:38 a.m.

New subject: [PATCH v5 12/15] tcp, udp: Bind outbound listening sockets by interface instead of address

On Wed, 3 Dec 2025 15:41:36 +1100 David Gibson wrote:

...

Sorry, I realised the warning I add here isn't correct in a couple of ways. I'll respin shortly, but patches 1..11 should still be fine to review.

...read just in time, I won't push the series (not even up to 11/15 I guess?).

--
Stefano

Stefano Brivio

2:13 p.m.

New subject: [PATCH v5 12/15] tcp, udp: Bind outbound listening sockets by interface instead of address

On Wed, 3 Dec 2025 07:38:55 +0100 Stefano Brivio wrote:

...

On Wed, 3 Dec 2025 15:41:36 +1100 David Gibson wrote:

...
Sorry, I realised the warning I add here isn't correct in a couple of ways. I'll respin shortly, but patches 1..11 should still be fine to review.

...read just in time, I won't push the series (not even up to 11/15 I guess?).

Pushed now, as you noted offline that the series doesn't actually make the situation worse.

--
Stefano
