On Thu, Nov 13, 2025 at 07:33:13AM +0100, Stefano Brivio wrote:
On Wed, 29 Oct 2025 17:26:22 +1100 David Gibson wrote:

sock_l4_sa() has a somewhat confusing 'v6only' option controlling whether to set the IPV6_V6ONLY socket option. Usually it's set when the given address is IPv6, but not when we want to create a dual stack listening socket. The latter, however, only makes sense when the address is ::.
Clarify this by keeping the v6only option only in an internal helper, sock_l4_(). External users will call either sock_l4(), which always creates a socket bound to a specific IP version, or sock_l4_dualstack(), which creates a dual stack socket but takes only a port, not an address.
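To make the split concrete outside of passt, here is a minimal standalone sketch of the same pattern with plain POSIX sockets. It is not the passt code: union demo_sockaddr_inany is a hypothetical stand-in for union sockaddr_inany, and the ctx, epoll type, interface and data arguments are dropped. Only the internal helper takes v6only; the two public wrappers derive it.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for passt's union sockaddr_inany */
union demo_sockaddr_inany {
	struct sockaddr sa;
	struct sockaddr_in sa4;
	struct sockaddr_in6 sa6;
};

/* Internal helper: the only function that takes v6only */
static int demo_sock_l4_(const union demo_sockaddr_inany *sa, int type,
			 bool v6only)
{
	sa_family_t af = sa->sa.sa_family;
	socklen_t sl = (af == AF_INET6) ? sizeof(sa->sa6) : sizeof(sa->sa4);
	int v6 = v6only;
	int fd;

	assert(af == AF_INET || af == AF_INET6);

	fd = socket(af, type, 0);
	if (fd < 0)
		return -errno;

	/* IPV6_V6ONLY only applies to AF_INET6 sockets */
	if ((af == AF_INET6 &&
	     setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &v6, sizeof(v6)) < 0) ||
	    bind(fd, &sa->sa, sl) < 0) {
		int ret = -errno;	/* save before close() can clobber it */

		close(fd);
		return ret;
	}

	return fd;
}

/* Bind to a specific address: IPv6 sockets always get IPV6_V6ONLY */
int demo_sock_l4(const union demo_sockaddr_inany *sa, int type)
{
	return demo_sock_l4_(sa, type, sa->sa.sa_family == AF_INET6);
}

/* Bind to [::]:port with IPV6_V6ONLY cleared, serving IPv4 and IPv6 */
int demo_sock_l4_dualstack(in_port_t port, int type)
{
	union demo_sockaddr_inany sa = {
		.sa6.sin6_family = AF_INET6,
		.sa6.sin6_addr = in6addr_any,
		.sa6.sin6_port = htons(port),
	};

	return demo_sock_l4_(&sa, type, false);
}
```

The point of the split shows up in the signatures: a caller holding a concrete address has no way to request a dual stack socket, and the dual stack entry point cannot be given anything other than the wildcard.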
I'm not sure if we'll ever need anything different, but I guess this is not the only obvious semantics for sock_l4_dualstack(): it could eventually take a sockaddr_inany and bind() an IPv6 address and its v4-mapped equivalent (...does that even work?).
Do you mean that if we have a v4-mapped address, then an IPv6 "dual stack" socket will listen both for IPv4 traffic and for IPv6 traffic actually using that v4-mapped address on the wire (presumably as a result of a router translating to a local IPv6-only network)? I think that will work, though I haven't tested it. In that case we can determine whether we need IPV6_V6ONLY from the address. The only case that doesn't cover is if we want to listen for v4-mapped traffic already translated by a router but *not* native IPv4 traffic. I don't see much reason to ever do that, so it's in the "refactor if we ever discover we need it" pile.

Otherwise, the only case in which a single dual stack socket actually listens to traffic from both protocols is the wildcard. Maybe there are obscure wildcard addresses other than :: / 0.0.0.0, but that's also in the "worry about it later" pile.

Note that:
https://github.com/containers/podman/pull/14026/commits/772ead25318dfa340541...
implies some sort of dual stack localhost support (it treats "dual stack" ::1 as listening on both ::1 and 127.0.0.1). However, AFAICT that's just not correct. On Linux, listening on ::1 listens only on ::1, even with V6ONLY explicitly set to 0.
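Determining the IPV6_V6ONLY value from the address, as discussed above, could look roughly like the sketch below. Both demo_v4mapped() and demo_want_v6only() are hypothetical helpers, not part of passt; they only encode the reasoning in this thread, where :: and v4-mapped addresses are the only cases in which clearing V6ONLY matters. For reference, Linux bind() rejects a v4-mapped address with EINVAL on a socket that has IPV6_V6ONLY set, so those addresses need it cleared regardless.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical: build the v4-mapped equivalent ::ffff:a.b.c.d of an
 * IPv4 address (RFC 4291, section 2.5.5.2) */
static void demo_v4mapped(struct in6_addr *out, const struct in_addr *in)
{
	memset(out, 0, sizeof(*out));
	out->s6_addr[10] = 0xff;
	out->s6_addr[11] = 0xff;
	/* s_addr is already in network byte order */
	memcpy(&out->s6_addr[12], &in->s_addr, sizeof(in->s_addr));
}

/* Hypothetical: choose IPV6_V6ONLY for an IPv6 bind address
 *   ::              -> 0, wildcard: one socket serves IPv4 and IPv6
 *   ::ffff:a.b.c.d  -> 0, denotes IPv4 traffic; Linux bind() fails
 *                      with EINVAL if V6ONLY is set
 *   anything else   -> 1, clearing V6ONLY changes nothing: e.g. ::1
 *                      still accepts only ::1, not 127.0.0.1
 */
static int demo_want_v6only(const struct in6_addr *addr)
{
	assert(addr);

	if (IN6_IS_ADDR_UNSPECIFIED(addr) || IN6_IS_ADDR_V4MAPPED(addr))
		return 0;

	return 1;
}
```

The case David puts on the "refactor if we ever discover we need it" pile (v4-mapped traffic only, without native IPv4) is exactly what a helper like this cannot express, since the v4-mapped branch always clears the option.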
We drop the '_sa' suffix while we're at it - it exists because this used to be an internal version with a sock_l4() wrapper. The wrapper no longer exists so the '_sa' is no longer useful.
Signed-off-by: David Gibson
---
 flow.c |  6 ++----
 pif.c  | 10 +++-------
 util.c | 27 +++++++++++++++++++++++----
 util.h |  8 +++++---
 4 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/flow.c b/flow.c
index 9926f408..fd530ddb 100644
--- a/flow.c
+++ b/flow.c
@@ -186,8 +186,7 @@ static int flowside_sock_splice(void *arg)
 
 	ns_enter(a->c);
 
-	a->fd = sock_l4_sa(a->c, a->type, a->sa, NULL,
-			   a->sa->sa_family == AF_INET6, a->data);
+	a->fd = sock_l4(a->c, a->type, a->sa, NULL, a->data);
 	a->err = errno;
 
 	return 0;
@@ -222,8 +221,7 @@ int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
 		else if (sa.sa_family == AF_INET6)
 			ifname = c->ip6.ifname_out;
 
-		return sock_l4_sa(c, type, &sa, ifname,
-				  sa.sa_family == AF_INET6, data);
+		return sock_l4(c, type, &sa, ifname, data);
 
 	case PIF_SPLICE: {
 		struct flowside_sock_args args = {
diff --git a/pif.c b/pif.c
index 31723b29..5fb1f455 100644
--- a/pif.c
+++ b/pif.c
@@ -75,11 +75,7 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
 		const union inany_addr *addr, const char *ifname,
 		in_port_t port, uint32_t data)
 {
-	union sockaddr_inany sa = {
-		.sa6.sin6_family = AF_INET6,
-		.sa6.sin6_addr = in6addr_any,
-		.sa6.sin6_port = htons(port),
-	};
+	union sockaddr_inany sa;
 
 	ASSERT(pif_is_socket(pif));
 
@@ -90,8 +86,8 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
 	}
 
 	if (!addr)
-		return sock_l4_sa(c, type, &sa, ifname, false, data);
+		return sock_l4_dualstack(c, type, port, ifname, data);
 
 	pif_sockaddr(c, &sa, pif, addr, port);
-	return sock_l4_sa(c, type, &sa, ifname, sa.sa_family == AF_INET6, data);
+	return sock_l4(c, type, &sa, ifname, data);
 }
diff --git a/util.c b/util.c
index 976fcabe..c94efae4 100644
--- a/util.c
+++ b/util.c
@@ -40,7 +40,7 @@
 #endif
 
 /**
- * sock_l4_sa() - Create and bind socket to socket address, add to epoll list
+ * sock_l4_() - Create and bind socket to socket address, add to epoll list
  * @c:		Execution context
  * @type:	epoll type
  * @sa:		Socket address to bind to
@@ -50,9 +50,9 @@
  *
  * Return: newly created socket, negative error code on failure
  */
-int sock_l4_sa(const struct ctx *c, enum epoll_type type,
-	       const union sockaddr_inany *sa, const char *ifname,
-	       bool v6only, uint32_t data)
+static int sock_l4_(const struct ctx *c, enum epoll_type type,
+		    const union sockaddr_inany *sa, const char *ifname,
+		    bool v6only, uint32_t data)
 {
 	sa_family_t af = sa->sa_family;
 	union epoll_ref ref = { .type = type, .data = data };
@@ -182,6 +182,25 @@ int sock_l4_sa(const struct ctx *c, enum epoll_type type,
 	return fd;
 }
 
+int sock_l4(const struct ctx *c, enum epoll_type type,
+	    const union sockaddr_inany *sa, const char *ifname,
+	    uint32_t data)
Not extremely useful but it saves one "lookup":
/**
 * sock_l4() - Create and bind socket to given address, add to epoll list
 * @c:		Execution context
 * @type:	epoll type
 * @sa:		Socket address to bind to
 * @ifname:	Interface for binding, NULL for any
 *
 * Return: newly created socket, negative error code on failure
 */
Oops, I meant to go back and add function comments here, but I obviously forgot. Fixed. While there, I removed the "add to epoll list" part, which is no longer correct.
+{
+	return sock_l4_(c, type, sa, ifname, sa->sa_family == AF_INET6, data);
+}
+
+int sock_l4_dualstack(const struct ctx *c, enum epoll_type type,
+		      in_port_t port, const char *ifname, uint32_t data)
...same here, and the comment might be used to clarify the functionality.
Done.
+{
+	union sockaddr_inany sa = {
+		.sa6.sin6_family = AF_INET6,
+		.sa6.sin6_addr = in6addr_any,
+		.sa6.sin6_port = htons(port),
+	};
+
+	return sock_l4_(c, type, &sa, ifname, 0, data);
+}
+
 /**
  * sock_unix() - Create and bind AF_UNIX socket
  * @sock_path:	Socket path. If empty, set on return (UNIX_SOCK_PATH as prefix)
diff --git a/util.h b/util.h
index e1a1ebc9..7f0cf686 100644
--- a/util.h
+++ b/util.h
@@ -203,9 +203,11 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 struct ctx;
 union sockaddr_inany;
 
-int sock_l4_sa(const struct ctx *c, enum epoll_type type,
-	       const union sockaddr_inany *sa, const char *ifname,
-	       bool v6only, uint32_t data);
+int sock_l4(const struct ctx *c, enum epoll_type type,
+	    const union sockaddr_inany *sa, const char *ifname,
+	    uint32_t data);
+int sock_l4_dualstack(const struct ctx *c, enum epoll_type type,
+		      in_port_t port, const char *ifname, uint32_t data);
 int sock_unix(char *sock_path);
 void sock_probe_mem(struct ctx *c);
 long timespec_diff_ms(const struct timespec *a, const struct timespec *b);
-- Stefano
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson