[PATCH v3 2/8] util, flow, pif: Simplify sock_l4_sa() interface
sock_l4_sa() has a somewhat confusing 'v6only' option controlling whether
to set the IPV6_V6ONLY socket option. Usually it's set when the given
address is IPv6, but not when we want to create a dual stack listening
socket. The latter only makes sense when the address is :: however.
Clarify this by only keeping the v6only option in an internal helper
sock_l4_(). External users will call either sock_l4() which always creates
a socket bound to a specific IP version, or sock_l4_dualstack() which
creates a dual stack socket, but takes only a port not an address.
We drop the '_sa' suffix while we're at it - it exists because this used
to be an internal version with a sock_l4() wrapper. The wrapper no longer
exists so the '_sa' is no longer useful.
Signed-off-by: David Gibson
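For readers less familiar with IPV6_V6ONLY, the dual stack case described above can be sketched outside of passt. This is only an illustration under stated assumptions: dualstack_any_sock() is a hypothetical helper, not the sock_l4_dualstack() from this patch, and it skips the epoll and interface-binding handling that the real function inherits from sock_l4_().

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not a passt function): open a socket bound to
 * [::]:port with IPV6_V6ONLY cleared, so that it can receive both IPv6
 * and IPv4 traffic. Returns the socket, or -1 on failure.
 */
int dualstack_any_sock(int socktype, in_port_t port)
{
	struct sockaddr_in6 sa6;
	int fd, off = 0;

	memset(&sa6, 0, sizeof(sa6));
	sa6.sin6_family = AF_INET6;
	sa6.sin6_addr = in6addr_any;
	sa6.sin6_port = htons(port);

	fd = socket(AF_INET6, socktype, 0);
	if (fd < 0)
		return -1;

	/* Clear the option explicitly rather than relying on the default,
	 * which on Linux follows the net.ipv6.bindv6only sysctl
	 */
	if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off)) ||
	    bind(fd, (struct sockaddr *)&sa6, sizeof(sa6))) {
		close(fd);
		return -1;
	}

	return fd;
}
```

Passing 0 as the port lets the kernel pick one, which is convenient for a quick experiment.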
On Wed, 29 Oct 2025 17:26:22 +1100, David Gibson wrote:
sock_l4_sa() has a somewhat confusing 'v6only' option controlling whether to set the IPV6_V6ONLY socket option. Usually it's set when the given address is IPv6, but not when we want to create a dual stack listening socket. The latter only makes sense when the address is :: however.
Clarify this by only keeping the v6only option in an internal helper sock_l4_(). External users will call either sock_l4() which always creates a socket bound to a specific IP version, or sock_l4_dualstack() which creates a dual stack socket, but takes only a port not an address.
I'm not sure if we'll ever need anything different, but I guess that this is not the only obvious semantic of sock_l4_dualstack(), as it could take a sockaddr_inany eventually, and bind() IPv6 address and its v4-mapped equivalent (...does that even work?).
We drop the '_sa' suffix while we're at it - it exists because this used to be an internal version with a sock_l4() wrapper. The wrapper no longer exists so the '_sa' is no longer useful.
Signed-off-by: David Gibson
---
 flow.c |  6 ++----
 pif.c  | 10 +++-------
 util.c | 27 +++++++++++++++++++++++----
 util.h |  8 +++++---
 4 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/flow.c b/flow.c
index 9926f408..fd530ddb 100644
--- a/flow.c
+++ b/flow.c
@@ -186,8 +186,7 @@ static int flowside_sock_splice(void *arg)
 
 	ns_enter(a->c);
 
-	a->fd = sock_l4_sa(a->c, a->type, a->sa, NULL,
-			   a->sa->sa_family == AF_INET6, a->data);
+	a->fd = sock_l4(a->c, a->type, a->sa, NULL, a->data);
 	a->err = errno;
 
 	return 0;
@@ -222,8 +221,7 @@ int flowside_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
 		else if (sa.sa_family == AF_INET6)
 			ifname = c->ip6.ifname_out;
 
-		return sock_l4_sa(c, type, &sa, ifname,
-				  sa.sa_family == AF_INET6, data);
+		return sock_l4(c, type, &sa, ifname, data);
 
 	case PIF_SPLICE: {
 		struct flowside_sock_args args = {
diff --git a/pif.c b/pif.c
index 31723b29..5fb1f455 100644
--- a/pif.c
+++ b/pif.c
@@ -75,11 +75,7 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
 		const union inany_addr *addr, const char *ifname,
 		in_port_t port, uint32_t data)
 {
-	union sockaddr_inany sa = {
-		.sa6.sin6_family = AF_INET6,
-		.sa6.sin6_addr = in6addr_any,
-		.sa6.sin6_port = htons(port),
-	};
+	union sockaddr_inany sa;
 
 	ASSERT(pif_is_socket(pif));
 
@@ -90,8 +86,8 @@ int pif_sock_l4(const struct ctx *c, enum epoll_type type, uint8_t pif,
 	}
 
 	if (!addr)
-		return sock_l4_sa(c, type, &sa, ifname, false, data);
+		return sock_l4_dualstack(c, type, port, ifname, data);
 
 	pif_sockaddr(c, &sa, pif, addr, port);
-	return sock_l4_sa(c, type, &sa, ifname, sa.sa_family == AF_INET6, data);
+	return sock_l4(c, type, &sa, ifname, data);
 }
diff --git a/util.c b/util.c
index 976fcabe..c94efae4 100644
--- a/util.c
+++ b/util.c
@@ -40,7 +40,7 @@
 #endif
 
 /**
- * sock_l4_sa() - Create and bind socket to socket address, add to epoll list
+ * sock_l4_() - Create and bind socket to socket address, add to epoll list
  * @c:		Execution context
  * @type:	epoll type
  * @sa:		Socket address to bind to
@@ -50,9 +50,9 @@
  *
  * Return: newly created socket, negative error code on failure
  */
-int sock_l4_sa(const struct ctx *c, enum epoll_type type,
-	       const union sockaddr_inany *sa, const char *ifname,
-	       bool v6only, uint32_t data)
+static int sock_l4_(const struct ctx *c, enum epoll_type type,
+		    const union sockaddr_inany *sa, const char *ifname,
+		    bool v6only, uint32_t data)
 {
 	sa_family_t af = sa->sa_family;
 	union epoll_ref ref = { .type = type, .data = data };
@@ -182,6 +182,25 @@ int sock_l4_sa(const struct ctx *c, enum epoll_type type,
 	return fd;
 }
 
+int sock_l4(const struct ctx *c, enum epoll_type type,
+	    const union sockaddr_inany *sa, const char *ifname,
+	    uint32_t data)

Not extremely useful but it saves one "lookup":

/**
 * sock_l4() - Create and bind socket to given address, add to epoll list
 * @c:		Execution context
 * @type:	epoll type
 * @sa:		Socket address to bind to
 * @ifname:	Interface for binding, NULL for any
 *
 * Return: newly created socket, negative error code on failure
 */

+{
+	return sock_l4_(c, type, sa, ifname, sa->sa_family == AF_INET6, data);
+}
+
+int sock_l4_dualstack(const struct ctx *c, enum epoll_type type,
+		      in_port_t port, const char *ifname, uint32_t data)

...same here, and the comment might be used to clarify the functionality.

+{
+	union sockaddr_inany sa = {
+		.sa6.sin6_family = AF_INET6,
+		.sa6.sin6_addr = in6addr_any,
+		.sa6.sin6_port = htons(port),
+	};
+
+	return sock_l4_(c, type, &sa, ifname, 0, data);
+}
+
 /**
  * sock_unix() - Create and bind AF_UNIX socket
  * @sock_path:	Socket path. If empty, set on return (UNIX_SOCK_PATH as prefix)
diff --git a/util.h b/util.h
index e1a1ebc9..7f0cf686 100644
--- a/util.h
+++ b/util.h
@@ -203,9 +203,11 @@ int do_clone(int (*fn)(void *), char *stack_area, size_t stack_size, int flags,
 struct ctx;
 union sockaddr_inany;
 
-int sock_l4_sa(const struct ctx *c, enum epoll_type type,
-	       const union sockaddr_inany *sa, const char *ifname,
-	       bool v6only, uint32_t data);
+int sock_l4(const struct ctx *c, enum epoll_type type,
+	    const union sockaddr_inany *sa, const char *ifname,
+	    uint32_t data);
+int sock_l4_dualstack(const struct ctx *c, enum epoll_type type,
+		      in_port_t port, const char *ifname, uint32_t data);
 int sock_unix(char *sock_path);
 void sock_probe_mem(struct ctx *c);
 long timespec_diff_ms(const struct timespec *a, const struct timespec *b);
-- Stefano
On Thu, Nov 13, 2025 at 07:33:13AM +0100, Stefano Brivio wrote:
On Wed, 29 Oct 2025 17:26:22 +1100 David Gibson wrote:
sock_l4_sa() has a somewhat confusing 'v6only' option controlling whether to set the IPV6_V6ONLY socket option. Usually it's set when the given address is IPv6, but not when we want to create a dual stack listening socket. The latter only makes sense when the address is :: however.
Clarify this by only keeping the v6only option in an internal helper sock_l4_(). External users will call either sock_l4() which always creates a socket bound to a specific IP version, or sock_l4_dualstack() which creates a dual stack socket, but takes only a port not an address.
I'm not sure if we'll ever need anything different, but I guess that this is not the only obvious semantic of sock_l4_dualstack(), as it could take a sockaddr_inany eventually, and bind() IPv6 address and its v4-mapped equivalent (...does that even work?).
Do you mean that if we have a v4-mapped address, then using an IPv6 "dual stack" socket will listen both for IPv4 traffic and for IPv6 traffic actually using that v4-mapped address on the wire (presumably as a result of a router translating to a local IPv6-only network)? I think that will work, though I haven't tested.
In that case we can determine that we need IPV6_V6ONLY from the address. The only case that doesn't cover is if we want to listen for v4-mapped traffic already translated by a router but *not* native IPv4 traffic. I don't see a lot of reason to ever do that, so it's in the "refactor if we ever discover we need it" pile.
Otherwise, the only case in which a single dual stack socket actually listens to traffic from both protocols is for a wildcard. Maybe there are obscure wildcard addresses other than :: / 0.0.0.0, but that's also in the "worry about it later" pile.
Note that:
https://github.com/containers/podman/pull/14026/commits/772ead25318dfa340541...
implies some sort of dual stack localhost support (it treats "dual stack" ::1 as listening on both ::1 and 127.0.0.1). However, AFAICT that's just not correct. On Linux, listening on ::1 listens only on ::1 even with V6ONLY explicitly set to 0.
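The observation about ::1 can be checked with a small probe. The sketch below is not part of passt, and probe_loopback_dualstack() is a hypothetical name: it listens on [::1] with IPV6_V6ONLY cleared and then attempts an IPv4 connection to 127.0.0.1 on the same port. If the behaviour is as described, the connection attempt fails and the function returns 0.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe (not a passt function).
 *
 * Return: 1 if the IPv4 connection to 127.0.0.1 succeeded, 0 if it
 *	   failed, -1 if the listening socket couldn't be set up
 *	   (for example, no IPv6 loopback available)
 */
int probe_loopback_dualstack(void)
{
	struct sockaddr_in6 sa6;
	struct sockaddr_in sa4;
	socklen_t sl = sizeof(sa6);
	int l, c, off = 0, ret = -1;

	memset(&sa6, 0, sizeof(sa6));
	sa6.sin6_family = AF_INET6;
	sa6.sin6_addr = in6addr_loopback;	/* port 0: kernel picks one */

	l = socket(AF_INET6, SOCK_STREAM, 0);
	if (l < 0)
		return -1;

	if (setsockopt(l, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off)) ||
	    bind(l, (struct sockaddr *)&sa6, sizeof(sa6)) ||
	    listen(l, 1) ||
	    getsockname(l, (struct sockaddr *)&sa6, &sl))
		goto out;

	memset(&sa4, 0, sizeof(sa4));
	sa4.sin_family = AF_INET;
	sa4.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sa4.sin_port = sa6.sin6_port;		/* already network order */

	c = socket(AF_INET, SOCK_STREAM, 0);
	if (c < 0)
		goto out;

	ret = !connect(c, (struct sockaddr *)&sa4, sizeof(sa4));
	close(c);
out:
	close(l);
	return ret;
}
```

Swapping in6addr_loopback for in6addr_any in the same probe would exercise the wildcard case, where the IPv4 connection is expected to succeed.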
+int sock_l4(const struct ctx *c, enum epoll_type type,
+	    const union sockaddr_inany *sa, const char *ifname,
+	    uint32_t data)
Not extremely useful but it saves one "lookup":
/**
 * sock_l4() - Create and bind socket to given address, add to epoll list
 * @c:		Execution context
 * @type:	epoll type
 * @sa:		Socket address to bind to
 * @ifname:	Interface for binding, NULL for any
 *
 * Return: newly created socket, negative error code on failure
 */
Oops, I meant to go back and add function comments here, but I obviously forgot. Fixed. While there I removed the "add to epoll list" which is no longer correct.
+{
+	return sock_l4_(c, type, sa, ifname, sa->sa_family == AF_INET6, data);
+}
+
+int sock_l4_dualstack(const struct ctx *c, enum epoll_type type,
+		      in_port_t port, const char *ifname, uint32_t data)
...same here, and the comment might be used to clarify the functionality.
Done.
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
On Fri, 14 Nov 2025 10:21:46 +1100, David Gibson wrote:
On Thu, Nov 13, 2025 at 07:33:13AM +0100, Stefano Brivio wrote:
On Wed, 29 Oct 2025 17:26:22 +1100 David Gibson wrote:
sock_l4_sa() has a somewhat confusing 'v6only' option controlling whether to set the IPV6_V6ONLY socket option. Usually it's set when the given address is IPv6, but not when we want to create a dual stack listening socket. The latter only makes sense when the address is :: however.
Clarify this by only keeping the v6only option in an internal helper sock_l4_(). External users will call either sock_l4() which always creates a socket bound to a specific IP version, or sock_l4_dualstack() which creates a dual stack socket, but takes only a port not an address.
I'm not sure if we'll ever need anything different, but I guess that this is not the only obvious semantic of sock_l4_dualstack(), as it could take a sockaddr_inany eventually, and bind() IPv6 address and its v4-mapped equivalent (...does that even work?).
Do you mean that if we have a v4-mapped address, then using an IPv6 "dual stack" socket will listen both for IPv4 traffic and for IPv6 traffic actually using that v4-mapped address on the wire (presumably as a result of a router translating to a local IPv6-only network)? I think that will work, though I haven't tested.
Yes, that's what I meant.
In that case we can determine that we need IPV6_V6ONLY from the address. The only case that doesn't cover is if we want to listen for v4-mapped traffic already translated by a router but *not* native IPv4 traffic. I don't see a lot of reason to ever do that, so it's in the "refactor if we ever discover we need it" pile.
I thought that we might want to listen on both IP versions for whatever reason, on a single socket, with a specific address (say, that v4-mapped address and the equivalent untranslated address...?). I know it can't be done now anyway, I'm just saying that sock_l4_dualstack() forcing wildcard addresses isn't something we should imply as part of "dualstack".
Otherwise, the only case in which a single dual stack socket actually listens to traffic from both protocols is for a wildcard. Maybe there are obscure wildcard addresses other than :: / 0.0.0.0, but that's also in the "worry about it later" pile.
Sure.
Note that:
https://github.com/containers/podman/pull/14026/commits/772ead25318dfa340541...
implies some sort of dual stack localhost support (it treats "dual stack" ::1 as listening on both ::1 and 127.0.0.1). However, AFAICT that's just not correct. On Linux, listening on ::1 listens only on ::1 even with V6ONLY explicitly set to 0.
Right, I don't even know what "simulated" means there. Actually there's no problem description at all. Go figure. I'm not sure if we want to report something (I'm not even sure what we should report).
+int sock_l4(const struct ctx *c, enum epoll_type type,
+	    const union sockaddr_inany *sa, const char *ifname,
+	    uint32_t data)
Not extremely useful but it saves one "lookup":
/**
 * sock_l4() - Create and bind socket to given address, add to epoll list
 * @c:		Execution context
 * @type:	epoll type
 * @sa:		Socket address to bind to
 * @ifname:	Interface for binding, NULL for any
 *
 * Return: newly created socket, negative error code on failure
 */
Oops, I meant to go back and add function comments here, but I obviously forgot. Fixed.
While there I removed the "add to epoll list" which is no longer correct.
Oops, I hadn't solved the merge conflict yet...
+{
+	return sock_l4_(c, type, sa, ifname, sa->sa_family == AF_INET6, data);
+}
+
+int sock_l4_dualstack(const struct ctx *c, enum epoll_type type,
+		      in_port_t port, const char *ifname, uint32_t data)
...same here, and the comment might be used to clarify the functionality.
Done.
-- Stefano
On Tue, Nov 18, 2025 at 01:19:21AM +0100, Stefano Brivio wrote:
On Fri, 14 Nov 2025 10:21:46 +1100 David Gibson wrote:
On Thu, Nov 13, 2025 at 07:33:13AM +0100, Stefano Brivio wrote:
On Wed, 29 Oct 2025 17:26:22 +1100 David Gibson wrote:
sock_l4_sa() has a somewhat confusing 'v6only' option controlling whether to set the IPV6_V6ONLY socket option. Usually it's set when the given address is IPv6, but not when we want to create a dual stack listening socket. The latter only makes sense when the address is :: however.
Clarify this by only keeping the v6only option in an internal helper sock_l4_(). External users will call either sock_l4() which always creates a socket bound to a specific IP version, or sock_l4_dualstack() which creates a dual stack socket, but takes only a port not an address.
I'm not sure if we'll ever need anything different, but I guess that this is not the only obvious semantic of sock_l4_dualstack(), as it could take a sockaddr_inany eventually, and bind() IPv6 address and its v4-mapped equivalent (...does that even work?).
Do you mean that if we have a v4-mapped address, then using an IPv6 "dual stack" socket will listen both for IPv4 traffic and for IPv6 traffic actually using that v4-mapped address on the wire (presumably as a result of a router translating to a local IPv6-only network)? I think that will work, though I haven't tested.
Yes, that's what I meant.
In that case we can determine that we need IPV6_V6ONLY from the address. The only case that doesn't cover is if we want to listen for v4-mapped traffic already translated by a router but *not* native IPv4 traffic. I don't see a lot of reason to ever do that, so it's in the "refactor if we ever discover we need it" pile.
I thought that we might want to listen on both IP versions for whatever reason, on a single socket, with a specific address (say, that v4-mapped address and the equivalent untranslated address...?).
I'm not really sure what you mean by an "equivalent untranslated address". AFAIK, the only non-wildcard case that will actually listen on both IP versions is a v4-mapped address. So, yes we probably should explicitly set IPV6_V6ONLY==0 for v4-mapped addresses as well.
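The idea of deriving the IPV6_V6ONLY setting from the address itself, with v4-mapped addresses getting the option cleared, could look roughly like the sketch below. want_v6only() is a hypothetical name, not something in passt, and it takes a plain struct sockaddr rather than passt's union sockaddr_inany to stay self-contained. The wildcard :: remains ambiguous under this scheme, which is why the patch keeps a separate sock_l4_dualstack() entry point for it.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical helper (not a passt function): decide from the bind
 * address alone whether IPV6_V6ONLY should be set on the socket.
 */
bool want_v6only(const struct sockaddr *sa)
{
	const struct sockaddr_in6 *sa6;

	/* The option only exists for AF_INET6 sockets */
	if (sa->sa_family != AF_INET6)
		return false;

	sa6 = (const struct sockaddr_in6 *)sa;

	/* ::ffff:a.b.c.d: leave the option cleared, as suggested above */
	if (IN6_IS_ADDR_V4MAPPED(&sa6->sin6_addr))
		return false;

	/* Any other IPv6 address, including ::, is treated as IPv6 only
	 * here. A dual stack wildcard needs an explicit request.
	 */
	return true;
}
```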
I know it can't be done now anyway, I'm just saying that sock_l4_dualstack() forcing wildcard addresses isn't something we should imply as part of "dualstack".
Hm, ok. What if I renamed it to sock_l4_dualwild()?
Otherwise, the only case in which a single dual stack socket actually listens to traffic from both protocols is for a wildcard. Maybe there are obscure wildcard addresses other than :: / 0.0.0.0, but that's also in the "worry about it later" pile.
Sure.
Note that:
https://github.com/containers/podman/pull/14026/commits/772ead25318dfa340541...
implies some sort of dual stack localhost support (it treats "dual stack" ::1 as listening on both ::1 and 127.0.0.1). However, AFAICT that's just not correct. On Linux, listening on ::1 listens only on ::1 even with V6ONLY explicitly set to 0.
Right, I don't even know what "simulated" means there. Actually there's no problem description at all. Go figure. I'm not sure if we want to report something (I'm not even sure what we should report).
I think "simulated" there means using one v4 and one v6 socket instead of a dual stack socket.
Looks like that patch came in response to https://github.com/containers/podman/issues/12292
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
On Tue, 18 Nov 2025 14:34:58 +1100, David Gibson wrote:
On Tue, Nov 18, 2025 at 01:19:21AM +0100, Stefano Brivio wrote:
On Fri, 14 Nov 2025 10:21:46 +1100 David Gibson wrote:
On Thu, Nov 13, 2025 at 07:33:13AM +0100, Stefano Brivio wrote:
On Wed, 29 Oct 2025 17:26:22 +1100 David Gibson wrote:
sock_l4_sa() has a somewhat confusing 'v6only' option controlling whether to set the IPV6_V6ONLY socket option. Usually it's set when the given address is IPv6, but not when we want to create a dual stack listening socket. The latter only makes sense when the address is :: however.
Clarify this by only keeping the v6only option in an internal helper sock_l4_(). External users will call either sock_l4() which always creates a socket bound to a specific IP version, or sock_l4_dualstack() which creates a dual stack socket, but takes only a port not an address.
I'm not sure if we'll ever need anything different, but I guess that this is not the only obvious semantic of sock_l4_dualstack(), as it could take a sockaddr_inany eventually, and bind() IPv6 address and its v4-mapped equivalent (...does that even work?).
Do you mean that if we have a v4-mapped address, then using an IPv6 "dual stack" socket will listen both for IPv4 traffic and for IPv6 traffic actually using that v4-mapped address on the wire (presumably as a result of a router translating to a local IPv6-only network)? I think that will work, though I haven't tested.
Yes, that's what I meant.
In that case we can determine that we need IPV6_V6ONLY from the address. The only case that doesn't cover is if we want to listen for v4-mapped traffic already translated by a router but *not* native IPv4 traffic. I don't see a lot of reason to ever do that, so it's in the "refactor if we ever discover we need it" pile.
I thought that we might want to listen on both IP versions for whatever reason, on a single socket, with a specific address (say, that v4-mapped address and the equivalent untranslated address...?).
I'm not really sure what you mean by an "equivalent untranslated address". AFAIK, the only non-wildcard case that will actually listen on both IP versions is a v4-mapped address.
I mean 192.0.2.1 (untranslated, IPv4) and ::ffff:192.0.2.1 (v4-mapped). Will we ever want to listen to both? I don't think we have to care about that right now, though.
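To make the relationship between the two forms concrete, here is a minimal sketch of how the v4-mapped form of an IPv4 address is laid out: 80 zero bits, 16 one bits, then the 32-bit IPv4 address. v4mapped() is a hypothetical helper for illustration only; passt has its own inany_addr machinery for this.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper (not a passt function): build ::ffff:a.b.c.d from
 * the IPv4 address @a4, which is in network byte order as usual for
 * struct in_addr.
 */
struct in6_addr v4mapped(struct in_addr a4)
{
	struct in6_addr a6;

	memset(&a6, 0, sizeof(a6));			/* 80 zero bits... */
	a6.s6_addr[10] = 0xff;				/* ...16 one bits... */
	a6.s6_addr[11] = 0xff;
	memcpy(&a6.s6_addr[12], &a4, sizeof(a4));	/* ...IPv4 address */

	return a6;
}
```

Feeding the result for 192.0.2.1 to inet_ntop() with AF_INET6 gives the ::ffff:192.0.2.1 notation used above.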
So, yes we probably should explicitly set IPV6_V6ONLY==0 for v4-mapped addresses as well.
I know it can't be done now anyway, I'm just saying that sock_l4_dualstack() forcing wildcard addresses isn't something we should imply as part of "dualstack".
Hm, ok. What if I renamed it to sock_l4_dualwild()?
Short-hands for "wildcard" aren't necessarily obvious. I would have gone with "dual_any" or "dualstack_any" or "v4v6_any" or "inany_any". But actually it can also stay like that, I guess, especially as this looks refactor-prone for https://bugs.passt.top/show_bug.cgi?id=140. I just wanted to raise the fact it's not obvious that "dualstack" implies :: and 0.0.0.0. It doesn't need to be addressed in code or comments now, or ever.
Otherwise, the only case in which a single dual stack socket actually listens to traffic from both protocols is for a wildcard. Maybe there are obscure wildcard addresses other than :: / 0.0.0.0, but that's also in the "worry about it later" pile.
Sure.
Note that:
https://github.com/containers/podman/pull/14026/commits/772ead25318dfa340541...
implies some sort of dual stack localhost support (it treats "dual stack" ::1 as listening on both ::1 and 127.0.0.1). However, AFAICT that's just not correct. On Linux, listening on ::1 listens only on ::1 even with V6ONLY explicitly set to 0.
Right, I don't even know what "simulated" means there. Actually there's no problem description at all. Go figure. I'm not sure if we want to report something (I'm not even sure what we should report).
I think "simulated" there means using one v4 and one v6 socket instead of a dual stack socket.
Looks like that patch came in response to https://github.com/containers/podman/issues/12292
-- Stefano
On Wed, Nov 19, 2025 at 12:42:04PM +0100, Stefano Brivio wrote:
On Tue, 18 Nov 2025 14:34:58 +1100 David Gibson wrote:
On Tue, Nov 18, 2025 at 01:19:21AM +0100, Stefano Brivio wrote:
On Fri, 14 Nov 2025 10:21:46 +1100 David Gibson wrote:
On Thu, Nov 13, 2025 at 07:33:13AM +0100, Stefano Brivio wrote:
On Wed, 29 Oct 2025 17:26:22 +1100 David Gibson wrote:
sock_l4_sa() has a somewhat confusing 'v6only' option controlling whether to set the IPV6_V6ONLY socket option. Usually it's set when the given address is IPv6, but not when we want to create a dual stack listening socket. The latter only makes sense when the address is :: however.
Clarify this by only keeping the v6only option in an internal helper sock_l4_(). External users will call either sock_l4() which always creates a socket bound to a specific IP version, or sock_l4_dualstack() which creates a dual stack socket, but takes only a port not an address.
I'm not sure if we'll ever need anything different, but I guess that this is not the only obvious semantic of sock_l4_dualstack(), as it could take a sockaddr_inany eventually, and bind() IPv6 address and its v4-mapped equivalent (...does that even work?).
Do you mean that if we have a v4-mapped address, then using an IPv6 "dual stack" socket will listen both for IPv4 traffic and for IPv6 traffic actually using that v4-mapped address on the wire (presumably as a result of a router translating to a local IPv6-only network)? I think that will work, though I haven't tested.
Yes, that's what I meant.
In that case we can determine that we need IPV6_V6ONLY from the address. The only case that doesn't cover is if we want to listen for v4-mapped traffic already translated by a router but *not* native IPv4 traffic. I don't see a lot of reason to ever do that, so it's in the "refactor if we ever discover we need it" pile.
I thought that we might want to listen on both IP versions for whatever reason, on a single socket, with a specific address (say, that v4-mapped address and the equivalent untranslated address...?).
I'm not really sure what you mean by an "equivalent untranslated address". AFAIK, the only non-wildcard case that will actually listen on both IP versions is a v4-mapped address.
I mean 192.0.2.1 (untranslated, IPv4) and ::ffff:192.0.2.1 (v4-mapped). Will we ever want to listen to both?
Maybe one day, but not soon, I think.
I don't think we have to care about that right now, though.
Agreed.
So, yes we probably should explicitly set IPV6_V6ONLY==0 for v4-mapped addresses as well.
I know it can't be done now anyway, I'm just saying that sock_l4_dualstack() forcing wildcard addresses isn't something we should imply as part of "dualstack".
Hm, ok. What if I renamed it to sock_l4_dualwild()?
Short-hands for "wildcard" aren't necessarily obvious. I would have gone with "dual_any" or "dualstack_any" or "v4v6_any" or "inany_any".
'dualstack_any' is definitely a better idea. Unfortunately, I forgot to change this in my last spin.
But actually it can also stay like that, I guess, especially as this looks refactor-prone for https://bugs.passt.top/show_bug.cgi?id=140.
True.
I just wanted to raise the fact it's not obvious that "dualstack" implies :: and 0.0.0.0. It doesn't need to be addressed in code or comments now, or ever.
Ok. Sounds like it's definitely worth a respin. Perhaps I'll add a rename patch into some future series.
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
On Thu, Nov 20, 2025 at 11:05:25AM +1100, David Gibson wrote:
On Wed, Nov 19, 2025 at 12:42:04PM +0100, Stefano Brivio wrote:
On Tue, 18 Nov 2025 14:34:58 +1100 David Gibson wrote:
On Tue, Nov 18, 2025 at 01:19:21AM +0100, Stefano Brivio wrote:
On Fri, 14 Nov 2025 10:21:46 +1100 David Gibson wrote:
[snip]
I'm not really sure what you mean by an "equivalent untranslated address". AFAIK, the only non-wildcard case that will actually listen on both IP versions is a v4-mapped address.
I mean 192.0.2.1 (untranslated, IPv4) and ::ffff:192.0.2.1 (v4-mapped). Will we ever want to listen to both?
Maybe one day, but not soon, I think.
I don't think we have to care about that right now, though.
Agreed.
So, I couldn't find anything definitive, but fwiw I did find a couple things suggesting that v4-mapped addresses probably shouldn't ever appear on the wire:
https://datatracker.ietf.org/doc/html/draft-itojun-v6ops-v4mapped-harmful-02
https://lwn.net/Articles/688630/
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
participants (2)
- David Gibson
- Stefano Brivio