On Fri, 14 Nov 2025 11:24:31 +1100
David Gibson
On Thu, Nov 13, 2025 at 07:33:35AM +0100, Stefano Brivio wrote:
On Wed, 29 Oct 2025 17:26:26 +1100 David Gibson
wrote: Currently we only call setsockopt() on IPV6_V6ONLY when we want to set it to 1, which we typically do on all IPv6 sockets except those explicitly for dual stack listening. That's not quite right in two ways:
* Although IPV6_V6ONLY==0 is normally the default on Linux, that can be changed with the net.ipv6.bindv6only sysctl. It may also have different defaults on other OSes if we ever support them. We know we need it off for dual stack sockets, so explicitly set it to 0 in that case.
* At the same time setting IPV6_V6ONLY to 1 for IPv6 sockets bound to a specific address is harmless but pointless. Don't set the option at all in this case, saving a syscall.
I haven't checked the implications of this but __inet6_bind() handles address "types" IPV6_ADDR_ANY and IPV6_ADDR_MAPPED in the same way, for IPV6_V6ONLY purposes. I'm not sure if this has any influence on functionality though.
AFAICT, technically yes, but not in a way that really matters. IIUC for a v4-mapped address using IPV6_V6ONLY=0 will let the socket handle IPv4 traffic with the corresponding address. IPV6_V6ONLY will only handle IPv6 traffic that actually has the v4-mapped address on it.
We won't actually get to the point of passing v4-mapped addresses to the kernel in passt/pasta, because we already use those to mean "IPv4" and will go to the IPv4 paths.
Signed-off-by: David Gibson
--- util.c | 27 +++++++++++++++++++++------ 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/util.c b/util.c index c94efae4..62f43895 100644 --- a/util.c +++ b/util.c @@ -45,14 +45,14 @@ * @type: epoll type * @sa: Socket address to bind to * @ifname: Interface for binding, NULL for any - * @v6only: Set IPV6_V6ONLY socket option + * @v6only: If >= 0, set IPV6_V6ONLY socket option to this value * @data: epoll reference portion for protocol handlers * * Return: newly created socket, negative error code on failure */ static int sock_l4_(const struct ctx *c, enum epoll_type type, const union sockaddr_inany *sa, const char *ifname, - bool v6only, uint32_t data) + int v6only, uint32_t data) { sa_family_t af = sa->sa_family; union epoll_ref ref = { .type = type, .data = data }; @@ -101,9 +101,11 @@ static int sock_l4_(const struct ctx *c, enum epoll_type type,
ref.fd = fd;
- if (v6only) - if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &y, sizeof(y))) - debug("Failed to set IPV6_V6ONLY on socket %i", fd); + if (v6only >= 0) + if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, + &v6only, sizeof(v6only))) + debug("Failed to set IPV6_V6ONLY to %d on socket %i", + v6only, fd);
Nit: curly brackets (two pairs) for consistency.
Fixed.
if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &y, sizeof(y))) debug("Failed to set SO_REUSEADDR on socket %i", fd); @@ -186,7 +188,16 @@ int sock_l4(const struct ctx *c, enum epoll_type type, const union sockaddr_inany *sa, const char *ifname, uint32_t data) { - return sock_l4_(c, type, sa, ifname, sa->sa_family == AF_INET6, data); + int v6only = -1; + + /* The option doesn't exist for IPv4 sockets, and is irrelevant for IPv6 + * sockets with a non-wildcard address.
Same as above: I don't think this is true, strictly speaking, but I didn't check whether this inaccuracy is in any way relevant.
Right, so technically yes, but I don't think it's relevant for the reasons I gave above. I've rephrased to "we don't care about it for IPv6 sockets with ...". Does that help?
Yes, it does, thanks.
+ */ + if (sa->sa_family == AF_INET6 && + IN6_IS_ADDR_UNSPECIFIED(&sa->sa6.sin6_addr)) + v6only = 1; + + return sock_l4_(c, type, sa, ifname, v6only, data); }
int sock_l4_dualstack(const struct ctx *c, enum epoll_type type, @@ -198,6 +209,10 @@ int sock_l4_dualstack(const struct ctx *c, enum epoll_type type, .sa6.sin6_port = htons(port), };
+ /* Dual stack sockets require IPV6_V6ONLY == 0. Usually that's the + * default, but sysctl net.ipv6.bindv6only can change that, so set the + * sockopt explicitly. + */ return sock_l4_(c, type, &sa, ifname, 0, data); }
The rest of the series looks good to me, including 8/8, but it's not clear to me what the "secondary reasons" to consider 8/8 at this stage might be, so I'm not actually sure what to do with it.
Is it because it drops some code?
* It drops some code * I think it will address bug 113 (still need to check)
Right, I forgot about this part.
* It might also help for some other systemd-resolved edge cases * It will make life a bit easier to implement bug 171 * I think the improved symmetry will make other flexible forwarding changes a bit easier
I'll look into bug 113 and revise the commit message for 8/8 in the next spin.
-- Stefano