[PATCH 0/8] Small cleanups related to addresses and binding
Here's another batch of cleanups for minor warts I noticed while working on flow table things. David Gibson (8): tcp: Fix address type for tcp_sock_init_af() treewide: Use IN4ADDR_LOOPBACK_INIT more widely treewide: Add IN4ADDR_ANY_INIT macro util: Use htonl_constant() in more places util: Improve sockaddr initialisation in sock_l4() icmp: Avoid unnecessary handling of unspecified bind address treewide: Avoid in_addr_t util: Make sock_l4() treat empty string ifname like NULL icmp.c | 27 ++++++--------------------- tcp.c | 4 ++-- tcp_splice.c | 2 +- udp.c | 14 ++++++-------- util.c | 12 ++++-------- util.h | 7 +++++-- 6 files changed, 24 insertions(+), 42 deletions(-) -- 2.43.0
This takes a struct in_addr * (i.e. an IPv4 address), although it's
explicitly supposed to handle IPv6 as well. Both its caller and sock_l4()
which it calls use a void * for the address, which can be either an in_addr
or an in6_addr.
We get away with this, because we don't do anything with the pointer other
than transfer it from the caller to sock_l4(), but it's misleading. And
quite possibly technically UB, because C is like that.
Signed-off-by: David Gibson
On Fri, 8 Dec 2023 01:31:33 +1100
David Gibson
This takes a struct in_addr * (i.e. an IPv4 address), although it's explicitly supposed to handle IPv6 as well. Both its caller and sock_l4() which it calls use a void * for the address, which can be either an in_addr or an in6_addr.
We get away with this, because we don't do anything with the pointer other than transfer it from the caller to sock_l4(), but it's misleading. And quite possibly technically UB, because C is like that.
Signed-off-by: David Gibson
--- tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcp.c b/tcp.c index f506cfd..bda95b2 100644 --- a/tcp.c +++ b/tcp.c @@ -2905,7 +2905,7 @@ void tcp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events) * Return: fd for the new listening socket, negative error code on failure */ static int tcp_sock_init_af(const struct ctx *c, int af, in_port_t port, - const struct in_addr *addr, const char *ifname) + const void *addr, const char *ifname)
This is obviously correct. However, after a lot of thinking: (gcc) optimisations based on Type-Based Alias Analysis, which we don't disable on this path, could, now, happily defer filling 'addr' with inet_pton() in conf_ports() to a point *after* the tcp_sock_init() call. Without this patch, at least 32 bits must be updated before the call. It might sound like a joke because... it actually is. But look at what we had to do for the functions in checksum.c. We pass const void *buf, and anything that buf points to can be updated (with TBAA) after the call. I don't see any conceptual difference between this case and those functions. Anyway, that won't reasonably happen here, and in any case this would have been broken for IPv6, so I'll go ahead and apply this. But, eventually, I think we should switch all these usages to union inany_addr *. -- Stefano
On Wed, Dec 27, 2023 at 09:25:06PM +0100, Stefano Brivio wrote:
On Fri, 8 Dec 2023 01:31:33 +1100 David Gibson
wrote: This takes a struct in_addr * (i.e. an IPv4 address), although it's explicitly supposed to handle IPv6 as well. Both its caller and sock_l4() which it calls use a void * for the address, which can be either an in_addr or an in6_addr.
We get away with this, because we don't do anything with the pointer other than transfer it from the caller to sock_l4(), but it's misleading. And quite possibly technically UB, because C is like that.
Signed-off-by: David Gibson
--- tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcp.c b/tcp.c index f506cfd..bda95b2 100644 --- a/tcp.c +++ b/tcp.c @@ -2905,7 +2905,7 @@ void tcp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events) * Return: fd for the new listening socket, negative error code on failure */ static int tcp_sock_init_af(const struct ctx *c, int af, in_port_t port, - const struct in_addr *addr, const char *ifname) + const void *addr, const char *ifname)
This is obviously correct.
However, after a lot of thinking: (gcc) optimisations based on Type-Based Alias Analysis, which we don't disable on this path, could, now, happily defer filling 'addr' with inet_pton() in conf_ports() to a point *after* the tcp_sock_init() call.
Hrm... possibly. The fact that the addr variable in conf_ports() is a char array, not a struct in*_addr might save us. I think replacing it with a union of an in_addr and in6_addr would also be ok.
Without this patch, at least 32 bits must be updated before the call.
I'm not sure that's correct. If the compiler is allowed to assume that a char[] and a void * aren't aliased (even if they clearly are), then I'd expect it to also be allowed to assume that a char[] and a struct in_addr * aren't aliased.
It might sound like a joke because... it actually is. But look at what we had to do for the functions in checksum.c. We pass const void *buf, and anything that buf points to can be updated (with TBAA) after the call.
I don't see any conceptual difference between this case and those functions.
Anyway, that won't reasonably happen here, and in any case this would have been broken for IPv6, so I'll go ahead and apply this.
But, eventually, I think we should switch all these usages to union inany_addr *.
So, we may be able to use union inany_addr in some places, but that's not the same thing as this: inany_addr carries IPv4 addresses as mapped IPv6 addresses, it's not switched on a separate af parameter. We could, of course, define a new type as a simple union of in_addr and in6_addr. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On Thu, 28 Dec 2023 13:42:25 +1100
David Gibson
On Wed, Dec 27, 2023 at 09:25:06PM +0100, Stefano Brivio wrote:
On Fri, 8 Dec 2023 01:31:33 +1100 David Gibson
wrote: This takes a struct in_addr * (i.e. an IPv4 address), although it's explicitly supposed to handle IPv6 as well. Both its caller and sock_l4() which it calls use a void * for the address, which can be either an in_addr or an in6_addr.
We get away with this, because we don't do anything with the pointer other than transfer it from the caller to sock_l4(), but it's misleading. And quite possibly technically UB, because C is like that.
Signed-off-by: David Gibson
--- tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcp.c b/tcp.c index f506cfd..bda95b2 100644 --- a/tcp.c +++ b/tcp.c @@ -2905,7 +2905,7 @@ void tcp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events) * Return: fd for the new listening socket, negative error code on failure */ static int tcp_sock_init_af(const struct ctx *c, int af, in_port_t port, - const struct in_addr *addr, const char *ifname) + const void *addr, const char *ifname)
This is obviously correct.
However, after a lot of thinking: (gcc) optimisations based on Type-Based Alias Analysis, which we don't disable on this path, could, now, happily defer filling 'addr' with inet_pton() in conf_ports() to a point *after* the tcp_sock_init() call.
Hrm... possibly. The fact that the addr variable in conf_ports() is a char array, not a struct in*_addr might save us.
Hmm, look at the commit message for a48c5c2abf8a ("treewide: Disable gcc strict aliasing rules as needed, drop workarounds"): that didn't help with the checksum functions, because yes, at some point I had char *, but then I used those as different types. I guess struct in_addr / struct in6_addr as we have in sock_l4() might be equivalent to that.
I think replacing it with a union of an in_addr and in6_addr would also be ok.
That should work, yes, and that's what I originally wanted to suggest, before remembering about union inany_addr... but that doesn't fit, see below.
Without this patch, at least 32 bits must be updated before the call.
I'm not sure that's correct. If the compiler is allowed to assume that a char[] and a void * aren't aliased (even if they clearly are), then I'd expect it to also be allowed to assume that a char[] and a struct in_addr * aren't aliased.
Ouch, right, they aren't (again... sarcastically speaking).
It might sound like a joke because... it actually is. But look at what we had to do for the functions in checksum.c. We pass const void *buf, and anything that buf points to can be updated (with TBAA) after the call.
I don't see any conceptual difference between this case and those functions.
Anyway, that won't reasonably happen here, and in any case this would have been broken for IPv6, so I'll go ahead and apply this.
But, eventually, I think we should switch all these usages to union inany_addr *.
So, we may be able to use union inany_addr in some places, but that's not the same thing as this: inany_addr carries IPv4 addresses as mapped IPv6 addresses, it's not switched on a separate af parameter.
I really meant *a pointer* to union inany_addr, that is:
We could, of course, define a new type as a simple union of in_addr and in6_addr.
...abusing it instead of using a separate union. On the other hand, given where 'a4' is in there, it's not necessarily the same for (strict) aliasing considerations. Is "union in10_addr" fashionable enough? We could use A [16], but it's inconvenient to type, and difficult to pronounce. -- Stefano
On Thu, Dec 28, 2023 at 11:11:19AM +0100, Stefano Brivio wrote:
On Thu, 28 Dec 2023 13:42:25 +1100 David Gibson
wrote: On Wed, Dec 27, 2023 at 09:25:06PM +0100, Stefano Brivio wrote:
On Fri, 8 Dec 2023 01:31:33 +1100 David Gibson
wrote: This takes a struct in_addr * (i.e. an IPv4 address), although it's explicitly supposed to handle IPv6 as well. Both its caller and sock_l4() which it calls use a void * for the address, which can be either an in_addr or an in6_addr.
We get away with this, because we don't do anything with the pointer other than transfer it from the caller to sock_l4(), but it's misleading. And quite possibly technically UB, because C is like that.
Signed-off-by: David Gibson
--- tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcp.c b/tcp.c index f506cfd..bda95b2 100644 --- a/tcp.c +++ b/tcp.c @@ -2905,7 +2905,7 @@ void tcp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events) * Return: fd for the new listening socket, negative error code on failure */ static int tcp_sock_init_af(const struct ctx *c, int af, in_port_t port, - const struct in_addr *addr, const char *ifname) + const void *addr, const char *ifname)
This is obviously correct.
However, after a lot of thinking: (gcc) optimisations based on Type-Based Alias Analysis, which we don't disable on this path, could, now, happily defer filling 'addr' with inet_pton() in conf_ports() to a point *after* the tcp_sock_init() call.
Hrm... possibly. The fact that the addr variable in conf_ports() is a char array, not a struct in*_addr might save us.
Hmm, look at the commit message for a48c5c2abf8a ("treewide: Disable gcc strict aliasing rules as needed, drop workarounds"): that didn't help with the checksum functions, because yes, at some point I had char *, but then I used those as different types.
I guess struct in_addr / struct in6_addr as we have in sock_l4() might be equivalent to that.
I think replacing it with a union of an in_addr and in6_addr would also be ok.
That should work, yes, and that's what I originally wanted to suggest, before remembering about union inany_addr... but that doesn't fit, see below.
Without this patch, at least 32 bits must be updated before the call.
I'm not sure that's correct. If the compiler is allowed to assume that a char[] and a void * aren't aliased (even if they clearly are), then I'd expect it to also be allowed to assume that a char[] and a struct in_addr * aren't aliased.
Ouch, right, they aren't (again... sarcastically speaking).
It might sound like a joke because... it actually is. But look at what we had to do for the functions in checksum.c. We pass const void *buf, and anything that buf points to can be updated (with TBAA) after the call.
I don't see any conceptual difference between this case and those functions.
Anyway, that won't reasonably happen here, and in any case this would have been broken for IPv6, so I'll go ahead and apply this.
But, eventually, I think we should switch all these usages to union inany_addr *.
So, we may be able to use union inany_addr in some places, but that's not the same thing as this: inany_addr carries IPv4 addresses as mapped IPv6 addresses, it's not switched on a separate af parameter.
I really meant *a pointer* to union inany_addr, that is:
We could, of course, define a new type as a simple union of in_addr and in6_addr.
...abusing it instead of using a separate union. On the other hand, given where 'a4' is in there, it's not necessarily the same for (strict) aliasing considerations.
Is "union in10_addr" fashionable enough? We could use A [16], but it's inconvenient to type, and difficult to pronounce.
Uh, I haven't heard of either of those. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On Sun, 7 Jan 2024 16:35:56 +1100
David Gibson
On Thu, Dec 28, 2023 at 11:11:19AM +0100, Stefano Brivio wrote:
On Thu, 28 Dec 2023 13:42:25 +1100 David Gibson
wrote: On Wed, Dec 27, 2023 at 09:25:06PM +0100, Stefano Brivio wrote:
On Fri, 8 Dec 2023 01:31:33 +1100 David Gibson
wrote: This takes a struct in_addr * (i.e. an IPv4 address), although it's explicitly supposed to handle IPv6 as well. Both its caller and sock_l4() which it calls use a void * for the address, which can be either an in_addr or an in6_addr.
We get away with this, because we don't do anything with the pointer other than transfer it from the caller to sock_l4(), but it's misleading. And quite possibly technically UB, because C is like that.
Signed-off-by: David Gibson
--- tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcp.c b/tcp.c index f506cfd..bda95b2 100644 --- a/tcp.c +++ b/tcp.c @@ -2905,7 +2905,7 @@ void tcp_sock_handler(struct ctx *c, union epoll_ref ref, uint32_t events) * Return: fd for the new listening socket, negative error code on failure */ static int tcp_sock_init_af(const struct ctx *c, int af, in_port_t port, - const struct in_addr *addr, const char *ifname) + const void *addr, const char *ifname)
This is obviously correct.
However, after a lot of thinking: (gcc) optimisations based on Type-Based Alias Analysis, which we don't disable on this path, could, now, happily defer filling 'addr' with inet_pton() in conf_ports() to a point *after* the tcp_sock_init() call.
Hrm... possibly. The fact that the addr variable in conf_ports() is a char array, not a struct in*_addr might save us.
Hmm, look at the commit message for a48c5c2abf8a ("treewide: Disable gcc strict aliasing rules as needed, drop workarounds"): that didn't help with the checksum functions, because yes, at some point I had char *, but then I used those as different types.
I guess struct in_addr / struct in6_addr as we have in sock_l4() might be equivalent to that.
I think replacing it with a union of an in_addr and in6_addr would also be ok.
That should work, yes, and that's what I originally wanted to suggest, before remembering about union inany_addr... but that doesn't fit, see below.
Without this patch, at least 32 bits must be updated before the call.
I'm not sure that's correct. If the compiler is allowed to assume that a char[] and a void * aren't aliased (even if they clearly are), then I'd expect it to also be allowed to assume that a char[] and a struct in_addr * aren't aliased.
Ouch, right, they aren't (again... sarcastically speaking).
It might sound like a joke because... it actually is. But look at what we had to do for the functions in checksum.c. We pass const void *buf, and anything that buf points to can be updated (with TBAA) after the call.
I don't see any conceptual difference between this case and those functions.
Anyway, that won't reasonably happen here, and in any case this would have been broken for IPv6, so I'll go ahead and apply this.
But, eventually, I think we should switch all these usages to union inany_addr *.
So, we may be able to use union inany_addr in some places, but that's not the same thing as this: inany_addr carries IPv4 addresses as mapped IPv6 addresses, it's not switched on a separate af parameter.
I really meant *a pointer* to union inany_addr, that is:
We could, of course, define a new type as a simple union of in_addr and in6_addr.
...abusing it instead of using a separate union. On the other hand, given where 'a4' is in there, it's not necessarily the same for (strict) aliasing considerations.
Is "union in10_addr" fashionable enough? We could use A [16], but it's inconvenient to type, and difficult to pronounce.
Uh, I haven't heard of either of those.
Right, I just made that up. I was suggesting a new name for this hypothetical union. -- Stefano
We already define IN4ADDR_LOOPBACK_INIT to initialise a struct in_addr to
the loopback address without delving into its internals. However there are
some places we don't use it, and explicitly look at the internal structure
of struct in_addr, which we generally want to avoid. Use the define more
widely to avoid that.
Signed-off-by: David Gibson
We already define IN4ADDR_LOOPBACK_INIT to initialise a struct in_addr to
the loopback address, make a similar one for the unspecified / any address.
This avoids messying things with the internal structure of struct in_addr
where we don't care about it.
Signed-off-by: David Gibson
We might as well when we're passing a known constant value, giving the
compiler the best chance to optimise things away.
Signed-off-by: David Gibson
Currently we initialise the address field of the sockaddrs we construct
to the any/unspecified address, but not in a very clear way: we use
explicit 0 values, which is only interpretable if you know the order of
fields in the sockaddr structures. Use explicit field names, and explicit
initialiser macros for the address.
Because we initialise to this default value, we don't need to explicitly
set the any/unspecified address later on if the caller didn't pass an
overriding bind address.
Signed-off-by: David Gibson
On Fri, 8 Dec 2023 01:31:37 +1100
David Gibson
Currently we initialise the address field of the sockaddrs we construct to the any/unspecified address, but not in a very clear way: we use explicit 0 values, which is only interpretable if you know the order of fields in the sockaddr structures. Use explicit field names, and explicit initialiser macros for the address.
Because we initialise to this default value, we don't need to explicitly set the any/unspecified address later on if the caller didn't pass an overriding bind address.
Signed-off-by: David Gibson
--- util.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/util.c b/util.c index d465e48..0152ae6 100644 --- a/util.c +++ b/util.c @@ -105,12 +105,12 @@ int sock_l4(const struct ctx *c, int af, uint8_t proto, struct sockaddr_in addr4 = { .sin_family = AF_INET, .sin_port = htons(port), - { 0 }, { 0 }, + .sin_addr = IN4ADDR_ANY_INIT,
The second { 0 } was meant to initialise .sin_zero, and:
}; struct sockaddr_in6 addr6 = { .sin6_family = AF_INET6, .sin6_port = htons(port), - 0, IN6ADDR_ANY_INIT, 0, + .sin6_addr = IN6ADDR_ANY_INIT,
these zeroes were for sin6_flowinfo and sin6_scope_id. Not needed because we want zeroes anyway (or the "same as objects that have static storage duration"), but see commit eed6933e6c29 ("udp: Explicitly initialise sin6_scope_id and sin_zero in sockaddr_in{,6}"): gcc versions 7 to 9 used to complain. And gcc 9 isn't *that* old. But then, you might ask, why didn't I just use names for all the initialisers? Well, there was some issue with sockaddr_in6 or sockaddr_in not having a field defined in some header (kernel or a C library). I have a vague memory it was about sin6_scope_id, but I can't find any note in the git history or anywhere else. :( Now, the latest addition to sockaddr_in6 was sin6_scope_id (kernel: 2006, glibc: 2000), so that's fine. Also sockaddr_in looks okay, including sin_zero, with "#define sin_zero __pad" in the kernel version. So... my preferred option would be to leave this untouched: the initialisers you replaced are anyway all zeroes. And, after RFC 2553 (24 years ago), the order of those fields never changed. If it really bothers you, let's at least initialise everything explicitly by name, because we know that gcc 9 complains, and if we hit another version of sockaddr_in6 without a named sin6_scope_id, we'll find out, eventually. The rest of the series looks good to me and I just applied it (minus _this part_ of this patch). -- Stefano
On Wed, Dec 27, 2023 at 09:25:15PM +0100, Stefano Brivio wrote:
On Fri, 8 Dec 2023 01:31:37 +1100 David Gibson
wrote: Currently we initialise the address field of the sockaddrs we construct to the any/unspecified address, but not in a very clear way: we use explicit 0 values, which is only interpretable if you know the order of fields in the sockaddr structures. Use explicit field names, and explicit initialiser macros for the address.
Because we initialise to this default value, we don't need to explicitly set the any/unspecified address later on if the caller didn't pass an overriding bind address.
Signed-off-by: David Gibson
--- util.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/util.c b/util.c index d465e48..0152ae6 100644 --- a/util.c +++ b/util.c @@ -105,12 +105,12 @@ int sock_l4(const struct ctx *c, int af, uint8_t proto, struct sockaddr_in addr4 = { .sin_family = AF_INET, .sin_port = htons(port), - { 0 }, { 0 }, + .sin_addr = IN4ADDR_ANY_INIT,
The second { 0 } was meant to initialise .sin_zero, and:
}; struct sockaddr_in6 addr6 = { .sin6_family = AF_INET6, .sin6_port = htons(port), - 0, IN6ADDR_ANY_INIT, 0, + .sin6_addr = IN6ADDR_ANY_INIT,
these zeroes were for sin6_flowinfo and sin6_scope_id.
Not needed because we want zeroes anyway (or the "same as objects that have static storage duration"), but see commit eed6933e6c29 ("udp: Explicitly initialise sin6_scope_id and sin_zero in sockaddr_in{,6}"): gcc versions 7 to 9 used to complain. And gcc 9 isn't *that* old.
But then, you might ask, why didn't I just use names for all the initialisers? Well, there was some issue with sockaddr_in6 or sockaddr_in not having a field defined in some header (kernel or a C library). I have a vague memory it was about sin6_scope_id, but I can't find any note in the git history or anywhere else. :(
Now, the latest addition to sockaddr_in6 was sin6_scope_id (kernel: 2006, glibc: 2000), so that's fine. Also sockaddr_in looks okay, including sin_zero, with "#define sin_zero __pad" in the kernel version.
So... my preferred option would be to leave this untouched: the initialisers you replaced are anyway all zeroes. And, after RFC 2553 (24 years ago), the order of those fields never changed.
If it really bothers you, let's at least initialise everything explicitly by name, because we know that gcc 9 complains, and if we hit another version of sockaddr_in6 without a named sin6_scope_id, we'll find out, eventually.
The rest of the series looks good to me and I just applied it (minus _this part_ of this patch).
Eh, ok. I'll worry about this bit if I run across it again. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
We go to some trouble, if the configured output address is unspecified, to
pass NULL to sock_l4(). But while passing NULL is one way to get sock_l4()
not to specify a bind address, passing the "any" address explicitly works
too. Use this to simplify icmp_tap_handler().
Signed-off-by: David Gibson
IPv4 addresses can be stored in an in_addr_t or a struct in_addr. The
former is just a type alias to a 32-bit integer, so doesn't really give us
any type checking. Therefore we generally prefer the structure, since we
mostly want to treat IP address as opaque objects. Fix a few places where
we still use in_addr_t, but can just as easily use struct in_addr.
Note there are still some uses of in_addr_t in conf.c, but those are
justified: since they're doing prefix calculations, they actually need to
look at the internals of the address as an integer.
Signed-off-by: David Gibson
sock_l4() takes NULL for ifname if you don't want to bind the socket to a
particular interface. However, for a number of the callers, it's more
natural to use an empty string for that case. Change sock_l4() to accept
either NULL or an empty string equivalently, and simplify some callers
using that change.
Signed-off-by: David Gibson
On Fri, 8 Dec 2023 01:31:32 +1100
David Gibson
Here's another batch of cleanups for minor warts I noticed while working on flow table things.
David Gibson (8): tcp: Fix address type for tcp_sock_init_af() treewide: Use IN4ADDR_LOOPBACK_INIT more widely treewide: Add IN4ADDR_ANY_INIT macro util: Use htonl_constant() in more places util: Improve sockaddr initialisation in sock_l4() icmp: Avoid unnecessary handling of unspecified bind address treewide: Avoid in_addr_t util: Make sock_l4() treat empty string ifname like NULL
Applied, minus first two hunks of 5/8. -- Stefano
participants (2)
-
David Gibson
-
Stefano Brivio