[PATCH v3 0/7] Prevent DAD for link-local addresses in containers
There's no point in letting a container perform duplicate address
detection (DAD), as we'll silently discard neighbour solicitations with
unspecified source addresses anyway, without relaying them to anybody.

And we realised that it's not harmless, see the whole discussion around
https://github.com/containers/podman/pull/23561#discussion_r1711639663:
we can't communicate with the container right away because of that,
which is surely annoying for tests, but it could also be an issue for
use cases with very short-lived containers or namespaces.

Disabling DAD via procfs configuration would be simpler than all this,
but we don't own the namespace (unless we spawn a shell), so we
shouldn't mess with procfs entries, assuming it's even possible.

Set the nodad attribute, and prevent DAD from being triggered on link
up, before we can set that attribute.

v3:
- in 4/7, actually handle all the netlink responses for the case where
  we change multiple addresses

v2:
- in 4/7, instead of doing the whole nl_routes_dup()-vendored dance to
  keep addresses in a single buffer, send NLM_F_REPLACE requests right
  away, but use nlmsg_send() instead of nl_do(), and check for answers
  to our further requests later. Use warn() instead of die() if we
  can't set nodad attributes
- in 5/7, make nl_addr_get_ll() get a pointer to struct in6_addr
  instead of a generic void pointer, and warn(), don't die(), if it
  fails

Stefano Brivio (7):
  netlink: Fix typo in function comment for nl_addr_get()
  netlink, pasta: Split MTU setting functionality out of nl_link_up()
  netlink, pasta: Turn nl_link_up() into a generic function to set link flags
  netlink, pasta: Disable DAD for link-local addresses on namespace interface
  netlink, pasta: Fetch link-local address from namespace interface once it's up
  pasta: Disable neighbour solicitations on device up to prevent DAD
  netlink: Fix typo in function comment for nl_addr_set()

 netlink.c | 146 +++++++++++++++++++++++++++++++++++++++++++++++++-----
 netlink.h |   6 ++-
 pasta.c   |  29 ++++++++++-
 3 files changed, 166 insertions(+), 15 deletions(-)

-- 
2.43.0
Signed-off-by: Stefano Brivio
As we'll use nl_link_up() for more than just bringing up devices, it
will become awkward to carry empty MTU values around whenever we call
it.
Signed-off-by: Stefano Brivio
In the next patches, we'll reuse it to set flags other than IFF_UP.
Signed-off-by: Stefano Brivio
It makes no sense for a container or a guest to try and perform
duplicate address detection for their link-local address, as we won't
relay neighbour solicitations with an unspecified source address
anyway.
While they perform duplicate address detection, the link-local address
is not usable, which prevents us from bringing up containers, in
particular, and communicating with them right away via IPv6.
This is not enough to prevent DAD and reach the container right away:
we'll need a couple more patches.
As we send NLM_F_REPLACE requests right away, while we still have to
read out other addresses on the same socket, we can't use nl_do():
keep track of the last sequence we sent (last address we changed), and
deal with the answers to those NLM_F_REPLACE requests in a separate
loop, later.
Link: https://github.com/containers/podman/pull/23561#discussion_r1711639663
Signed-off-by: Stefano Brivio
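For readers unfamiliar with the rtnetlink side of this, here is a minimal sketch of the kind of request the commit message describes: an RTM_NEWADDR message with NLM_F_REPLACE that carries IFA_F_NODAD both in the 8-bit ifa_flags field and in the 32-bit IFA_FLAGS attribute. The helpers attr_put() and build_nodad_req() are hypothetical names, not passt functions, and the /64 prefix length is an assumption. The patch itself modifies and resends the messages it gets back from the RTM_GETADDR dump rather than building new ones.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_addr.h>

/* Hypothetical helper (not from passt): append one attribute to a message */
static void attr_put(struct nlmsghdr *nh, unsigned short type,
		     const void *data, unsigned short len)
{
	struct rtattr *rta;

	rta = (struct rtattr *)((char *)nh + NLMSG_ALIGN(nh->nlmsg_len));
	rta->rta_type = type;
	rta->rta_len = RTA_LENGTH(len);
	memcpy(RTA_DATA(rta), data, len);
	nh->nlmsg_len = NLMSG_ALIGN(nh->nlmsg_len) + RTA_ALIGN(rta->rta_len);
}

/* Hypothetical helper: build a "replace this address, with nodad" request in
 * buf, which must be 4-byte aligned. Returns the message length, or 0 if
 * size is too small.
 */
static size_t build_nodad_req(void *buf, size_t size, unsigned int ifi,
			      const struct in6_addr *addr, uint32_t seq)
{
	uint32_t flags = IFA_F_NODAD;
	struct nlmsghdr *nh = buf;
	struct ifaddrmsg *ifa;
	size_t need = NLMSG_LENGTH(sizeof(*ifa)) +
		      RTA_SPACE(sizeof(*addr)) + RTA_SPACE(sizeof(flags));

	if (size < need)
		return 0;

	memset(buf, 0, need);
	nh->nlmsg_len = NLMSG_LENGTH(sizeof(*ifa));
	nh->nlmsg_type = RTM_NEWADDR;
	nh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK | NLM_F_REPLACE;
	nh->nlmsg_seq = seq;

	ifa = NLMSG_DATA(nh);
	ifa->ifa_family = AF_INET6;
	ifa->ifa_prefixlen = 64;		/* assumption: usual LL prefix */
	ifa->ifa_flags = IFA_F_NODAD;		/* legacy 8-bit flags field */
	ifa->ifa_scope = RT_SCOPE_LINK;
	ifa->ifa_index = ifi;

	attr_put(nh, IFA_LOCAL, addr, sizeof(*addr));
	attr_put(nh, IFA_FLAGS, &flags, sizeof(flags));	/* 32-bit flags */

	return nh->nlmsg_len;
}
```

The resulting buffer could be written to an AF_NETLINK/NETLINK_ROUTE socket opened in the target namespace. Each such request is acknowledged separately, which is why the sequence numbers need to be tracked.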
On Fri, Aug 16, 2024 at 09:39:15AM +0200, Stefano Brivio wrote:
---
 netlink.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 netlink.h |  1 +
 pasta.c   |  6 ++++++
 3 files changed, 62 insertions(+)

diff --git a/netlink.c b/netlink.c
index 873e6c7..59f2fd9 100644
--- a/netlink.c
+++ b/netlink.c
@@ -673,6 +673,61 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 	return 0;
 }
 
+/**
+ * nl_addr_set_ll_nodad() - Set IFA_F_NODAD on IPv6 link-local addresses
+ * @s:		Netlink socket
+ * @ifi:	Interface index in target namespace
+ *
+ * Return: 0 on success, negative error code on failure
+ */
+int nl_addr_set_ll_nodad(int s, unsigned int ifi)
+{
+	struct req_t {
+		struct nlmsghdr nlh;
+		struct ifaddrmsg ifa;
+	} req = {
+		.ifa.ifa_family    = AF_INET6,
+		.ifa.ifa_index     = ifi,
+	};
+	unsigned ll_addrs = 0;
+	struct nlmsghdr *nh;
+	char buf[NLBUFSIZ];
+	ssize_t status;
+	uint32_t seq;
+
+	seq = nl_send(s, &req, RTM_GETADDR, NLM_F_DUMP, sizeof(req));
+	nl_foreach_oftype(nh, status, s, buf, seq, RTM_NEWADDR) {
+		struct ifaddrmsg *ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
+		struct rtattr *rta;
+		size_t na;
+
+		if (ifa->ifa_index != ifi || ifa->ifa_scope != RT_SCOPE_LINK)
+			continue;
+
+		ifa->ifa_flags |= IFA_F_NODAD;
+
+		for (rta = IFA_RTA(ifa), na = IFA_PAYLOAD(nh); RTA_OK(rta, na);
+		     rta = RTA_NEXT(rta, na)) {
+			/* If 32-bit flags are used, add IFA_F_NODAD there */
+			if (rta->rta_type == IFA_FLAGS)
+				*(uint32_t *)RTA_DATA(rta) |= IFA_F_NODAD;
+		}
+
+		nl_send(s, nh, RTM_NEWADDR, NLM_F_REPLACE, nh->nlmsg_len);
+		ll_addrs++;
+	}
Uh... did you forget to push an update? This looks like the last version.
+	if (status < 0)
+		return status;
You still have this early return.
+
+	seq += ll_addrs;
+
+	nl_foreach(nh, status, s, buf, seq)
+		warn("netlink: Unexpected response message");
And you need an outer loop over this nl_foreach() for each value of seq, from the one for the RTM_GETADDR request to the last one for the NLM_F_REPLACE requests.
+
+	return status;
+}
+
 /**
  * nl_addr_get() - Get most specific global address, given interface and family
  * @s:		Netlink socket
diff --git a/netlink.h b/netlink.h
index 178f8ae..66a44ad 100644
--- a/netlink.h
+++ b/netlink.h
@@ -19,6 +19,7 @@ int nl_addr_get(int s, unsigned int ifi, sa_family_t af,
 		void *addr, int *prefix_len, void *addr_l);
 int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
 		const void *addr, int prefix_len);
+int nl_addr_set_ll_nodad(int s, unsigned int ifi);
 int nl_addr_dup(int s_src, unsigned int ifi_src,
 		int s_dst, unsigned int ifi_dst, sa_family_t af);
 int nl_link_get_mac(int s, unsigned int ifi, void *mac);
diff --git a/pasta.c b/pasta.c
index 96545b1..17eed15 100644
--- a/pasta.c
+++ b/pasta.c
@@ -340,6 +340,12 @@ void pasta_ns_conf(struct ctx *c)
 		}
 
 		if (c->ifi6) {
+			rc = nl_addr_set_ll_nodad(nl_sock_ns, c->pasta_ifi);
+			if (rc < 0) {
+				warn("Can't set nodad for LL in namespace: %s",
+				     strerror(-rc));
+			}
+
 			if (c->ip6.no_copy_addrs) {
 				rc = nl_addr_set(nl_sock_ns, c->pasta_ifi,
 						 AF_INET6, &c->ip6.addr, 64);
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
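The outer loop requested in the review can be illustrated in isolation. The sketch below is only a model of the bookkeeping: it assumes that sequence numbers are consecutive, that the dump (first_seq) has already been read, and it replaces the socket with a mock array of acknowledgements. The names mock_ack, mock_wait_ack() and drain_acks() are hypothetical and do not exist in passt.

```c
#include <stddef.h>
#include <stdint.h>

/* Mock of one netlink acknowledgement: sequence number and error code */
struct mock_ack {
	uint32_t seq;
	int err;		/* 0 on success, negative errno otherwise */
};

/* Hypothetical stand-in for reading the answer to one request */
static int mock_wait_ack(const struct mock_ack *q, size_t n, uint32_t seq)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (q[i].seq == seq)
			return q[i].err;
	}

	return -1;		/* no answer with that sequence number */
}

/* Collect the answer to every request sent after the dump, that is, for
 * sequence numbers first_seq + 1 up to first_seq + count. Keeps draining
 * after a failure, and returns the first error seen, or 0.
 */
static int drain_acks(const struct mock_ack *q, size_t n,
		      uint32_t first_seq, unsigned count)
{
	uint32_t seq;
	int ret = 0;

	for (seq = first_seq + 1; seq != first_seq + 1 + count; seq++) {
		int err = mock_wait_ack(q, n, seq);

		if (err < 0 && !ret)
			ret = err;
	}

	return ret;
}
```

Draining every sequence number even after a failure keeps later, unrelated requests on the same socket from seeing stale answers.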
On Sat, 17 Aug 2024 17:59:45 +1000
David Gibson wrote:
Uh... did you forget to push an update? This looks like the last version.
Worse :( I accidentally squashed the new changes into 5/7. Respinning...
thanks for noticing.

-- 
Stefano
On Sat, 17 Aug 2024 10:37:16 +0200
Stefano Brivio wrote:
On Sat, 17 Aug 2024 17:59:45 +1000
David Gibson wrote:
Uh... did you forget to push an update? This looks like the last version.
Worse :( I accidentally squashed the new changes into 5/7.
6/7, actually.

-- 
Stefano
As soon as we bring up the interface, the Linux kernel will set up a
link-local address for it, so we can fetch it and start using it right
away, if we need a link-local address to communicate with the container
before we see any traffic coming from it.
Signed-off-by: Stefano Brivio
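As an illustration of what fetching the link-local address means, here is a minimal sketch that uses getifaddrs() instead of netlink. This is a deliberately swapped-in technique: the patch does this with a netlink request on a socket opened in the target namespace, whereas getifaddrs() only sees the calling process's own namespace. The function name get_ll_addr() is hypothetical.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <ifaddrs.h>
#include <netinet/in.h>

/* Hypothetical helper: copy the first IPv6 link-local address configured on
 * interface ifname to ll. Returns 0 on success, -1 if none was found.
 */
static int get_ll_addr(const char *ifname, struct in6_addr *ll)
{
	struct ifaddrs *list, *it;
	int ret = -1;

	if (getifaddrs(&list))
		return -1;

	for (it = list; it; it = it->ifa_next) {
		const struct sockaddr_in6 *sa6;

		/* ifa_addr may be NULL for interfaces without addresses */
		if (!it->ifa_addr || it->ifa_addr->sa_family != AF_INET6)
			continue;

		if (strcmp(it->ifa_name, ifname))
			continue;

		sa6 = (const struct sockaddr_in6 *)it->ifa_addr;
		if (IN6_IS_ADDR_LINKLOCAL(&sa6->sin6_addr)) {
			*ll = sa6->sin6_addr;
			ret = 0;
			break;
		}
	}

	freeifaddrs(list);
	return ret;
}
```

A caller would typically retry or fall back if this returns -1, since the kernel only assigns the address once the interface is up.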
As soon as the kernel notifier for IPv6 address configuration
(addrconf_notify()) sees that we bring the target interface up
(NETDEV_UP), it will schedule duplicate address detection, so, by
itself, setting the nodad flag later is useless, because that won't
stop a detection that's already in progress.
However, if we disable neighbour solicitations with IFF_NOARP (which
is a misnomer for IPv6 interfaces, but there's no possibility of
mixing things up), the notifier will not trigger DAD, because it can't
be done, of course, without neighbour solicitations.
Set IFF_NOARP as we bring up the device, and drop it after we've had a
chance to set the nodad attribute on the link.
Signed-off-by: Stefano Brivio
Signed-off-by: Stefano Brivio