[PATCH v7 00/13] Introduce multiple addresses and late binding
This series adds support for storing multiple addresses in a unified
address array, so that a guest can see the same addresses on its own
interface.

o All addresses are stored as union inany_addr.
o User configured addresses are marked with a USER flag.
o Host provided addresses are marked with a HOST flag.
o Link local addresses are also marked with a LINKLOCAL flag.
o Addresses the guest is actually using are marked with an OBSERVED flag.
o Addresses eligible for DHCP assignment are marked with a DHCP flag.
o Addresses eligible for DHCPv6 advertisement are marked with a DHCPV6 flag.
o Addresses eligible for NDP advertisement are marked with an NDP flag.

v2:
- Added the earlier standalone CIDR commit to the head of the series.
- Replaced the guest namespace interface subscriptions with just an
  address observation feature, so that it works with both PASTA and
  PASST.
- Unified 'no_copy_addrs' and 'copy_addrs' code paths, as suggested by
  David G.
- Multiple other changes, also based on feedback from David.
- Removed the host interface subscription patches, for now. I intend to
  re-add them once this series is applied.
- Outstanding question: When do we add an IPv4 link local address to
  the guest? Only in local/opaque mode? Only when explicitly requested?
  Always?
v3:
- Unified the IPv4 and IPv6 arrays into one array
- Changed prefix_len to always be in IPv6/IPv4-mapped format
- Updated migration protocol to v3, handling multiple addresses
- Many other smaller changes, based on feedback from the PASST team
v4:
- Numerous changes based on feedback
- Added several new commits, mostly broken out of the pre-existing ones.
v5:
- Re-introduced multiple OBSERVED addresses. This actually turned out to
  be cleaner, with more predictable behaviour, than allowing only one.
- Included the DHCP and NDP patches from previous versions, improved and
  updated according to feedback from the team.
- Likewise re-included the host-side netlink commit to support late
  binding.
v6:
- Skipped late binding commit for now.
- Added commit for using a single print buffer in conf_print
- Added commit for reading and adding all addresses from template
  interface.
- Added commit for refactoring pasta_ns_conf().
- Added separate address flags for DHCP, DHCPv6, and NDP, so that those
  are easy to recognize for their respective functions.
- Split DHCP and DHCPv6 address selection into separate commits.
- Updated migration protocol to v3 for multi-address support.
- Numerous other smaller changes, based both on feedback from David G.
  and on issues I have identified myself.
v7:
- Replaced commit #1 with one that fixes a return address issue with
  DHCPv6
- Modified for_each_addr() macro to take 4 arguments
- Many more fixes and changes based on feedback and my own findings.

Jon Maloy (13):
  dhcpv6: Fix reply destination to match client's source address
  passt, pasta: Introduce unified multi-address data structures
  fwd: Unify guest accessibility checks with unified address array
  arp: Check all configured addresses in ARP filtering
  conf: Allow multiple -a/--address options per address family
  netlink, conf: Read all addresses from template interface at startup
  netlink, pasta: refactor function pasta_ns_conf()
  conf, pasta: Track observed guest IPv4 addresses in unified address array
  conf, pasta: Track observed guest IPv6 addresses in unified address array
  migrate: Update protocol to v3 for multi-address support
  dhcp: Select address for DHCP distribution
  dhcpv6: Select addresses for DHCPv6 distribution
  ndp: Support advertising multiple prefixes in Router Advertisements

 arp.c     |  20 +++-
 conf.c    | 200 ++++++++++++++++++++---------------
 dhcp.c    |  22 ++--
 dhcpv6.c  | 115 +++++++++++---------
 dhcpv6.h  |   2 +-
 fwd.c     | 305 ++++++++++++++++++++++++++++++++++++++++--------------
 fwd.h     |   8 ++
 inany.h   |  44 ++++++++
 ip.h      |   2 +
 migrate.c | 240 ++++++++++++++++++++++++++++++++++++++++--
 ndp.c     | 131 ++++++++++++++++------
 netlink.c |  70 +++++++------
 netlink.h |   7 +-
 passt.1   |   7 +-
 passt.h   |  78 +++++++++++---
 pasta.c   | 224 ++++++++++++++++++++-------------------
 tap.c     |  37 ++-----
 tap.h     |   2 -
 18 files changed, 1054 insertions(+), 460 deletions(-)

-- 
2.52.0
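The address model listed at the top of this cover letter can be sketched on its own. What follows is a self-contained model under stated assumptions, not code from the series. The flag names come from the list above. The bit values, the capacity given to MAX_GUEST_ADDRS, the layout of struct guest_addr, and the helpers guest_addr_af(), guest_addr_set() and count_addrs() are all hypothetical. IPv4 addresses are kept in IPv4-mapped form. for_each_addr() takes four arguments, and AF_UNSPEC selects every entry, as described for v7.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Flag names follow the cover letter; the bit values are assumptions */
#define CONF_ADDR_USER		(1U << 0)	/* given with -a/--address */
#define CONF_ADDR_HOST		(1U << 1)	/* copied from the host */
#define CONF_ADDR_LINKLOCAL	(1U << 2)
#define CONF_ADDR_OBSERVED	(1U << 3)	/* seen in guest traffic */
#define CONF_ADDR_DHCP		(1U << 4)
#define CONF_ADDR_DHCPV6	(1U << 5)
#define CONF_ADDR_NDP		(1U << 6)

#define MAX_GUEST_ADDRS		32		/* assumed capacity */

/* Hypothetical entry: IPv4 lives in IPv4-mapped form (::ffff:a.b.c.d) */
struct guest_addr {
	struct in6_addr addr;
	int prefix_len;		/* IPv6 or IPv4-mapped format: /24 -> 120 */
	uint8_t flags;
};

static inline sa_family_t guest_addr_af(const struct guest_addr *a)
{
	return IN6_IS_ADDR_V4MAPPED(&a->addr) ? AF_INET : AF_INET6;
}

/* Four arguments: cursor, array, count, family filter (AF_UNSPEC: all) */
#define for_each_addr(a, arr, count, af)				\
	for ((a) = (arr); (a) < (arr) + (count); (a)++)			\
		if ((af) == AF_UNSPEC || guest_addr_af(a) == (af))

/* Fill @e from text; IPv4 text is stored IPv4-mapped. Returns 0 or -1. */
int guest_addr_set(struct guest_addr *e, const char *s, int prefix_len,
		   uint8_t flags)
{
	struct in_addr v4;

	memset(e, 0, sizeof(*e));
	if (inet_pton(AF_INET, s, &v4) == 1) {
		e->addr.s6_addr[10] = 0xff;
		e->addr.s6_addr[11] = 0xff;
		memcpy(&e->addr.s6_addr[12], &v4, sizeof(v4));
	} else if (inet_pton(AF_INET6, s, &e->addr) != 1) {
		return -1;
	}
	e->prefix_len = prefix_len;
	e->flags = flags;
	return 0;
}

/* Count entries of family @af that carry every bit in @flags */
int count_addrs(const struct guest_addr *arr, int count, sa_family_t af,
		uint8_t flags)
{
	const struct guest_addr *a;
	int n = 0;

	assert(count >= 0 && count <= MAX_GUEST_ADDRS);
	for_each_addr(a, arr, count, af) {
		if ((a->flags & flags) == flags)
			n++;
	}
	return n;
}
```

The family filter is evaluated inside the iterator, so one loop body can serve both IPv4 and IPv6 entries. That appears to be the property the later patches rely on when they merge per-family code paths.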
tap_ip6_daddr() selects the reply destination based on our source
address type (link-local), so it always returns addr_ll_seen. But if
the client sent from a global address, we would reply to an address
different from what the client is expecting. Since RFC 8415 allows
clients to use global addresses for DHCPv6, we now correct this, and
always respond to the address the client was using.
We also remove a redundant addr_ll_seen assignment, since this is
already done by tap.c when processing IPv6 packets.
Signed-off-by: Jon Maloy
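Here is a minimal sketch of the behaviour change. It uses simplified stand-alone functions rather than passt's actual signatures:

- reply_dst_old() models the removed tap_ip6_daddr() logic.
- reply_dst_new() models replying to the client's source address.
- same_scope() is a hypothetical helper for the scope-mismatch concern raised later in this thread, where a link-local source replies to a global destination.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Modelled on the removed tap_ip6_daddr(): the choice depends on *our*
 * source address. DHCPv6 replies always come from the link-local
 * our_tap_ll, so the addr_seen branch is never taken for them. */
const struct in6_addr *reply_dst_old(const struct in6_addr *our_src,
				     const struct in6_addr *addr_ll_seen,
				     const struct in6_addr *addr_seen)
{
	if (IN6_IS_ADDR_LINKLOCAL(our_src))
		return addr_ll_seen;
	return addr_seen;
}

/* Behaviour described by the patch: reply to the client's own source */
const struct in6_addr *reply_dst_new(const struct in6_addr *client_src)
{
	assert(client_src);
	return client_src;
}

/* Hypothetical check: do source and destination share a scope? */
bool same_scope(const struct in6_addr *src, const struct in6_addr *dst)
{
	return !IN6_IS_ADDR_LINKLOCAL(src) == !IN6_IS_ADDR_LINKLOCAL(dst);
}
```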
We replace the fwd_guest_accessible4() and fwd_guest_accessible6()
functions with a unified fwd_guest_accessible() function that handles
both address families. With the unified address array, we can check
all configured addresses in a single pass using for_each_addr() with
family filter AF_UNSPEC.
Signed-off-by: Jon Maloy
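The single-pass check can be sketched as follows. The sketch assumes a simplified struct guest_addr and uses plain memcmp() comparisons in place of passt's inany_* helpers. guest_accessible(), any_is_loopback() and any_is_unspecified() are hypothetical names. Following the function's documented contract, the sketch returns false when the address matches any guest address of either family, because such an address is not reachable without translation.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

struct guest_addr {		/* simplified model, not passt's layout */
	struct in6_addr addr;	/* IPv4 stored as ::ffff:a.b.c.d */
	int prefix_len;
	uint8_t flags;
};

#define ADDR_AF(p)	(IN6_IS_ADDR_V4MAPPED(p) ? AF_INET : AF_INET6)

#define for_each_addr(a, arr, count, af)				\
	for ((a) = (arr); (a) < (arr) + (count); (a)++)			\
		if ((af) == AF_UNSPEC || ADDR_AF(&(a)->addr) == (af))

static bool any_is_loopback(const struct in6_addr *x)
{
	if (IN6_IS_ADDR_V4MAPPED(x))
		return x->s6_addr[12] == 127;		/* 127.0.0.0/8 */
	return IN6_IS_ADDR_LOOPBACK(x);			/* ::1 */
}

static bool any_is_unspecified(const struct in6_addr *x)
{
	/* :: or 0.0.0.0 in IPv4-mapped form (::ffff:0.0.0.0) */
	static const uint8_t v4any[16] = { [10] = 0xff, [11] = 0xff };

	return IN6_IS_ADDR_UNSPECIFIED(x) || !memcmp(x, v4any, sizeof(v4any));
}

/* True if @addr on the host is reachable by the guest untranslated */
bool guest_accessible(const struct guest_addr *arr, int count,
		      const struct in6_addr *addr)
{
	const struct guest_addr *a;

	assert(count >= 0);
	if (any_is_loopback(addr) || any_is_unspecified(addr))
		return false;

	/* One pass over both families, as AF_UNSPEC does in the patch */
	for_each_addr(a, arr, count, AF_UNSPEC) {
		if (!memcmp(addr, &a->addr, sizeof(*addr)))
			return false;
	}
	return true;
}
```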
As a preparation for handling multiple addresses, we update ignore_arp()
to check against all addresses in the unified addrs[] array using the
for_each_addr() macro.
Signed-off-by: Jon Maloy
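Here is a sketch of the multi-address ARP target test. It assumes that ignore_arp() should skip requests whose target is one of the guest's own IPv4 addresses. arp_target_is_guest() and the simplified array model are hypothetical. Restricting the loop to AF_INET matters, because an IPv6 entry can end in the same four bytes as an IPv4 address.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

struct guest_addr {		/* simplified model, not passt's layout */
	struct in6_addr addr;	/* IPv4 stored as ::ffff:a.b.c.d */
	int prefix_len;
	uint8_t flags;
};

#define ADDR_AF(p)	(IN6_IS_ADDR_V4MAPPED(p) ? AF_INET : AF_INET6)

#define for_each_addr(a, arr, count, af)				\
	for ((a) = (arr); (a) < (arr) + (count); (a)++)			\
		if ((af) == AF_UNSPEC || ADDR_AF(&(a)->addr) == (af))

/* True if ARP target @tip (network order) is one of the guest's IPv4
 * addresses, i.e. a request that should not be answered */
bool arp_target_is_guest(const struct guest_addr *arr, int count,
			 const uint8_t tip[4])
{
	const struct guest_addr *a;

	assert(count >= 0);
	for_each_addr(a, arr, count, AF_INET) {
		/* The IPv4 address occupies the last 4 bytes when mapped */
		if (!memcmp(&a->addr.s6_addr[12], tip, 4))
			return true;
	}
	return false;
}
```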
Add nl_addr_get_all() to read all addresses from the template interface
into c->addrs[] array, rather than just selecting the "best" one.
This allows multi-address configurations where the template interface
has multiple IPv4 or IPv6 addresses assigned to it, all of which
will now be copied to the guest namespace when using --config-net.
For IPv6, the function also captures the link-local address into
c->ip6.our_tap_ll as a side effect.
Update conf_ip4() and conf_ip6() to use nl_addr_get_all() when no
user-specified addresses are present.
Signed-off-by: Jon Maloy
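The collection step can be modelled without netlink, assuming the dump results are already available as an array. add_template_addrs() and struct ctx_sketch are hypothetical, and the HOST flag value is assumed. Keeping link-local addresses out of the array follows the v7 changelog of the pasta_ns_conf() patch later in this thread, not this commit message.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define CONF_ADDR_HOST		(1U << 1)	/* assumed bit value */
#define MAX_GUEST_ADDRS		32		/* assumed capacity */

struct guest_addr {		/* simplified model, not passt's layout */
	struct in6_addr addr;	/* IPv4 stored as ::ffff:a.b.c.d */
	int prefix_len;		/* IPv6 or IPv4-mapped format */
	uint8_t flags;
};

struct ctx_sketch {		/* hypothetical subset of struct ctx */
	struct guest_addr addrs[MAX_GUEST_ADDRS];
	int addr_count;
	struct in6_addr our_tap_ll;
	bool have_ll;
};

/* Append every address found on the template interface. @found, @plen
 * and @n stand in for a netlink dump. Link-local IPv6 addresses are not
 * appended: the first one is kept in our_tap_ll instead.
 * Returns the number of entries appended, or -1 if the array is full. */
int add_template_addrs(struct ctx_sketch *c, const struct in6_addr *found,
		       const int *plen, int n)
{
	int i, added = 0;

	assert(n >= 0);
	for (i = 0; i < n; i++) {
		struct guest_addr *e;

		if (IN6_IS_ADDR_LINKLOCAL(&found[i])) {
			if (!c->have_ll) {
				c->our_tap_ll = found[i];
				c->have_ll = true;
			}
			continue;
		}

		if (c->addr_count >= MAX_GUEST_ADDRS)
			return -1;

		e = &c->addrs[c->addr_count++];
		e->addr = found[i];
		e->prefix_len = plen[i];
		e->flags = CONF_ADDR_HOST;
		added++;
	}
	return added;
}
```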
Allow specifying multiple addresses per family with -a/--address.
The first address of each family is used for DHCP/DHCPv6 assignment.
Signed-off-by: Jon Maloy
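Handling a repeated -a ADDR[/PREFIXLEN] option might look like the sketch below. add_address_opt(), the flag values, the buffer size, and the use of -1 to mean "no prefix given" are assumptions. The IPv4-mapped prefix format (/24 stored as 120) follows the cover letter. Giving the DHCP or DHCPv6 flag only to the first address of each family follows this commit message.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

#define CONF_ADDR_USER		(1U << 0)	/* assumed bit values */
#define CONF_ADDR_DHCP		(1U << 4)
#define CONF_ADDR_DHCPV6	(1U << 5)
#define MAX_GUEST_ADDRS		32		/* assumed capacity */

struct guest_addr {		/* simplified model, not passt's layout */
	struct in6_addr addr;	/* IPv4 stored as ::ffff:a.b.c.d */
	int prefix_len;		/* IPv4-mapped format for IPv4, -1 if unset */
	uint8_t flags;
};

/* Handle one -a ADDR[/PREFIXLEN] argument. Returns 0, or -1 on error. */
int add_address_opt(struct guest_addr *arr, int *count, const char *arg)
{
	char buf[64], *slash, *end;
	struct guest_addr *e;
	struct in_addr v4;
	long plen = -1;
	int i, is_v4;

	assert(*count >= 0);
	if (*count >= MAX_GUEST_ADDRS || strlen(arg) >= sizeof(buf))
		return -1;
	strcpy(buf, arg);

	if ((slash = strchr(buf, '/'))) {
		*slash = '\0';
		plen = strtol(slash + 1, &end, 10);
		if (end == slash + 1 || *end || plen < 0)
			return -1;
	}

	e = &arr[*count];
	memset(e, 0, sizeof(*e));
	if ((is_v4 = inet_pton(AF_INET, buf, &v4) == 1)) {
		if (plen > 32)
			return -1;
		e->addr.s6_addr[10] = e->addr.s6_addr[11] = 0xff;
		memcpy(&e->addr.s6_addr[12], &v4, sizeof(v4));
		if (plen >= 0)
			plen += 96;	/* store in IPv4-mapped format */
	} else if (inet_pton(AF_INET6, buf, &e->addr) != 1 || plen > 128) {
		return -1;
	}
	e->prefix_len = (int)plen;
	e->flags = CONF_ADDR_USER;

	/* Only the first address of each family is offered via DHCP(v6) */
	for (i = 0; i < *count; i++) {
		if (!!IN6_IS_ADDR_V4MAPPED(&arr[i].addr) == is_v4)
			break;
	}
	if (i == *count)
		e->flags |= is_v4 ? CONF_ADDR_DHCP : CONF_ADDR_DHCPV6;

	(*count)++;
	return 0;
}
```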
We remove the addr_seen field in struct ip4_ctx and replace it by
setting a new CONF_ADDR_OBSERVED flag in the corresponding entry
in the unified address array.
The observed IPv4 address is always added at, or moved to, position 0,
increasing the chance of a fast lookup.
Signed-off-by: Jon Maloy
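Moving the observed address to the front can be sketched as a find-then-shift on the array. note_observed(), the flag values and MAX_GUEST_ADDRS are hypothetical. The sketch keeps the entry's existing flags and does not clear OBSERVED on other entries, since the v5 notes say multiple OBSERVED addresses are allowed.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

#define CONF_ADDR_USER		(1U << 0)	/* assumed bit values */
#define CONF_ADDR_OBSERVED	(1U << 3)
#define MAX_GUEST_ADDRS		32		/* assumed capacity */

struct guest_addr {		/* simplified model, not passt's layout */
	struct in6_addr addr;	/* IPv4 stored as ::ffff:a.b.c.d */
	int prefix_len;
	uint8_t flags;
};

/* Mark @seen as observed and place it at index 0. An existing entry keeps
 * its other flags and is moved; an unknown address is inserted.
 * Returns 0, or -1 if a new entry does not fit. */
int note_observed(struct guest_addr *arr, int *count,
		  const struct in6_addr *seen)
{
	struct guest_addr tmp;
	int i;

	assert(*count >= 0 && *count <= MAX_GUEST_ADDRS);
	for (i = 0; i < *count; i++) {
		if (!memcmp(&arr[i].addr, seen, sizeof(*seen)))
			break;
	}

	if (i < *count) {
		tmp = arr[i];
	} else {
		if (*count == MAX_GUEST_ADDRS)
			return -1;
		memset(&tmp, 0, sizeof(tmp));
		tmp.addr = *seen;
		(*count)++;
	}
	tmp.flags |= CONF_ADDR_OBSERVED;

	/* Shift entries 0..i-1 up by one, then fill the freed slot 0 */
	memmove(&arr[1], &arr[0], (size_t)i * sizeof(arr[0]));
	arr[0] = tmp;
	return 0;
}
```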
After the previous changes in this series it becomes possible
to simplify the pasta_ns_conf() function.
We extract address and route configuration into helper functions
pasta_conf_addrs() and pasta_conf_routes(), reducing nesting
and improving readability.
To allow pasta_conf_addrs() to handle both address families
uniformly, we change nl_addr_set() to take a union inany_addr pointer
instead of void pointer, moving the address family handling into
the function itself.
We also fix a bug where the IPv6 code path incorrectly wrote to
req.set.a4.rta_l.rta_type instead of req.set.a6.rta_l.rta_type.
Signed-off-by: Jon Maloy
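With a union inany_addr argument, nl_addr_set() can derive the address family from the address itself. The sketch below models only that dispatch, not the netlink message. union inany_addr_sketch, struct addr_req_sketch, inany_v4_sketch() and fill_addr_req() are hypothetical stand-ins. The sketch branches on the IPv4 view of the address, following the structure suggested in review later in this thread, so the IPv4 pointer is never used without a NULL check.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Simplified stand-in for passt's union inany_addr */
union inany_addr_sketch {
	struct in6_addr a6;
	struct {
		uint8_t zero[10];
		uint8_t one[2];		/* 0xff 0xff */
		struct in_addr a4;
	} v4mapped;
};

/* Hypothetical flattening of the IFA_LOCAL/IFA_ADDRESS payloads */
struct addr_req_sketch {
	uint8_t family;		/* ifa_family */
	uint8_t prefix_len;	/* ifa_prefixlen, native format */
	uint8_t local[16];	/* IFA_LOCAL payload */
	uint8_t address[16];	/* IFA_ADDRESS payload */
	size_t alen;		/* bytes used in local[] and address[] */
};

static const struct in_addr *inany_v4_sketch(const union inany_addr_sketch *a)
{
	return IN6_IS_ADDR_V4MAPPED(&a->a6) ? &a->v4mapped.a4 : NULL;
}

/* Derive the family from the address, branching on its IPv4 view */
int fill_addr_req(struct addr_req_sketch *r,
		  const union inany_addr_sketch *addr, int prefix_len)
{
	const struct in_addr *v4 = inany_v4_sketch(addr);

	assert(r && addr);
	memset(r, 0, sizeof(*r));
	if (v4) {
		if (prefix_len < 0 || prefix_len > 32)
			return -EINVAL;
		r->family = AF_INET;
		r->alen = sizeof(*v4);
		memcpy(r->local, v4, r->alen);
	} else {
		if (prefix_len < 0 || prefix_len > 128)
			return -EINVAL;
		r->family = AF_INET6;
		r->alen = sizeof(addr->a6);
		memcpy(r->local, &addr->a6, r->alen);
	}
	/* Same address in both attributes, as in the quoted diff */
	memcpy(r->address, r->local, r->alen);
	r->prefix_len = (uint8_t)prefix_len;
	return 0;
}
```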
We remove the addr_seen and addr_ll_seen fields in struct ip6_ctx
and replace them by setting CONF_ADDR_OBSERVED and CONF_ADDR_LINKLOCAL
flags in the corresponding entry in the unified address array.
The observed IPv6 address is always added at, or moved to, position 0
in the array, improving the chance of a fast lookup.
The separate check against addr_seen in fwd_guest_accessible() can now
be removed because the observed address is now in the unified array,
and the existing for_each_addr() loop already checks against all
addresses, including this one.
This completes the unification of address storage for both IPv4 and
IPv6, enabling future support for multiple guest addresses per family.
Signed-off-by: Jon Maloy
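Once the seen addresses live in the array, a lookup by flags can replace the removed fields. For example, OBSERVED together with LINKLOCAL identifies what addr_ll_seen used to hold. observed6_flags() and first_with_flags() are hypothetical helpers over a simplified array model, and the flag values are assumed.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

#define CONF_ADDR_LINKLOCAL	(1U << 2)	/* assumed bit values */
#define CONF_ADDR_OBSERVED	(1U << 3)

struct guest_addr {		/* simplified model, not passt's layout */
	struct in6_addr addr;	/* IPv4 stored as ::ffff:a.b.c.d */
	int prefix_len;
	uint8_t flags;
};

#define ADDR_AF(p)	(IN6_IS_ADDR_V4MAPPED(p) ? AF_INET : AF_INET6)

#define for_each_addr(a, arr, count, af)				\
	for ((a) = (arr); (a) < (arr) + (count); (a)++)			\
		if ((af) == AF_UNSPEC || ADDR_AF(&(a)->addr) == (af))

/* Flags for an IPv6 source address seen from the guest */
uint8_t observed6_flags(const struct in6_addr *seen)
{
	uint8_t flags = CONF_ADDR_OBSERVED;

	if (IN6_IS_ADDR_LINKLOCAL(seen))
		flags |= CONF_ADDR_LINKLOCAL;
	return flags;
}

/* First entry of family @af carrying all of @flags, or NULL */
const struct guest_addr *first_with_flags(const struct guest_addr *arr,
					  int count, sa_family_t af,
					  uint8_t flags)
{
	const struct guest_addr *a;

	assert(count >= 0);
	for_each_addr(a, arr, count, af) {
		if ((a->flags & flags) == flags)
			return a;
	}
	return NULL;
}
```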
We introduce a CONF_ADDR_DHCP flag to mark if an added address is
eligible for DHCP advertisement. By doing this once and for all
in the fwd_set_addr() function, the DHCP code only needs to check
for this flag to know that all criteria for advertisement are
fulfilled. Hence, we update the code in dhcp.c correspondingly.
We also let the conf_print() function use this flag to determine
and print the selected address.
Signed-off-by: Jon Maloy
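On the DHCP side, checking the flag reduces address selection to a simple scan. dhcp_pick() and mapped_plen_to_mask() are hypothetical, and so is the flag value. The mask conversion assumes the prefix length is stored in IPv4-mapped format (96 plus the IPv4 length), as the cover letter describes. It yields the subnet mask a DHCP server would send as option 1.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CONF_ADDR_DHCP		(1U << 4)	/* assumed bit value */

struct guest_addr {		/* simplified model, not passt's layout */
	struct in6_addr addr;	/* IPv4 stored as ::ffff:a.b.c.d */
	int prefix_len;		/* IPv4-mapped format: /24 -> 120 */
	uint8_t flags;
};

/* First IPv4 entry flagged for DHCP, or NULL */
const struct guest_addr *dhcp_pick(const struct guest_addr *arr, int count)
{
	int i;

	assert(count >= 0);
	for (i = 0; i < count; i++) {
		if (IN6_IS_ADDR_V4MAPPED(&arr[i].addr) &&
		    (arr[i].flags & CONF_ADDR_DHCP))
			return &arr[i];
	}
	return NULL;
}

/* Netmask (network order) from an IPv4-mapped prefix length, 96..128 */
struct in_addr mapped_plen_to_mask(int mapped_plen)
{
	int plen = mapped_plen - 96;	/* e.g. 120 -> /24 */
	struct in_addr m;

	if (plen <= 0)
		m.s_addr = 0;
	else if (plen >= 32)
		m.s_addr = htonl(0xffffffffU);
	else
		m.s_addr = htonl(~0U << (32 - plen));	/* shift < 32 */
	return m;
}
```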
We update the migration protocol to version 3 to support distributing
multiple addresses from the unified address array. The new protocol
migrates all address entries in the array, along with their prefix
lengths and flags, and leaves it to the receiver to filter which
ones it wants to apply.
Signed-off-by: Jon Maloy
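To illustrate the send-everything, filter-on-receipt approach, here is a sketch with a made-up wire format. The format is a 32-bit count followed by fixed 18-byte entries (address, prefix length, flags). This is not passt's actual v3 layout. mig_encode(), mig_decode(), MIG_ENTRY_LEN and the flag value are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CONF_ADDR_OBSERVED	(1U << 3)	/* assumed bit value */
#define MIG_ENTRY_LEN		18		/* 16 addr + plen + flags */

struct guest_addr {		/* simplified model, not passt's layout */
	struct in6_addr addr;
	int prefix_len;
	uint8_t flags;
};

/* Sender: entry count (network order), then every entry.
 * Returns bytes written, or 0 if @len is too small. */
size_t mig_encode(const struct guest_addr *arr, int count,
		  uint8_t *buf, size_t len)
{
	size_t off = 4;
	uint32_t n;
	int i;

	assert(count >= 0);
	if (len < 4 + (size_t)count * MIG_ENTRY_LEN)
		return 0;

	n = htonl((uint32_t)count);
	memcpy(buf, &n, sizeof(n));
	for (i = 0; i < count; i++, off += MIG_ENTRY_LEN) {
		memcpy(buf + off, &arr[i].addr, 16);
		buf[off + 16] = (uint8_t)arr[i].prefix_len;
		buf[off + 17] = arr[i].flags;
	}
	return off;
}

/* Receiver: applies every entry except those carrying a flag in
 * @skip_flags. Returns entries kept, or -1 if malformed or too many. */
int mig_decode(const uint8_t *buf, size_t len, struct guest_addr *out,
	       int max, uint8_t skip_flags)
{
	size_t off = 4;
	uint32_t n, i;
	int kept = 0;

	if (len < 4)
		return -1;
	memcpy(&n, buf, sizeof(n));
	n = ntohl(n);
	if (n > (len - 4) / MIG_ENTRY_LEN)
		return -1;

	for (i = 0; i < n; i++, off += MIG_ENTRY_LEN) {
		if (buf[off + 17] & skip_flags)
			continue;
		if (kept == max)
			return -1;
		memcpy(&out[kept].addr, buf + off, 16);
		out[kept].prefix_len = buf[off + 16];
		out[kept].flags = buf[off + 17];
		kept++;
	}
	return kept;
}
```

In this sketch the receiver decides what to discard, for example entries the source only observed, without any change to the sender.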
We introduce a CONF_ADDR_DHCPV6 flag to mark whether an added address is
eligible for DHCPv6 advertisement. By doing this once and for all
in the fwd_set_addr() function, the DHCPv6 code only needs to check
for this flag to know that all criteria for advertisement are fulfilled.
We update the code in dhcpv6.c both to use the new flag and to make
it possible to send multiple addresses in a single reply message,
per RFC 8415.
We also let the conf_print() function use this flag to identify and
print the eligible addresses.
Signed-off-by: Jon Maloy
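Sending several addresses in one reply amounts to emitting one IA Address option per eligible address. The sketch below assumes the RFC 8415 IA Address layout: option code 5 and a 24-byte body without sub-options. The put_iaaddrs() helper and the flag value are hypothetical. The sketch builds only the IA Address options, not the enclosing IA_NA or the rest of the message.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define CONF_ADDR_DHCPV6	(1U << 5)	/* assumed bit value */
#define OPTION_IAADDR		5	/* RFC 8415, Section 21.6 */
#define IAADDR_LEN		24	/* address + two lifetimes */

struct guest_addr {		/* simplified model, not passt's layout */
	struct in6_addr addr;	/* IPv4 stored as ::ffff:a.b.c.d */
	int prefix_len;
	uint8_t flags;
};

#define ADDR_AF(p)	(IN6_IS_ADDR_V4MAPPED(p) ? AF_INET : AF_INET6)

#define for_each_addr(a, arr, count, af)				\
	for ((a) = (arr); (a) < (arr) + (count); (a)++)			\
		if ((af) == AF_UNSPEC || ADDR_AF(&(a)->addr) == (af))

/* Append one IA Address option per DHCPv6-eligible entry.
 * Returns bytes written, or 0 if @len is too small. */
size_t put_iaaddrs(const struct guest_addr *arr, int count,
		   uint32_t preferred, uint32_t valid,
		   uint8_t *buf, size_t len)
{
	const uint16_t code = htons(OPTION_IAADDR), olen = htons(IAADDR_LEN);
	const uint32_t p = htonl(preferred), v = htonl(valid);
	const struct guest_addr *a;
	size_t off = 0;

	assert(count >= 0);
	for_each_addr(a, arr, count, AF_INET6) {
		if (!(a->flags & CONF_ADDR_DHCPV6))
			continue;
		if (len - off < 4 + IAADDR_LEN)
			return 0;
		memcpy(buf + off, &code, 2);		/* option-code */
		memcpy(buf + off + 2, &olen, 2);	/* option-len */
		memcpy(buf + off + 4, &a->addr, 16);	/* IPv6-address */
		memcpy(buf + off + 20, &p, 4);		/* preferred-lifetime */
		memcpy(buf + off + 24, &v, 4);		/* valid-lifetime */
		off += 4 + IAADDR_LEN;
	}
	return off;
}
```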
We extend NDP to advertise all suitable IPv6 prefixes in Router
Advertisements, per RFC 4861. Observed and link-local addresses,
plus addresses with a prefix length != 64, are excluded.
Signed-off-by: Jon Maloy
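The exclusion rules above can be expressed as a predicate. A second function then builds one Prefix Information option (RFC 4861, type 3, 32 bytes) per distinct /64. ndp_prefix_eligible(), put_pios() and the flag values are assumptions and are not taken from the patch. So is the choice to set both the on-link (L) and autonomous (A) flags.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CONF_ADDR_LINKLOCAL	(1U << 2)	/* assumed bit values */
#define CONF_ADDR_OBSERVED	(1U << 3)
#define PIO_LEN			32	/* RFC 4861, Section 4.6.2 */

struct guest_addr {		/* simplified model, not passt's layout */
	struct in6_addr addr;	/* IPv4 stored as ::ffff:a.b.c.d */
	int prefix_len;
	uint8_t flags;
};

/* Exclusion rules from the commit message */
bool ndp_prefix_eligible(const struct guest_addr *a)
{
	return !IN6_IS_ADDR_V4MAPPED(&a->addr) &&
	       !(a->flags & (CONF_ADDR_OBSERVED | CONF_ADDR_LINKLOCAL)) &&
	       a->prefix_len == 64;
}

/* Append one Prefix Information option per distinct eligible /64.
 * Returns bytes written, or 0 if @len is too small. */
size_t put_pios(const struct guest_addr *arr, int count, uint32_t valid,
		uint32_t preferred, uint8_t *buf, size_t len)
{
	const uint32_t v = htonl(valid), p = htonl(preferred);
	size_t off = 0, j;
	int i;

	assert(count >= 0);
	for (i = 0; i < count; i++) {
		const struct guest_addr *a = &arr[i];
		uint8_t *o = buf + off;

		if (!ndp_prefix_eligible(a))
			continue;

		/* Several addresses may share a /64: advertise it once */
		for (j = 0; j < off; j += PIO_LEN) {
			if (!memcmp(buf + j + 16, &a->addr, 8))
				break;
		}
		if (j < off)
			continue;

		if (len - off < PIO_LEN)
			return 0;
		memset(o, 0, PIO_LEN);
		o[0] = 3;			/* Type: Prefix Information */
		o[1] = 4;			/* Length, units of 8 octets */
		o[2] = 64;			/* Prefix Length */
		o[3] = 0xc0;			/* L and A flags (assumed) */
		memcpy(o + 4, &v, 4);		/* Valid Lifetime */
		memcpy(o + 8, &p, 4);		/* Preferred Lifetime */
		memcpy(o + 16, &a->addr, 8);	/* upper 64 bits, rest zero */
		off += PIO_LEN;
	}
	return off;
}
```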
On Sun, Apr 12, 2026 at 08:53:07PM -0400, Jon Maloy wrote:
> tap_ip6_daddr() selects the reply destination based on our source
> address type (link-local), so it always returns addr_ll_seen.
I think there might have been more callers of tap_ip6_daddr() in the past, which might have made this not true.
> But if the client sent from a global address, we would reply to an
> address different from what the client is expecting. Since RFC 8415
> allows clients to use global addresses for DHCPv6, we now correct
> this, and always respond to the address the client was using.
Responding to the same address the client used is a good idea in general. However, for this specific case, I don't think it will quite do what we want. The problem is that we're still always using our_tap_ll (link local) as the source address. So if the client used a global address we'll send a packet with mismatched address scopes. AFAIU that won't usually work.
> We also remove a redundant addr_ll_seen assignment, since this is
> already done by tap.c when processing IPv6 packets.
>
> Signed-off-by: Jon Maloy
> ---
>  dhcpv6.c | 14 ++++++--------
>  dhcpv6.h |  2 +-
>  tap.c    | 15 ---------------
>  tap.h    |  2 --
>  4 files changed, 7 insertions(+), 26 deletions(-)
>
> diff --git a/dhcpv6.c b/dhcpv6.c
> index 97c04e2..2db0944 100644
> --- a/dhcpv6.c
> +++ b/dhcpv6.c
> @@ -370,12 +370,14 @@ notonlink:
>  /**
>   * dhcpv6_send_ia_notonlink() - Send NotOnLink status
>   * @c:			Execution context
> + * @saddr:		Source address of client message (reply destination)
@saddr is a bad name in this context, since it's the source address of an earlier packet, not the one this function sends. Maybe @caddr or @client_addr?
>   * @ia_base:		Non-appropriate IA_NA or IA_TA base
>   * @client_id_base:	Client ID message option base
>   * @len:		Client ID length
>   * @xid:		Transaction ID for message exchange
>   */
>  static void dhcpv6_send_ia_notonlink(struct ctx *c,
> +				     const struct in6_addr *saddr,
>  				     const struct iov_tail *ia_base,
>  				     const struct iov_tail *client_id_base,
>  				     int len, uint32_t xid)
> @@ -405,8 +407,7 @@ static void dhcpv6_send_ia_notonlink(struct ctx *c,
>
>  	resp_not_on_link.hdr.xid = xid;
>
> -	tap_udp6_send(c, src, 547, tap_ip6_daddr(c, src), 546,
> -		      xid, &resp_not_on_link, n);
> +	tap_udp6_send(c, src, 547, saddr, 546, xid, &resp_not_on_link, n);
>  }
>
>  /**
> @@ -543,7 +544,7 @@ static size_t dhcpv6_client_fqdn_fill(const struct iov_tail *data,
>   * dhcpv6() - Check if this is a DHCPv6 message, reply as needed
>   * @c:		Execution context
>   * @data:	Single packet starting from UDP header
> - * @saddr:	Source IPv6 address of original message
> + * @saddr:	Source IPv6 address of original message (for reply destination)
I don't know that the addition really adds anything useful.
>   * @daddr:	Destination IPv6 address of original message
>   *
>   * Return: 0 if it's not a DHCPv6 message, 1 if handled, -1 on failure
> @@ -590,8 +591,6 @@ int dhcpv6(struct ctx *c, struct iov_tail *data,
>  	if (mlen + sizeof(*uh) != ntohs(uh->len) || mlen < sizeof(*mh))
>  		return -1;
>
> -	c->ip6.addr_ll_seen = *saddr;
> -
>  	src = &c->ip6.our_tap_ll;
If the client is using a global address, I think we need to too. Which is a bit of a problem, since we don't really have any way to allocate one. Have you seen a guest using a global address for DHCPv6 in practice, or is it just a theoretical possibility? I am wondering if that's only something that would happen if we advertised a global address for the DHCPv6 server via NDP (which we don't, and can't).
>
>  	mh = IOV_REMOVE_HEADER(data, mh_storage);
> @@ -630,7 +629,7 @@ int dhcpv6(struct ctx *c, struct iov_tail *data,
>
>  	if (dhcpv6_ia_notonlink(data, &c->ip6.addr)) {
> -		dhcpv6_send_ia_notonlink(c, data, &client_id_base,
> +		dhcpv6_send_ia_notonlink(c, saddr, data, &client_id_base,
>  					 ntohs(client_id->l), mh->xid);
>
>  		return 1;
> @@ -680,8 +679,7 @@ int dhcpv6(struct ctx *c, struct iov_tail *data,
>
>  	resp.hdr.xid = mh->xid;
>
> -	tap_udp6_send(c, src, 547, tap_ip6_daddr(c, src), 546,
> -		      mh->xid, &resp, n);
> +	tap_udp6_send(c, src, 547, saddr, 546, mh->xid, &resp, n);
>  	c->ip6.addr_seen = c->ip6.addr;
>
>  	return 1;
> diff --git a/dhcpv6.h b/dhcpv6.h
> index c706dfd..1015a1a 100644
> --- a/dhcpv6.h
> +++ b/dhcpv6.h
> @@ -7,7 +7,7 @@
>  #define DHCPV6_H
>
>  int dhcpv6(struct ctx *c, struct iov_tail *data,
> -	   struct in6_addr *saddr, struct in6_addr *daddr);
> +	   const struct in6_addr *saddr, const struct in6_addr *daddr);
>  void dhcpv6_init(const struct ctx *c);
>
>  #endif /* DHCPV6_H */
> diff --git a/tap.c b/tap.c
> index eaa6111..59c45a3 100644
> --- a/tap.c
> +++ b/tap.c
> @@ -161,21 +161,6 @@ void tap_send_single(const struct ctx *c, const void *data, size_t l2len)
>  	}
>  }
>
> -/**
> - * tap_ip6_daddr() - Normal IPv6 destination address for inbound packets
> - * @c:		Execution context
> - * @src:	Source address
> - *
> - * Return: pointer to IPv6 address
> - */
> -const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
> -				     const struct in6_addr *src)
> -{
> -	if (IN6_IS_ADDR_LINKLOCAL(src))
> -		return &c->ip6.addr_ll_seen;
> -	return &c->ip6.addr_seen;
> -}
> -
Nice to see this ugly thing go away, though.
>  /**
>   * tap_push_l2h() - Build an L2 header for an inbound packet
>   * @c:		Execution context
> diff --git a/tap.h b/tap.h
> index 07ca096..b335933 100644
> --- a/tap.h
> +++ b/tap.h
> @@ -96,8 +96,6 @@ void tap_udp4_send(const struct ctx *c, struct in_addr src, in_port_t sport,
>  		   const void *in, size_t dlen);
>  void tap_icmp4_send(const struct ctx *c, struct in_addr src, struct in_addr dst,
>  		    const void *in, const void *src_mac, size_t l4len);
> -const struct in6_addr *tap_ip6_daddr(const struct ctx *c,
> -				     const struct in6_addr *src);
>  void *tap_push_ip6h(struct ipv6hdr *ip6h,
>  		    const struct in6_addr *src, const struct in6_addr *dst,
>  		    size_t l4len, uint8_t proto, uint32_t flow);
> --
> 2.52.0
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
On Sun, Apr 12, 2026 at 08:53:09PM -0400, Jon Maloy wrote:
> We replace the fwd_guest_accessible4() and fwd_guest_accessible6()
> functions with a unified fwd_guest_accessible() function that handles
> both address families. With the unified address array, we can check
> all configured addresses in a single pass using for_each_addr() with
> family filter AF_UNSPEC.
>
> Signed-off-by: Jon Maloy
Reviewed-by: David Gibson
> ---
> v6: -Some fixes based on feedback from David Gibson
> v7: -Added curly brackets to for_each_addr() in fwd_guest_accessible(),
>      as suggested by Stefano Brivio.
> ---
>  fwd.c | 69 +++++++++++----------------------------------------------
>  1 file changed, 15 insertions(+), 54 deletions(-)
>
> diff --git a/fwd.c b/fwd.c
> index 14ce0a7..e676c18 100644
> --- a/fwd.c
> +++ b/fwd.c
> @@ -988,19 +988,19 @@ static bool is_dns_flow(uint8_t proto, const struct flowside *ini)
>  }
>
>  /**
> - * fwd_guest_accessible4() - Is IPv4 address guest-accessible
> + * fwd_guest_accessible() - Is address guest-accessible
>   * @c:		Execution context
> - * @addr:	Host visible IPv4 address
> + * @addr:	Host visible address (IPv4 or IPv6)
>   *
>   * Return: true if @addr on the host is accessible to the guest without
>   *         translation, false otherwise
>   */
> -static bool fwd_guest_accessible4(const struct ctx *c,
> -				  const struct in_addr *addr)
> +static bool fwd_guest_accessible(const struct ctx *c,
> +				 const union inany_addr *addr)
>  {
>  	const struct guest_addr *a;
>
> -	if (IN4_IS_ADDR_LOOPBACK(addr))
> +	if (inany_is_loopback(addr))
>  		return false;
>
>  	/* In socket interfaces 0.0.0.0 generally means "any" or unspecified,
> @@ -1008,38 +1008,18 @@ static bool fwd_guest_accessible4(const struct ctx *c,
>  	 * that has a different meaning for host and guest, we can't let it
>  	 * through untranslated.
>  	 */
> -	if (IN4_IS_ADDR_UNSPECIFIED(addr))
> +	if (inany_is_unspecified(addr))
>  		return false;
>
> -	/* For IPv4, addr_seen is initialised to addr, so is always a valid
> -	 * address
> +	/* Check against all configured guest addresses */
> +	for_each_addr(a, c->addrs, c->addr_count, AF_UNSPEC) {
> +		if (inany_equals(addr, &a->addr))
> +			return false;
> +	}
> +	/* Also check addr_seen: it tracks the address the guest is actually
> +	 * using, which may differ from configured addresses.
>  	 */
> -	a = fwd_get_addr(c, AF_INET, 0, 0);
> -	if ((a && IN4_ARE_ADDR_EQUAL(addr, inany_v4(&a->addr))) ||
> -	    IN4_ARE_ADDR_EQUAL(addr, &c->ip4.addr_seen))
> -		return false;
> -
> -	return true;
> -}
> -
> -/**
> - * fwd_guest_accessible6() - Is IPv6 address guest-accessible
> - * @c:		Execution context
> - * @addr:	Host visible IPv6 address
> - *
> - * Return: true if @addr on the host is accessible to the guest without
> - *         translation, false otherwise
> - */
> -static bool fwd_guest_accessible6(const struct ctx *c,
> -				  const struct in6_addr *addr)
> -{
> -	const struct guest_addr *a;
> -
> -	if (IN6_IS_ADDR_LOOPBACK(addr))
> -		return false;
> -
> -	a = fwd_get_addr(c, AF_INET6, 0, 0);
> -	if (a && IN6_ARE_ADDR_EQUAL(addr, &a->addr.a6))
> +	if (inany_equals4(addr, &c->ip4.addr_seen))
>  		return false;
>
>  	/* For IPv6, addr_seen starts unspecified, because we don't know what LL
> @@ -1047,31 +1027,12 @@ static bool fwd_guest_accessible6(const struct ctx *c,
>  	 * if it has been set to a real address.
>  	 */
>  	if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr_seen) &&
Pre-existing nit: this test is unnecessary, we already checked that this address is not unspecified, so if addr_seen *is* unspecified, they can't be equal.
> -	    IN6_ARE_ADDR_EQUAL(addr, &c->ip6.addr_seen))
> +	    inany_equals6(addr, &c->ip6.addr_seen))
>  		return false;
>
>  	return true;
>  }
>
> -/**
> - * fwd_guest_accessible() - Is IPv[46] address guest-accessible
> - * @c:		Execution context
> - * @addr:	Host visible IPv[46] address
> - *
> - * Return: true if @addr on the host is accessible to the guest without
> - *         translation, false otherwise
> - */
> -static bool fwd_guest_accessible(const struct ctx *c,
> -				 const union inany_addr *addr)
> -{
> -	const struct in_addr *a4 = inany_v4(addr);
> -
> -	if (a4)
> -		return fwd_guest_accessible4(c, a4);
> -
> -	return fwd_guest_accessible6(c, &addr->a6);
> -}
> -
>  /**
>   * nat_outbound() - Apply address translation for outbound (TAP to HOST)
>   * @c:		Execution context
> --
> 2.52.0
On Sun, Apr 12, 2026 at 08:53:11PM -0400, Jon Maloy wrote:
> Allow specifying multiple addresses per family with -a/--address.
> The first address of each family is used for DHCP/DHCPv6 assignment.
>
> Signed-off-by: Jon Maloy
> ---
> v2: - Adapted to previous code changes
> v3: - Adapted to single-array strategy
>     - Changes according to feedback from S. Brivio and D. Gibson.
> v4: - Stripped down and adapted after feedback from David G.
> v6: - Adapted to previous changes in series
>     - Removed the "one address" limitation for -n option
> v7: - Updated man page.
> ---
>  conf.c  |  7 ++++---
>  fwd.c   |  4 +---
>  passt.1 |  7 +++----
>  pasta.c | 14 ++++++++------
>  4 files changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/conf.c b/conf.c
> index 591f561..8b4a7a0 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -939,9 +939,11 @@ static void usage(const char *name, FILE *f, int status)
>  		" default: 65520: maximum 802.3 MTU minus 802.3 header\n"
>  		" length, rounded to 32 bits (IPv4 words)\n"
>  		" -a, --address ADDR Assign IPv4 or IPv6 address ADDR[/PREFIXLEN]\n"
> -		" can be specified zero to two times (for IPv4 and IPv6)\n"
> +		" can be specified up to a maximum of %d times\n"
>  		" default: use addresses from interface with default route\n"
> -		" -n, --netmask MASK Assign IPv4 MASK, dot-decimal or bits\n"
> +		" -n, --netmask MASK Assign IPv4 MASK, dot-decimal or bits\n",
> +		MAX_GUEST_ADDRS);
> +	FPRINTF(f,
>  		" default: netmask from matching address on the host\n"
>  		" -M, --mac-addr ADDR Use source MAC address ADDR\n"
>  		" default: 9a:55:9a:55:9a:55 (locally administered)\n"
> @@ -1898,7 +1900,6 @@ void conf(struct ctx *c, int argc, char **argv)
>  			    IN6_IS_ADDR_V4COMPAT(&addr.a6))
>  				die("Invalid address: %s", optarg);
>
> -			/* Legacy behaviour: replace existing address if any */
>  			fwd_set_addr(c, &addr, CONF_ADDR_USER, prefix_len);
>  			if (inany_v4(&addr))
>  				c->ip4.no_copy_addrs = true;
> diff --git a/fwd.c b/fwd.c
> index e676c18..d3f576a 100644
> --- a/fwd.c
> +++ b/fwd.c
> @@ -250,14 +250,12 @@ void fwd_neigh_table_init(const struct ctx *c)
>  }
>
>  /**
> - * fwd_set_addr() - Add or update an address in the unified address array
> + * fwd_set_addr() - Update address entry, adding one if needed
>   * @c:		Execution context
>   * @addr:	Address to add (IPv4-mapped or IPv6)
>   * @flags:	CONF_ADDR_* flags for this address
>   * @prefix_len:	Prefix length in IPv6 or IPv4 format
>   *
> - * Find the first existing entry of the same address family and
> - * overwrite it, or create a new one if none exists
Comment changes imply a change to the function's behaviour, but there's no corresponding change to the code. Did something shift into the wrong patch due to a bad rebase?
>   */
>  void fwd_set_addr(struct ctx *c, const union inany_addr *addr,
>  		  uint8_t flags, int prefix_len)
> diff --git a/passt.1 b/passt.1
> index 13e8df9..12ec857 100644
> --- a/passt.1
> +++ b/passt.1
> @@ -164,16 +164,13 @@ An optional /\fIprefix_len\fR (0-32 for IPv4, 0-128 for IPv6) can be
>  appended in CIDR notation (e.g. 192.0.2.1/24). This is an alternative to
>  using the \fB-n\fR, \fB--netmask\fR option. Mixing CIDR notation with
>  \fB-n\fR results in an error.
> -If a prefix length is assigned to an IPv6 address using this method, it will
> -in the current code version be overridden by the default value of 64.
These two lines don't seem to match what this specific patch does.
> -This option can be specified zero (for defaults) to two times (once for IPv4,
> -once for IPv6).
>  By default, assigned IPv4 and IPv6 addresses are taken from the host
>  interfaces with the first default route, if any, for the corresponding IP
>  version. If no default routes are available and there is any interface with
>  any route for a given IP version, the first of these interfaces will be chosen
>  instead. If no such interface exists for a given IP version, the link-local
>  address 169.254.2.1 is assigned for IPv4, and no additional address will be
>  assigned for IPv6.
> +This option can be given multiple times, indicating multiple different addresses.
I'd suggest putting this new text in place of the removed "This option can be specified zero..." text, since it's covering the same detail of the option's behaviour.
>  .TP
>  .BR \-n ", " \-\-netmask " " \fImask
> @@ -181,6 +178,8 @@ Assign IPv4 netmask \fImask\fR, expressed as dot-decimal or number of bits, via
>  DHCP (option 1). Alternatively, the prefix length can be specified using CIDR
>  notation with the \fB-a\fR, \fB--address\fR option (e.g. \fB-a\fR 192.0.2.1/24).
>  Mixing \fB-n\fR with CIDR notation results in an error.
> +When indicated, this option sets the prefix length of the first configured
> +IPv4 address only.
This update also seems like it belongs in a different patch in the series.
>  If no address is indicated, the netmask associated with the adopted host
>  address, if any, is used. If an address is indicated, but without a prefix
>  length, the netmask is determined based on the corresponding network class. In all other
> diff --git a/pasta.c b/pasta.c
> index c51e4cd..b3936f5 100644
> --- a/pasta.c
> +++ b/pasta.c
> @@ -343,14 +343,15 @@ void pasta_ns_conf(struct ctx *c)
>
>  		if (c->ifi4) {
>  			if (c->ip4.no_copy_addrs) {
> -				a = fwd_get_addr(c, AF_INET, 0, 0);
> -				if (a) {
> +				for_each_addr(a, c->addrs, c->addr_count, AF_INET) {
>  					plen = inany_prefix_len(&a->addr,
>  								a->prefix_len);
>  					rc = nl_addr_set(nl_sock_ns,
>  							 c->pasta_ifi, AF_INET,
>  							 inany_v4(&a->addr),
>  							 plen);
> +					if (rc < 0)
> +						break;
>  				}
>  			} else {
>  				rc = nl_addr_dup(nl_sock, c->ifi4,
> @@ -404,13 +405,14 @@ ipv4_done:
>  					  0, IFF_NOARP);
>
>  		if (c->ip6.no_copy_addrs) {
> -			a = fwd_get_addr(c, AF_INET6, 0, 0);
> -			if (a)
> +			for_each_addr(a, c->addrs, c->addr_count, AF_INET6) {
>  				rc = nl_addr_set(nl_sock_ns, c->pasta_ifi,
> -						 AF_INET6,
> -						 &a->addr.a6,
> +						 AF_INET6, &a->addr.a6,
>  						 a->prefix_len);
> +				if (rc < 0)
> +					break;
> +			}
>  		} else {
>  			rc = nl_addr_dup(nl_sock, c->ifi6,
>  					 nl_sock_ns, c->pasta_ifi,
> --
> 2.52.0
On Sun, Apr 12, 2026 at 08:53:13PM -0400, Jon Maloy wrote:
> After the previous changes in this series it becomes possible to
> simplify the pasta_ns_conf() function.
>
> We extract address and route configuration into helper functions
> pasta_conf_addrs() and pasta_conf_routes(), reducing nesting and
> improving readability.
>
> To allow pasta_conf_addrs() to handle both address families
> uniformly, we change nl_addr_set() to take a union inany_addr pointer
> instead of void pointer, moving the address family handling into the
> function itself.
>
> We also fix a bug where the IPv6 code path incorrectly wrote to
> req.set.a4.rta_l.rta_type instead of req.set.a6.rta_l.rta_type.
>
> Signed-off-by: Jon Maloy
Reviewed-by: David Gibson
> ---
> v7: -Removed redundant argument 'af' in nl_addr_set()
>     -Removed redundant label and 'goto's in pasta_ns_conf()
>     -Since I excluded addition of all LINKLOCAL addresses from host
>      address array in a previous commit I can now omit this test in
>      pasta_conf_addrs() as suggested by David.
>     -Here is the example of agnostic usage of inany_prefix_len() I
>      referred to in a previous commit.
> ---
>  netlink.c |  17 ++--
>  netlink.h |   4 +-
>  pasta.c   | 232 ++++++++++++++++++++++++++++++---------------------------
>  3 files changed, 123 insertions(+), 130 deletions(-)
>
> diff --git a/netlink.c b/netlink.c
> index 3f5a812..3d25212 100644
> --- a/netlink.c
> +++ b/netlink.c
> @@ -863,15 +863,15 @@ int nl_addr_get_ll(int s, unsigned int ifi, struct in6_addr *addr)
>   * nl_addr_set() - Set IP addresses for given interface and address family
>   * @s:		Netlink socket
>   * @ifi:	Interface index
> - * @af:		Address family
>   * @addr:	Global address to set
>   * @prefix_len:	Mask or prefix length to set
>   *
>   * Return: 0 on success, negative error code on failure
>   */
> -int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
> -		const void *addr, int prefix_len)
> +int nl_addr_set(int s, unsigned int ifi, const union inany_addr *addr,
> +		int prefix_len)
>  {
> +	sa_family_t af = inany_af(addr);
>  	struct req_t {
>  		struct nlmsghdr nlh;
>  		struct ifaddrmsg ifa;
> @@ -905,21 +905,22 @@ int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
>
>  		len = offsetof(struct req_t, set.a6) + sizeof(req.set.a6);
>
> -		memcpy(&req.set.a6.l, addr, sizeof(req.set.a6.l));
> +		memcpy(&req.set.a6.l, &addr->a6, sizeof(req.set.a6.l));
>  		req.set.a6.rta_l.rta_len = rta_len;
> -		req.set.a4.rta_l.rta_type = IFA_LOCAL;
> -		memcpy(&req.set.a6.a, addr, sizeof(req.set.a6.a));
> +		req.set.a6.rta_l.rta_type = IFA_LOCAL;
> +		memcpy(&req.set.a6.a, &addr->a6, sizeof(req.set.a6.a));
>  		req.set.a6.rta_a.rta_len = rta_len;
>  		req.set.a6.rta_a.rta_type = IFA_ADDRESS;
>  	} else {
> +		const struct in_addr *v4 = inany_v4(addr);
I suspect the static checkers won't be able to deduce that this cannot be NULL, because of the earlier inany_af() call - in a sense you're checking the address family twice. I think it would be more elegant (and avoid possible checker false positives) to have the if based on the return value from inany_v4(), then set req.ifa.ifa_family to constant values in each branch
>  		size_t rta_len = RTA_LENGTH(sizeof(req.set.a4.l));
>
>  		len = offsetof(struct req_t, set.a4) + sizeof(req.set.a4);
>
> -		memcpy(&req.set.a4.l, addr, sizeof(req.set.a4.l));
> +		memcpy(&req.set.a4.l, v4, sizeof(req.set.a4.l));
>  		req.set.a4.rta_l.rta_len = rta_len;
>  		req.set.a4.rta_l.rta_type = IFA_LOCAL;
> -		memcpy(&req.set.a4.a, addr, sizeof(req.set.a4.a));
> +		memcpy(&req.set.a4.a, v4, sizeof(req.set.a4.a));
>  		req.set.a4.rta_a.rta_len = rta_len;
>  		req.set.a4.rta_a.rta_type = IFA_ADDRESS;
>  	}
> diff --git a/netlink.h b/netlink.h
> index 3af6d58..ec859b7 100644
> --- a/netlink.h
> +++ b/netlink.h
> @@ -22,8 +22,8 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
>  int nl_addr_get_all(struct ctx *c, int s, unsigned int ifi, sa_family_t af);
>  bool nl_neigh_mac_get(int s, const union inany_addr *addr, int ifi,
>  		      unsigned char *mac);
> -int nl_addr_set(int s, unsigned int ifi, sa_family_t af,
> -		const void *addr, int prefix_len);
> +int nl_addr_set(int s, unsigned int ifi, const union inany_addr *addr,
> +		int prefix_len);
>  int nl_addr_get_ll(int s, unsigned int ifi, struct in6_addr *addr);
>  int nl_addr_set_ll_nodad(int s, unsigned int ifi);
>  int nl_addr_dup(int s_src, unsigned int ifi_src,
> diff --git a/pasta.c b/pasta.c
> index b3936f5..f949dc2 100644
> --- a/pasta.c
> +++ b/pasta.c
> @@ -46,6 +46,7 @@
>
>  #include "util.h"
>  #include "passt.h"
> +#include "conf.h"
>  #include "isolation.h"
>  #include "netlink.h"
>  #include "log.h"
> @@ -303,13 +304,69 @@ void pasta_start_ns(struct ctx *c, uid_t uid, gid_t gid,
>  		die_perror("Failed to join network namespace");
>  }
>
> +/**
> + * pasta_conf_addrs() - Configure addresses for one address family in namespace
> + * @c:		Execution context
> + * @af:		Address family (AF_INET or AF_INET6)
> + * @ifi:	Host interface index for this address family
> + * @no_copy:	If true, set addresses from c->addrs; if false, copy from host
> + *
> + * Return: 0 on success, negative error code on failure
> + */
> +static int pasta_conf_addrs(struct ctx *c, sa_family_t af,
> +			    int ifi, bool no_copy)
> +{
> +	const struct guest_addr *a;
> +
> +	if (!ifi)
> +		return 0;
> +
> +	if (!no_copy)
> +		return nl_addr_dup(nl_sock, ifi, nl_sock_ns, c->pasta_ifi, af);
> +
> +	for_each_addr(a, c->addrs, c->addr_count, af) {
> +		int rc;
> +
> +		rc = nl_addr_set(nl_sock_ns, c->pasta_ifi, &a->addr,
> +				 inany_prefix_len(&a->addr, a->prefix_len));
> +		if (rc < 0)
> +			return rc;
> +	}
> +	return 0;
> +}
> +
> +/**
> + * pasta_conf_routes() - Configure routes for one address family in namespace
> + * @c:		Execution context
> + * @af:		Address family (AF_INET or AF_INET6)
> + * @ifi:	Host interface index for this address family
> + * @no_copy:	If true, set default route; if false, copy routes from host
> + *
> + * Return: 0 on success, negative error code on failure
> + */
> +static int pasta_conf_routes(struct ctx *c, sa_family_t af, int ifi,
> +			     bool no_copy)
> +{
> +	const void *gw = (af == AF_INET) ?
> +		(const void *)&c->ip4.guest_gw : (const void *)&c->ip6.guest_gw;
> +
> +	if (!ifi)
> +		return 0;
> +
> +	if (no_copy)
It'd be nicer to have the !no_copy branch first for consistency with pasta_conf_addrs().
> +		return nl_route_set_def(nl_sock_ns, c->pasta_ifi, af, gw);
> +
> +	return nl_route_dup(nl_sock, ifi, nl_sock_ns, c->pasta_ifi, af);
> +}
> +
>  /**
>   * pasta_ns_conf() - Set up loopback and tap interfaces in namespace as needed
>   * @c:		Execution context
>   */
>  void pasta_ns_conf(struct ctx *c)
>  {
> -	int rc = 0;
> +	unsigned int flags = IFF_UP;
> +	int rc;
>
>  	rc = nl_link_set_flags(nl_sock_ns, 1 /* lo */, IFF_UP, IFF_UP);
>  	if (rc < 0)
> @@ -328,127 +385,62 @@ void pasta_ns_conf(struct ctx *c)
>  		die("Couldn't set MAC address in namespace: %s",
>  		    strerror_(-rc));
>
> -	if (c->pasta_conf_ns) {
> -		unsigned int flags = IFF_UP;
> -		const struct guest_addr *a;
> -		int plen;
> -
> -		if (c->mtu)
> -			nl_link_set_mtu(nl_sock_ns, c->pasta_ifi, c->mtu);
> -
> -		if (c->ifi6) /* Avoid duplicate address detection on link up */
> -			flags |= IFF_NOARP;
> -
> -		nl_link_set_flags(nl_sock_ns, c->pasta_ifi, flags, flags);
> -
> -		if (c->ifi4) {
> -			if (c->ip4.no_copy_addrs) {
> -				for_each_addr(a, c->addrs, c->addr_count, AF_INET) {
> -					plen = inany_prefix_len(&a->addr,
> -								a->prefix_len);
> -					rc = nl_addr_set(nl_sock_ns,
> -							 c->pasta_ifi, AF_INET,
> -							 inany_v4(&a->addr),
> -							 plen);
> -					if (rc < 0)
> -						break;
> -				}
> -			} else {
> -				rc = nl_addr_dup(nl_sock, c->ifi4,
> -						 nl_sock_ns, c->pasta_ifi,
> -						 AF_INET);
> -			}
> -
> -			if (c->ifi4 == -1 && rc == -ENOTSUP) {
> -				warn("IPv4 not supported, disabling");
> -				c->ifi4 = 0;
> -				goto ipv4_done;
> -			}
> -
> -			if (rc < 0) {
> -				die("Couldn't set IPv4 address(es) in namespace: %s",
> -				    strerror_(-rc));
> -			}
> -
> -			if (c->ip4.no_copy_routes) {
> -				rc = nl_route_set_def(nl_sock_ns, c->pasta_ifi,
> -						      AF_INET,
> -						      &c->ip4.guest_gw);
> -			} else {
> -				rc = nl_route_dup(nl_sock, c->ifi4, nl_sock_ns,
> -						  c->pasta_ifi, AF_INET);
> -			}
> -
> -			if (rc < 0) {
> -				die("Couldn't set IPv4 route(s) in guest: %s",
> -				    strerror_(-rc));
> -			}
> -		}
> -ipv4_done:
> -
> -		if (c->ifi6) {
> -			rc = nl_addr_get_ll(nl_sock_ns, c->pasta_ifi,
> -					    &c->ip6.addr_ll_seen);
> -			if (rc < 0) {
> -				warn("Can't get LL address from namespace: %s",
> -				     strerror_(-rc));
> -			}
> -
> -			rc = nl_addr_set_ll_nodad(nl_sock_ns, c->pasta_ifi);
> -			if (rc < 0) {
> -				warn("Can't set nodad for LL in namespace: %s",
> -				     strerror_(-rc));
> -			}
> -
> -			/* We dodged DAD: re-enable neighbour solicitations */
> -			nl_link_set_flags(nl_sock_ns, c->pasta_ifi,
> -					  0, IFF_NOARP);
> -
> -			if (c->ip6.no_copy_addrs) {
> -				for_each_addr(a, c->addrs, c->addr_count, AF_INET6) {
> -					rc = nl_addr_set(nl_sock_ns,
> -							 c->pasta_ifi,
> -							 AF_INET6, &a->addr.a6,
> -							 a->prefix_len);
> -					if (rc < 0)
> -						break;
> -				}
> -			} else {
> -				rc = nl_addr_dup(nl_sock, c->ifi6,
> -						 nl_sock_ns, c->pasta_ifi,
> -						 AF_INET6);
> -			}
> -
> -			if (rc < 0) {
> -				die("Couldn't set IPv6 address(es) in namespace: %s",
> -				    strerror_(-rc));
> -			}
> -
> -			if (c->ip6.no_copy_routes) {
> -				rc = nl_route_set_def(nl_sock_ns, c->pasta_ifi,
> -						      AF_INET6,
> -						      &c->ip6.guest_gw);
> -			} else {
> -				rc = nl_route_dup(nl_sock, c->ifi6,
> -						  nl_sock_ns, c->pasta_ifi,
> -						  AF_INET6);
> -			}
> -
> -			if (c->ifi6 == -1 && rc == -ENOTSUP) {
> -				warn("IPv6 not supported, disabling");
> -				c->ifi6 = 0;
> -				goto ipv6_done;
> -			}
> -
> -			if (rc < 0) {
> -				die("Couldn't set IPv6 route(s) in guest: %s",
> -				    strerror_(-rc));
> -			}
> -		}
> +	proto_update_l2_buf(c->guest_mac);
> +
> +	if (!c->pasta_conf_ns)
> +		return;
> +
> +	if (c->mtu)
> +		nl_link_set_mtu(nl_sock_ns, c->pasta_ifi, c->mtu);
> +
> +	if (c->ifi6) /* Avoid duplicate address detection on link up */
> +		flags |= IFF_NOARP;
Pre-existing, but this looks weird. Why are we setting a property for IPv6 next to the IPv4 configuration code? And does IFF_NOARP actually affect IPv6 DAD? I thought that was the IFA_F_NODAD flag which we use elsewhere.
+ + nl_link_set_flags(nl_sock_ns, c->pasta_ifi, flags, flags); + + /* IPv4 configuration */ + rc = pasta_conf_addrs(c, AF_INET, c->ifi4, c->ip4.no_copy_addrs); + if (c->ifi4 == -1 && rc == -ENOTSUP) { + warn("IPv4 not supported, disabling"); + c->ifi4 = 0; + } else if (rc < 0) { + die("Couldn't set IPv4 address(es): %s", strerror_(-rc)); + } else if (c->ifi4) { + rc = pasta_conf_routes(c, AF_INET, c->ifi4, + c->ip4.no_copy_routes); + if (rc < 0) + die("Couldn't set IPv4 route(s): %s", strerror_(-rc)); } -ipv6_done:
- proto_update_l2_buf(c->guest_mac); + if (!c->ifi6) + return; + + rc = nl_addr_get_ll(nl_sock_ns, c->pasta_ifi, + &c->ip6.addr_ll_seen); + if (rc < 0) + warn("Can't get LL address from namespace: %s", + strerror_(-rc)); + + rc = nl_addr_set_ll_nodad(nl_sock_ns, c->pasta_ifi); + if (rc < 0) + warn("Can't set nodad for LL in namespace: %s", + strerror_(-rc)); + + /* We dodged DAD: re-enable neighbour solicitations */ + nl_link_set_flags(nl_sock_ns, c->pasta_ifi, 0, IFF_NOARP); + + rc = pasta_conf_addrs(c, AF_INET6, c->ifi6, c->ip6.no_copy_addrs); + if (c->ifi6 == -1 && rc == -ENOTSUP) { + warn("IPv6 not supported, disabling"); + c->ifi6 = 0; + } else if (rc < 0) { + die("Couldn't set IPv6 address(es): %s", strerror_(-rc)); + } else { + rc = pasta_conf_routes(c, AF_INET6, c->ifi6, + c->ip6.no_copy_routes); + if (rc < 0) + die("Couldn't set IPv6 route(s): %s", strerror_(-rc)); + } }
/** -- 2.52.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Sun, Apr 12, 2026 at 08:53:14PM -0400, Jon Maloy wrote:
We remove the addr_seen field in struct ip4_ctx and replace it with a new CONF_ADDR_OBSERVED flag, set on the corresponding entry in the unified address array.
The observed IPv4 address is always added at or moved to position 0, increasing chances for a fast lookup.
Signed-off-by: Jon Maloy
---
v4:
 - Removed migration protocol update, to be added in later commit
 - Allow only one OBSERVED address at a time
 - Some other changes based on feedback from David G
v5:
 - Allowing multiple observed IPv4 addresses
v6:
 - Refactored fwd_set_addr(), notably:
   o Limited number of allowed observed addresses to four per protocol
   o I kept the memmove() calls, since I find no more elegant way to do this. Performance cost should be minimal, since these parts of the code will execute only very exceptionally. Note that removing the 'oldest' entry implicitly means removing the least used one, since the latter will migrate to the highest position after a few iterations of remove/add.
   o Also kept the prefix_len update. Not sure about this, but I cannot see how the current approach can cause any harm.
 - Other changes suggested by David G, notably reversing some residues after an accidental merge/re-split with the next commit.
v7:
 - Changed fwd_set_addr() to only accept keeping one observed-only address per protocol, as suggested by David.
 - Eliminated redundant tap_check_src_addr4() call level.
 - I keep fwd_select_addr() for the same pragmatic reason it was introduced: to avoid ugly, deeply indented code that tends to wrap across several lines.
---
 conf.c    |   6 ---
 fwd.c     | 124 +++++++++++++++++++++++++++++++++++++++++++++++------
 fwd.h     |   4 ++
 migrate.c |  17 +++++++-
 passt.h   |   6 +--
 tap.c     |   8 +++-
 6 files changed, 136 insertions(+), 29 deletions(-)
diff --git a/conf.c b/conf.c index 924ade2..f503d0f 100644 --- a/conf.c +++ b/conf.c @@ -767,13 +767,8 @@ static unsigned int conf_ip4(struct ctx *c, unsigned int ifi) } if (!rc || !fwd_get_addr(c, AF_INET, 0, 0)) return 0; - - a = fwd_get_addr(c, AF_INET, CONF_ADDR_HOST, 0); }
- if (a) - ip4->addr_seen = *inany_v4(&a->addr); - ip4->our_tap_addr = ip4->guest_gw;
return ifi; @@ -787,7 +782,6 @@ static void conf_ip4_local(struct ctx *c) { struct ip4_ctx *ip4 = &c->ip4;
- ip4->addr_seen = IP4_LL_GUEST_ADDR; ip4->our_tap_addr = ip4->guest_gw = IP4_LL_GUEST_GW; ip4->no_copy_addrs = ip4->no_copy_routes = true; fwd_set_addr(c, &inany_from_v4(IP4_LL_GUEST_ADDR), diff --git a/fwd.c b/fwd.c index d3f576a..8c7bf91 100644 --- a/fwd.c +++ b/fwd.c @@ -28,6 +28,7 @@ #include "inany.h" #include "fwd.h" #include "passt.h" +#include "conf.h" #include "lineread.h" #include "flow_table.h" #include "netlink.h" @@ -260,21 +261,68 @@ void fwd_neigh_table_init(const struct ctx *c) void fwd_set_addr(struct ctx *c, const union inany_addr *addr, uint8_t flags, int prefix_len) { - struct guest_addr *a; + struct guest_addr *a, *arr = &c->addrs[0], *rm = NULL; + int count = c->addr_count; + int af_cnt = 0;
- for_each_addr(a, c->addrs, c->addr_count, inany_af(addr)) { - goto found; + for_each_addr(a, c->addrs, c->addr_count, AF_UNSPEC) { + if (!inany_equals(&a->addr, addr)) + continue; + + /* Adjust and update prefix_len if provided and applicable */ + if (prefix_len && !(a->flags & CONF_ADDR_USER)) + a->prefix_len = inany_prefix_len(addr, prefix_len);
Converting the format of the prefix length here doesn't make sense to me. Both addr and a->addr are inanys, so both prefix_len and a->prefix_len should already be in IPv6 format.
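A minimal sketch of where the conversion could live instead, assuming (as the series states since v3) that prefix_len is always stored in IPv6/IPv4-mapped form. `struct sketch_addr` and both helpers are hypothetical stand-ins, not passt code: the IPv4-to-mapped conversion happens once, where a plain IPv4 prefix enters the table, and updates between two table-format values are a plain copy.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct guest_addr: prefix_len is always kept
 * in IPv6 form, so an IPv4-mapped /24 is stored as /120. */
struct sketch_addr {
	uint8_t a6[16];
	int prefix_len;
};

/* Convert once, at the boundary where a plain IPv4 prefix (0..32) enters
 * the table, e.g. from the command line or from netlink. */
static int sketch_v4_plen_to_mapped(int plen4)
{
	assert(plen4 >= 0 && plen4 <= 32);
	return plen4 + 96;
}

/* Inside the table both sides are already in IPv6 form: plain copy.
 * A zero prefix_len means "not provided" and leaves the entry alone. */
static void sketch_update_plen(struct sketch_addr *a, int prefix_len)
{
	if (prefix_len)
		a->prefix_len = prefix_len;
}
```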
+ + /* Nothing more to change */ + if ((a->flags & flags) == flags) + return; + + a->flags |= flags; + if (!(flags & CONF_ADDR_OBSERVED)) + return; + + /* Observed address moves to position 0: remove, re-add later */ + prefix_len = a->prefix_len; + memmove(a, a + 1, (&arr[count] - (a + 1)) * sizeof(*a));
If the address is already in an early slot, this will move most of the table, and the code below will move it mostly back again. That seems not ideal.
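A sketch of a move-to-front that only shifts the entries in front of the target slot, so an address already near the head costs almost nothing and the tail of the table is never touched. `struct sketch_addr` and `sketch_move_to_front()` are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct guest_addr */
struct sketch_addr {
	int id;			/* identifies the entry in this sketch */
	unsigned char flags;
};

/* Move arr[idx] to arr[0], shifting arr[0..idx-1] up by one slot.
 * Exactly idx entries are moved; entries after idx stay where they are. */
static void sketch_move_to_front(struct sketch_addr *arr, int count, int idx)
{
	struct sketch_addr tmp;

	assert(idx >= 0 && idx < count);
	(void)count;		/* only used by the assertion */
	if (idx == 0)
		return;

	tmp = arr[idx];
	memmove(&arr[1], &arr[0], (size_t)idx * sizeof(*arr));
	arr[0] = tmp;
}
```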
+ c->addr_count = --count;
Having the local count shadow c->addr_count leads to kind of confusing resynchronization like this. I'd be inclined to just use c->addr_count directly, and rely on the compiler to optimise it.
+ break; }
- if (c->addr_count >= MAX_GUEST_ADDRS) + if (count >= MAX_GUEST_ADDRS) { + debug("Address table full, can't add address"); return; + }
- a = &c->addrs[c->addr_count++]; - -found: + /* Add to head or tail, depending on flag */ + if (flags & CONF_ADDR_OBSERVED) { + a = &arr[0]; + memmove(&arr[1], a, count * sizeof(*a)); + } else { + a = &arr[count]; + } + c->addr_count = ++count; a->addr = *addr; a->prefix_len = inany_prefix_len6(addr, prefix_len);
Again, a conversion should not be necessary here. Much less one that's different from the case above.
a->flags = flags; + + if (!(flags & CONF_ADDR_OBSERVED)) + return; + + /* Remove excess observed-only address if more than one */ + for (int i = count - 1; i >= 0; i--) { + a = &arr[i]; + if (inany_af(&a->addr) != inany_af(addr)) + continue; + if (a->flags != CONF_ADDR_OBSERVED) + continue; + if (!rm) + rm = a; + af_cnt++; + }
As we've discussed, an address should only be removed if it is *observed only*. If there's an existing, different address with OBSERVED | USER, for example, we should just remove its OBSERVED bit but leave the address itself.
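One way to express that rule, as a sketch: once the new address sits in slot 0 with OBSERVED set, every other entry of the same family loses its OBSERVED bit, and only entries left with no flags at all are dropped. This assumes the intended policy is a single OBSERVED address per family; `struct sketch_addr`, the `SK_*` flags and `sketch_demote_observed()` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#define SK_USER     (1u << 0)	/* mirrors CONF_ADDR_USER */
#define SK_OBSERVED (1u << 4)	/* mirrors CONF_ADDR_OBSERVED */

struct sketch_addr {
	int af;			/* AF_INET or AF_INET6 */
	int id;			/* stands in for the address itself */
	uint8_t flags;
};

/* arr[0] is the newest OBSERVED entry. Clear OBSERVED on every other entry
 * of the same family; remove an entry only if that leaves it flagless.
 * Returns the new entry count. */
static int sketch_demote_observed(struct sketch_addr *arr, int count)
{
	int i = 1;

	assert(count > 0 && (arr[0].flags & SK_OBSERVED));

	while (i < count) {
		struct sketch_addr *a = &arr[i];

		if (a->af != arr[0].af || !(a->flags & SK_OBSERVED)) {
			i++;
			continue;
		}

		a->flags &= (uint8_t)~SK_OBSERVED;
		if (a->flags) {
			i++;	/* still USER, HOST, ...: keep the entry */
			continue;
		}

		/* Observed-only: close the gap, then re-examine slot i */
		memmove(a, a + 1, (size_t)(count - i - 1) * sizeof(*a));
		count--;
	}

	return count;
}
```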
+ if (af_cnt > 1) { + memmove(rm, rm + 1, (&arr[count] - (rm + 1)) * sizeof(*rm)); + c->addr_count--; + } }
/** @@ -985,6 +1033,38 @@ static bool is_dns_flow(uint8_t proto, const struct flowside *ini) ((ini->oport == 53) || (ini->oport == 853)); }
+/** + * fwd_select_addr() - Select address with priority-based search + * @c: Execution context + * @af: Address family (AF_INET or AF_INET6) + * @primary: Primary flags to match (or 0 to skip) + * @secondary: Secondary flags to match (or 0 to skip) + * @skip: Flags to exclude from search + * + * Search for address entries in priority order. + * + * Return: pointer to selected address entry, or NULL if none found + */ +const struct guest_addr *fwd_select_addr(const struct ctx *c, int af, + int primary, int secondary, int skip) +{ + const struct guest_addr *a; + + if (primary) {
Why is it useful to allow skipping the primary search?
+ a = fwd_get_addr(c, af, primary, skip); + if (a) + return a; + } + + if (secondary) { + a = fwd_get_addr(c, af, secondary, skip); + if (a) + return a; + } + + return NULL; +} + /** * fwd_guest_accessible() - Is address guest-accessible * @c: Execution context @@ -1014,11 +1094,6 @@ static bool fwd_guest_accessible(const struct ctx *c, if (inany_equals(addr, &a->addr)) return false; } - /* Also check addr_seen: it tracks the address the guest is actually - * using, which may differ from configured addresses. - */ - if (inany_equals4(addr, &c->ip4.addr_seen)) - return false;
/* For IPv6, addr_seen starts unspecified, because we don't know what LL * address the guest will take until we see it. Only check against it @@ -1214,10 +1289,20 @@ uint8_t fwd_nat_from_host(const struct ctx *c, * match. */ if (inany_v4(&ini->eaddr)) { - if (c->host_lo_to_ns_lo) + if (c->host_lo_to_ns_lo) { tgt->eaddr = inany_loopback4; - else - tgt->eaddr = inany_from_v4(c->ip4.addr_seen); + } else { + const struct guest_addr *a; + + a = fwd_select_addr(c, AF_INET, + CONF_ADDR_OBSERVED, + CONF_ADDR_USER | + CONF_ADDR_HOST, 0);
Isn't the two stage lookup redundant with forcing OBSERVED addresses into the first slot? If you just search for any address of the right family, it will necessarily find the OBSERVED one first, yes?
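To illustrate the point: if OBSERVED entries are always kept ahead of everything else of their family, a single first-match scan returns the observed address when there is one, and otherwise the first remaining entry of that family. That matches the two-stage result only if every non-observed candidate is one the second stage (USER | HOST) would accept, which is an assumption here. The names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

#define SK_USER     (1u << 0)	/* mirrors CONF_ADDR_USER */
#define SK_OBSERVED (1u << 4)	/* mirrors CONF_ADDR_OBSERVED */

struct sketch_addr {
	int af;
	int id;
	uint8_t flags;
};

/* Single pass: first entry of family af carrying none of the excl flags.
 * Relies on the table ordering, not on a second lookup, to prefer
 * OBSERVED entries. */
static const struct sketch_addr *sketch_first(const struct sketch_addr *arr,
					      int count, int af, uint8_t excl)
{
	for (int i = 0; i < count; i++) {
		if (arr[i].af == af && !(arr[i].flags & excl))
			return &arr[i];
	}
	return NULL;
}
```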
+ if (!a) + return PIF_NONE; + + tgt->eaddr = a->addr; + } tgt->oaddr = inany_any4; } else { if (c->host_lo_to_ns_lo) @@ -1252,7 +1337,14 @@ uint8_t fwd_nat_from_host(const struct ctx *c, tgt->oport = ini->eport;
if (inany_v4(&tgt->oaddr)) { - tgt->eaddr = inany_from_v4(c->ip4.addr_seen); + const struct guest_addr *a; + + a = fwd_select_addr(c, AF_INET, CONF_ADDR_OBSERVED, + CONF_ADDR_USER | CONF_ADDR_HOST, 0); + if (!a) + return PIF_NONE; + + tgt->eaddr = a->addr; } else { if (inany_is_linklocal6(&tgt->oaddr)) tgt->eaddr.a6 = c->ip6.addr_ll_seen; diff --git a/fwd.h b/fwd.h index c5a1068..9893856 100644 --- a/fwd.h +++ b/fwd.h @@ -25,6 +25,10 @@ void fwd_probe_ephemeral(void); bool fwd_port_is_ephemeral(in_port_t port); const struct guest_addr *fwd_get_addr(const struct ctx *c, sa_family_t af, uint8_t incl, uint8_t excl); +const struct guest_addr *fwd_select_addr(const struct ctx *c, int af, + int primary, int secondary, int skip); +void fwd_set_addr(struct ctx *c, const union inany_addr *addr, + uint8_t flags, int prefix_len);
/** * struct fwd_rule - Forwarding rule governing a range of ports diff --git a/migrate.c b/migrate.c index 1e8858a..1e02720 100644 --- a/migrate.c +++ b/migrate.c @@ -18,6 +18,8 @@ #include "util.h" #include "ip.h" #include "passt.h" +#include "conf.h" +#include "fwd.h" #include "inany.h" #include "flow.h" #include "flow_table.h" @@ -57,11 +59,18 @@ static int seen_addrs_source_v2(struct ctx *c, struct migrate_seen_addrs_v2 addrs = { .addr6 = c->ip6.addr_seen, .addr6_ll = c->ip6.addr_ll_seen, - .addr4 = c->ip4.addr_seen, }; + const struct guest_addr *a;
(void)stage;
+ /* IPv4 observed address, with fallback to configured address */ + a = fwd_select_addr(c, AF_INET, CONF_ADDR_OBSERVED, + CONF_ADDR_USER | CONF_ADDR_HOST, + CONF_ADDR_LINKLOCAL);
I don't think we want to exclude LINKLOCAL here. In the (unlikely for IPv4) case that addr_seen was linklocal, we would have transported it as is previously.
+ if (a) + addrs.addr4 = *inany_v4(&a->addr); + memcpy(addrs.mac, c->guest_mac, sizeof(addrs.mac));
if (write_all_buf(fd, &addrs, sizeof(addrs))) @@ -90,7 +99,11 @@ static int seen_addrs_target_v2(struct ctx *c,
c->ip6.addr_seen = addrs.addr6; c->ip6.addr_ll_seen = addrs.addr6_ll; - c->ip4.addr_seen = addrs.addr4; + + if (addrs.addr4.s_addr)
Use IN4_IS_ADDR_UNSPECIFIED, rather than looking into the s_addr field, please.
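A small sketch of the suggested check, which also mirrors the IN6_IS_ADDR_UNSPECIFIED() test used for IPv6 later in the series. The macro below is a hypothetical stand-in, assumed to take a pointer like glibc's IN6_IS_ADDR_UNSPECIFIED(); passt's own IN4_IS_ADDR_UNSPECIFIED definition is not shown in this thread.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Hypothetical stand-in: true if *a is 0.0.0.0 */
#define SKETCH_IN4_IS_ADDR_UNSPECIFIED(a) \
	((a)->s_addr == htonl(INADDR_ANY))

/* Should a migrated IPv4 address be recorded as observed? */
static bool sketch_should_import(const struct in_addr *a4)
{
	return !SKETCH_IN4_IS_ADDR_UNSPECIFIED(a4);
}
```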
+ fwd_set_addr(c, &inany_from_v4(addrs.addr4), + CONF_ADDR_OBSERVED, 0); + memcpy(c->guest_mac, addrs.mac, sizeof(c->guest_mac));
return 0; diff --git a/passt.h b/passt.h index f75656d..5da1d55 100644 --- a/passt.h +++ b/passt.h @@ -64,8 +64,9 @@ enum passt_modes { MODE_VU, };
-/* Maximum number of addresses in context address array */ +/* Limits on number of addresses in context address array */ #define MAX_GUEST_ADDRS 32 +#define MAX_OBSERVED_ADDRS 4
Leftover from earlier versions?
/** * struct guest_addr - Unified IPv4/IPv6 address entry @@ -81,11 +82,11 @@ struct guest_addr { #define CONF_ADDR_HOST BIT(1) /* From host interface */ #define CONF_ADDR_GENERATED BIT(2) /* Generated by PASST/PASTA */ #define CONF_ADDR_LINKLOCAL BIT(3) /* Link-local address */ +#define CONF_ADDR_OBSERVED BIT(4) /* Seen in guest traffic */ };
/** * struct ip4_ctx - IPv4 execution context - * @addr_seen: Latest IPv4 address seen as source from tap * @guest_gw: IPv4 gateway as seen by the guest * @map_host_loopback: Outbound connections to this address are NATted to the * host's 127.0.0.1 @@ -101,7 +102,6 @@ struct guest_addr { * @no_copy_addrs: Don't copy all addresses when configuring namespace */ struct ip4_ctx { - struct in_addr addr_seen; struct in_addr guest_gw; struct in_addr map_host_loopback; struct in_addr map_guest_addr; diff --git a/tap.c b/tap.c index eb93f74..7f04e12 100644 --- a/tap.c +++ b/tap.c @@ -47,6 +47,7 @@ #include "ip.h" #include "iov.h" #include "passt.h" +#include "fwd.h" #include "arp.h" #include "dhcp.h" #include "ndp.h" @@ -756,9 +757,12 @@ resume: continue; }
- if (iph->saddr && c->ip4.addr_seen.s_addr != iph->saddr) - c->ip4.addr_seen.s_addr = iph->saddr; + if (iph->saddr) { + const union inany_addr *addr;
+ addr = &inany_from_v4(*(struct in_addr *) &iph->saddr); + fwd_set_addr(c, addr, CONF_ADDR_OBSERVED, 0);
This is called on essentially every packet. I'm a bit concerned that even with the optimisations already implemented this might get noticeably expensive.
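One possible mitigation, sketched under the assumption that OBSERVED addresses are kept at the head of the table: test the head entry first, and only fall into the full fwd_set_addr() path when it doesn't already match. The helper and types are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SK_OBSERVED (1u << 4)	/* mirrors CONF_ADDR_OBSERVED */

struct sketch_addr {
	uint8_t a6[16];		/* IPv6 or IPv4-mapped address */
	uint8_t flags;
};

/* Cheap pre-check for the per-packet path: true if a6 is already the head
 * entry and already flagged OBSERVED, i.e. nothing needs updating. */
static bool sketch_observed_unchanged(const struct sketch_addr *arr,
				      int count, const uint8_t a6[16])
{
	return count > 0 && (arr[0].flags & SK_OBSERVED) &&
	       !memcmp(arr[0].a6, a6, sizeof(arr[0].a6));
}
```

With IPv4 and IPv6 traffic interleaved, only one family can occupy slot 0, so a per-family cached index, or checking the first entry of each family, may be needed for the fast path to hit reliably.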
+ } if (!iov_drop_header(&data, hlen)) continue; if (iov_tail_size(&data) != l4len) -- 2.52.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Sun, Apr 12, 2026 at 08:53:15PM -0400, Jon Maloy wrote:
We remove the addr_seen and addr_ll_seen fields in struct ip6_ctx and replace them with the CONF_ADDR_OBSERVED and CONF_ADDR_LINKLOCAL flags, set on the corresponding entries in the unified address array.
The observed IPv6 address is always added/moved to position 0 in the array, improving chances for fast lookup.
The separate check against addr_seen in fwd_guest_accessible() can be removed: the observed address is now part of the unified array, and the existing for_each_addr() loop already checks against all addresses, including this one.
This completes the unification of address storage for both IPv4 and IPv6, enabling future support for multiple guest addresses per family.
Signed-off-by: Jon Maloy
---
v5:
 - Made to use the same algorithm and function as IPv4 for inserting observed addresses into the array.
v6:
 - Re-introduced code that had accidentally been moved to the previous commit.
 - Some fixes based on feedback from David G.
v7:
 - Added a commit at the beginning of the series addressing Stefano's concern about DHCPv6 reply addresses.
 - Some other updates based on feedback from David and Stefano.
---
 conf.c    |  4 ----
 dhcpv6.c  |  4 +---
 fwd.c     | 38 +++++++++++++++++++++++---------------
 inany.h   |  3 +++
 migrate.c | 37 +++++++++++++++++++++++++++----------
 passt.h   |  4 ----
 pasta.c   | 11 +++++++----
 tap.c     | 17 +++++------------
 8 files changed, 66 insertions(+), 52 deletions(-)
diff --git a/conf.c b/conf.c index f503d0f..3cb3553 100644 --- a/conf.c +++ b/conf.c @@ -827,7 +827,6 @@ static unsigned int conf_ip6(struct ctx *c, unsigned int ifi) strerror_(-rc)); return 0; } - a = fwd_get_addr(c, AF_INET6, CONF_ADDR_HOST, 0); } else { rc = nl_addr_get_ll(nl_sock, ifi, &ip6->our_tap_ll); if (rc < 0) { @@ -836,9 +835,6 @@ static unsigned int conf_ip6(struct ctx *c, unsigned int ifi) } }
- if (a) - ip6->addr_seen = a->addr.a6; - if (IN6_IS_ADDR_LINKLOCAL(&ip6->guest_gw)) ip6->our_tap_ll = ip6->guest_gw;
diff --git a/dhcpv6.c b/dhcpv6.c index 0a064a9..447aaba 100644 --- a/dhcpv6.c +++ b/dhcpv6.c @@ -567,8 +567,8 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, struct opt_hdr client_id_storage; /* cppcheck-suppress [variableScope,unmatchedSuppression] */ struct opt_ia_na ia_storage; - const struct guest_addr *a; const struct in6_addr *src; + const struct guest_addr *a; struct msg_hdr mh_storage; const struct msg_hdr *mh; struct udphdr uh_storage; @@ -683,8 +683,6 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, resp.hdr.xid = mh->xid;
tap_udp6_send(c, src, 547, saddr, 546, mh->xid, &resp, n); - if (a) - c->ip6.addr_seen = a->addr.a6;
return 1; } diff --git a/fwd.c b/fwd.c index 8c7bf91..b177be9 100644 --- a/fwd.c +++ b/fwd.c @@ -1095,14 +1095,6 @@ static bool fwd_guest_accessible(const struct ctx *c, return false; }
- /* For IPv6, addr_seen starts unspecified, because we don't know what LL - * address the guest will take until we see it. Only check against it - * if it has been set to a real address. - */ - if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr_seen) && - inany_equals6(addr, &c->ip6.addr_seen)) - return false; - return true; }
@@ -1305,10 +1297,20 @@ uint8_t fwd_nat_from_host(const struct ctx *c, } tgt->oaddr = inany_any4; } else { - if (c->host_lo_to_ns_lo) + if (c->host_lo_to_ns_lo) { tgt->eaddr = inany_loopback6; - else - tgt->eaddr.a6 = c->ip6.addr_seen; + } else { + const struct guest_addr *a; + + a = fwd_select_addr(c, AF_INET6, + CONF_ADDR_OBSERVED, + CONF_ADDR_USER | + CONF_ADDR_HOST, + CONF_ADDR_LINKLOCAL);
As for IPv4, do we actually need the two-phase search, or is the ordering of addresses in the array enough?
+ if (!a) + return PIF_NONE; + tgt->eaddr = a->addr; + } tgt->oaddr = inany_any6; }
@@ -1346,10 +1348,16 @@ uint8_t fwd_nat_from_host(const struct ctx *c,
tgt->eaddr = a->addr; } else { - if (inany_is_linklocal6(&tgt->oaddr)) - tgt->eaddr.a6 = c->ip6.addr_ll_seen; - else - tgt->eaddr.a6 = c->ip6.addr_seen; + bool ll = inany_is_linklocal6(&tgt->oaddr); + uint8_t excl = ll ? ~CONF_ADDR_LINKLOCAL : CONF_ADDR_LINKLOCAL;
Uuhhh... I recall Stefano suggested using this encoding technique of using ~flag to indicate a different meaning. But AFAICT handling that encoding is not actually implemented in fwd_get_addr() in this series. Plus, that encoding technique doesn't really work any more if you allow multiple bits in the excl field.
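A sketch of why the ~flag trick breaks down, assuming fwd_get_addr() applies `excl` as a plain "none of these bits" mask (which is what the review says the series implements). Flag values mirror passt.h in this series; `sketch_match()` is a hypothetical model of the matching rule, not the real function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in passt.h in this series */
#define CONF_ADDR_USER      (1u << 0)
#define CONF_ADDR_HOST      (1u << 1)
#define CONF_ADDR_GENERATED (1u << 2)
#define CONF_ADDR_LINKLOCAL (1u << 3)
#define CONF_ADDR_OBSERVED  (1u << 4)

/* Assumed matching rule: at least one of incl (if non-zero), none of excl */
static bool sketch_match(uint8_t flags, uint8_t incl, uint8_t excl)
{
	if (incl && !(flags & incl))
		return false;
	return !(flags & excl);
}
```

With `excl = ~CONF_ADDR_LINKLOCAL`, an OBSERVED | LINKLOCAL entry is rejected, because its OBSERVED bit falls inside the mask: the opposite of what the caller wants. Requiring OBSERVED and LINKLOCAL together would need an "all of" include mask, which is a separate design choice.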
+ const struct guest_addr *a; + + a = fwd_select_addr(c, AF_INET6, CONF_ADDR_OBSERVED, + CONF_ADDR_USER | CONF_ADDR_HOST, excl); + if (!a) + return PIF_NONE; + + tgt->eaddr = a->addr; }
return PIF_TAP; diff --git a/inany.h b/inany.h index 0450c45..ddcf93d 100644 --- a/inany.h +++ b/inany.h @@ -60,6 +60,9 @@ extern const union inany_addr inany_any4; #define inany_from_v4(a4) \ ((union inany_addr)INANY_INIT4((a4)))
+#define inany_from_v6(v6) \ + ((union inany_addr){ .a6 = (v6) }) + /** union sockaddr_inany - Either a sockaddr_in or a sockaddr_in6 * @sa_family: Address family, AF_INET or AF_INET6 * @sa: Plain struct sockaddr (useful to avoid casts) diff --git a/migrate.c b/migrate.c index 1e02720..2dc4dd9 100644 --- a/migrate.c +++ b/migrate.c @@ -56,21 +56,30 @@ struct migrate_seen_addrs_v2 { static int seen_addrs_source_v2(struct ctx *c, const struct migrate_stage *stage, int fd) { - struct migrate_seen_addrs_v2 addrs = { - .addr6 = c->ip6.addr_seen, - .addr6_ll = c->ip6.addr_ll_seen, - }; + struct migrate_seen_addrs_v2 addrs = { 0 }; const struct guest_addr *a;
(void)stage;
- /* IPv4 observed address, with fallback to configured address */ + /* IPv4 observed address, with fallback to any other non-LL address */
Seems like this change belongs in the previous patch.
a = fwd_select_addr(c, AF_INET, CONF_ADDR_OBSERVED, CONF_ADDR_USER | CONF_ADDR_HOST, CONF_ADDR_LINKLOCAL); if (a) addrs.addr4 = *inany_v4(&a->addr);
+ /* IPv6 observed address, with fallback to any other non-LL address */ + a = fwd_select_addr(c, AF_INET6, CONF_ADDR_OBSERVED, + CONF_ADDR_USER | CONF_ADDR_HOST, + CONF_ADDR_LINKLOCAL); + if (a) + addrs.addr6 = a->addr.a6; + + /* IPv6 link-local address */ + a = fwd_get_addr(c, AF_INET6, CONF_ADDR_LINKLOCAL, 0); + if (a) + addrs.addr6_ll = a->addr.a6; + memcpy(addrs.mac, c->guest_mac, sizeof(addrs.mac));
if (write_all_buf(fd, &addrs, sizeof(addrs))) @@ -91,19 +100,27 @@ static int seen_addrs_target_v2(struct ctx *c, const struct migrate_stage *stage, int fd) { struct migrate_seen_addrs_v2 addrs; + struct in6_addr addr6, addr6_ll;
(void)stage;
if (read_all_buf(fd, &addrs, sizeof(addrs))) return errno;
- c->ip6.addr_seen = addrs.addr6; - c->ip6.addr_ll_seen = addrs.addr6_ll; - - if (addrs.addr4.s_addr) + if (addrs.addr4.s_addr) { fwd_set_addr(c, &inany_from_v4(addrs.addr4), CONF_ADDR_OBSERVED, 0); - + }
Another change that belongs in the previous patch.
+ addr6 = addrs.addr6; + if (!IN6_IS_ADDR_UNSPECIFIED(&addr6)) { + fwd_set_addr(c, &inany_from_v6(addr6), + CONF_ADDR_OBSERVED, 0); + } + addr6_ll = addrs.addr6_ll; + if (!IN6_IS_ADDR_UNSPECIFIED(&addr6_ll)) { + fwd_set_addr(c, &inany_from_v6(addr6_ll), + CONF_ADDR_OBSERVED | CONF_ADDR_LINKLOCAL, 0); + } memcpy(c->guest_mac, addrs.mac, sizeof(c->guest_mac));
return 0; diff --git a/passt.h b/passt.h index 5da1d55..3ef84eb 100644 --- a/passt.h +++ b/passt.h @@ -121,8 +121,6 @@ struct ip4_ctx {
/** * struct ip6_ctx - IPv6 execution context - * @addr_seen: Latest IPv6 global/site address seen as source from tap - * @addr_ll_seen: Latest IPv6 link-local address seen as source from tap * @guest_gw: IPv6 gateway as seen by the guest * @map_host_loopback: Outbound connections to this address are NATted to the * host's [::1] @@ -138,8 +136,6 @@ struct ip4_ctx { * @no_copy_addrs: Don't copy all addresses when configuring namespace */ struct ip6_ctx { - struct in6_addr addr_seen; - struct in6_addr addr_ll_seen; struct in6_addr guest_gw; struct in6_addr map_host_loopback; struct in6_addr map_guest_addr; diff --git a/pasta.c b/pasta.c index f949dc2..f7f1e07 100644 --- a/pasta.c +++ b/pasta.c @@ -366,6 +366,7 @@ static int pasta_conf_routes(struct ctx *c, sa_family_t af, int ifi, void pasta_ns_conf(struct ctx *c) { unsigned int flags = IFF_UP; + struct in6_addr addr_ll; int rc;
rc = nl_link_set_flags(nl_sock_ns, 1 /* lo */, IFF_UP, IFF_UP); @@ -415,12 +416,14 @@ void pasta_ns_conf(struct ctx *c) if (!c->ifi6) return;
- rc = nl_addr_get_ll(nl_sock_ns, c->pasta_ifi, - &c->ip6.addr_ll_seen); - if (rc < 0) + rc = nl_addr_get_ll(nl_sock_ns, c->pasta_ifi, &addr_ll); + if (rc < 0) { warn("Can't get LL address from namespace: %s", strerror_(-rc)); - + } else { + fwd_set_addr(c, &inany_from_v6(addr_ll), + CONF_ADDR_LINKLOCAL | CONF_ADDR_OBSERVED, 0); + } rc = nl_addr_set_ll_nodad(nl_sock_ns, c->pasta_ifi); if (rc < 0) warn("Can't set nodad for LL in namespace: %s", diff --git a/tap.c b/tap.c index 7f04e12..7f7f0ce 100644 --- a/tap.c +++ b/tap.c @@ -933,20 +933,13 @@ resume: continue; }
- if (IN6_IS_ADDR_LINKLOCAL(saddr)) { - c->ip6.addr_ll_seen = *saddr; + if (!IN6_IS_ADDR_UNSPECIFIED(saddr)) { + uint8_t flags = CONF_ADDR_OBSERVED;
- if (IN6_IS_ADDR_UNSPECIFIED(&c->ip6.addr_seen)) { - c->ip6.addr_seen = *saddr; - } - - if (!fwd_get_addr(c, AF_INET6, 0, 0)) { - union inany_addr addr = { .a6 = *saddr }; + if (IN6_IS_ADDR_LINKLOCAL(saddr)) + flags |= CONF_ADDR_LINKLOCAL;
- fwd_set_addr(c, &addr, CONF_ADDR_LINKLOCAL, 64); - } - } else if (!IN6_IS_ADDR_UNSPECIFIED(saddr)){ - c->ip6.addr_seen = *saddr; + fwd_set_addr(c, &inany_from_v6(*saddr), flags, 0); }
if (proto == IPPROTO_ICMPV6) { -- 2.52.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Sun, Apr 12, 2026 at 08:53:16PM -0400, Jon Maloy wrote:
We update the migration protocol to version 3 to support distributing multiple addresses from the unified address array. The new protocol migrates all address entries in the array, along with their prefix lengths and flags, and leaves it to the receiver to filter which ones it wants to apply.
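For reference, a sketch of the byte layout of the v3 "addresses" stage as addrs_source_v3() below sends it: a uint8_t count, that many packed 24-byte entries, then the 6-byte guest MAC. The struct mirrors the patch; `sketch_v3_addr_stage_len()` and `SK_ETH_ALEN` are hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>

#define SK_ETH_ALEN 6	/* same value as ETH_ALEN */

/* Mirrors struct migrate_addr_v3 from the patch */
struct migrate_addr_v3 {
	struct in6_addr addr;
	uint32_t prefix_len;
	uint32_t flags;
} __attribute__((__packed__));

_Static_assert(sizeof(struct migrate_addr_v3) == 24,
	       "migration entry must be 24 bytes");

/* Bytes in the v3 "addresses" stage as posted: count, entries, MAC */
static size_t sketch_v3_addr_stage_len(uint8_t count)
{
	return sizeof(uint8_t) +
	       (size_t)count * sizeof(struct migrate_addr_v3) + SK_ETH_ALEN;
}
```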
Signed-off-by: Jon Maloy
---
v4:
 - Broke out as separate commit
 - Made number of transferable addresses variable
v6:
 - Separated internal and wire transfer format
v7:
 - Using uint32_t instead of uint8_t for fields in migration format
 - Replaced term "wire format" with "migration format"
 - Some other minor changes after feedback from Stefano.
---
 migrate.c | 179 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 179 insertions(+)
diff --git a/migrate.c b/migrate.c index 2dc4dd9..93f67ae 100644 --- a/migrate.c +++ b/migrate.c @@ -44,6 +44,71 @@ struct migrate_seen_addrs_v2 { unsigned char mac[ETH_ALEN]; } __attribute__((packed));
+/** + * Migration format flags for address migration (v3) + * These are stable values - do not change existing assignments + */ +#define MIGRATE_ADDR_USER BIT(0) +#define MIGRATE_ADDR_HOST BIT(1) +#define MIGRATE_ADDR_LINKLOCAL BIT(2) +#define MIGRATE_ADDR_OBSERVED BIT(3) + +/** + * struct migrate_addr_v3 - Migration format for a single address entry + * @addr: IPv6 or IPv4-mapped address (16 bytes) + * @prefix_len: Prefix length + * @flags: MIGRATE_ADDR_* flags (migration format) + */ +struct migrate_addr_v3 { + struct in6_addr addr; + uint32_t prefix_len; + uint32_t flags;
Since this is 32-bit...
+} __attribute__((__packed__)); + +/** + * flags_to_migration() - Convert internal flags to stable migration format + * @flags: Internal CONF_ADDR_* flags + * + * Return: Migration format MIGRATE_ADDR_* flags + */ +static uint8_t flags_to_migration(uint8_t flags)
... this should probably also return 32-bit.
+{ + uint8_t migration = 0; + + if (flags & CONF_ADDR_USER) + migration |= MIGRATE_ADDR_USER; + if (flags & CONF_ADDR_HOST) + migration |= MIGRATE_ADDR_HOST; + if (flags & CONF_ADDR_LINKLOCAL) + migration |= MIGRATE_ADDR_LINKLOCAL; + if (flags & CONF_ADDR_OBSERVED) + migration |= MIGRATE_ADDR_OBSERVED; + + return migration;
That way you could also include the htonl() / ntohl() as part of the conversion functions.
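A possible shape for the conversion helpers along those lines, as a sketch: the flag constants mirror the values in this series, and the functions produce/accept the 32-bit field already in network byte order, so callers store the result directly. Bits unknown to the receiver are ignored.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

/* Internal flags, values as in passt.h in this series */
#define CONF_ADDR_USER      (1u << 0)
#define CONF_ADDR_HOST      (1u << 1)
#define CONF_ADDR_LINKLOCAL (1u << 3)
#define CONF_ADDR_OBSERVED  (1u << 4)

/* Stable migration flags, values as in the patch */
#define MIGRATE_ADDR_USER      (1u << 0)
#define MIGRATE_ADDR_HOST      (1u << 1)
#define MIGRATE_ADDR_LINKLOCAL (1u << 2)
#define MIGRATE_ADDR_OBSERVED  (1u << 3)

/* Internal flags -> 32-bit migration flags, in network byte order */
static uint32_t flags_to_migration(uint8_t flags)
{
	uint32_t m = 0;

	if (flags & CONF_ADDR_USER)
		m |= MIGRATE_ADDR_USER;
	if (flags & CONF_ADDR_HOST)
		m |= MIGRATE_ADDR_HOST;
	if (flags & CONF_ADDR_LINKLOCAL)
		m |= MIGRATE_ADDR_LINKLOCAL;
	if (flags & CONF_ADDR_OBSERVED)
		m |= MIGRATE_ADDR_OBSERVED;

	return htonl(m);
}

/* 32-bit migration flags in network byte order -> internal flags */
static uint8_t flags_from_migration(uint32_t be)
{
	uint32_t m = ntohl(be);
	uint8_t flags = 0;

	if (m & MIGRATE_ADDR_USER)
		flags |= CONF_ADDR_USER;
	if (m & MIGRATE_ADDR_HOST)
		flags |= CONF_ADDR_HOST;
	if (m & MIGRATE_ADDR_LINKLOCAL)
		flags |= CONF_ADDR_LINKLOCAL;
	if (m & MIGRATE_ADDR_OBSERVED)
		flags |= CONF_ADDR_OBSERVED;

	return flags;
}
```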
+} + +/** + * flags_from_migration() - Convert migration format flags to internal format + * @migration: Migration format MIGRATE_ADDR_* flags + * + * Return: Internal CONF_ADDR_* flags + */ +static uint8_t flags_from_migration(uint8_t migration)
Same comments here, but in reverse.
+{ + uint8_t flags = 0; + + if (migration & MIGRATE_ADDR_USER) + flags |= CONF_ADDR_USER; + if (migration & MIGRATE_ADDR_HOST) + flags |= CONF_ADDR_HOST; + if (migration & MIGRATE_ADDR_LINKLOCAL) + flags |= CONF_ADDR_LINKLOCAL; + if (migration & MIGRATE_ADDR_OBSERVED) + flags |= CONF_ADDR_OBSERVED; + + return flags; +} + /** * seen_addrs_source_v2() - Copy and send guest observed addresses from source * @c: Execution context @@ -126,6 +191,99 @@ static int seen_addrs_target_v2(struct ctx *c, return 0; }
+/** + * addrs_source_v3() - Send all addresses with flags from source + * @c: Execution context + * @stage: Migration stage, unused + * @fd: File descriptor for state transfer + * + * Send all address entries using a stable migration format. Each field is + * serialised explicitly to avoid coupling the migration format to internal + * structure layout or flag bit assignments. + * + * Return: 0 on success, positive error code on failure + */ +/* cppcheck-suppress [constParameterCallback, unmatchedSuppression] */ +static int addrs_source_v3(struct ctx *c, + const struct migrate_stage *stage, int fd) +{ + uint8_t addr_count = c->addr_count; + const struct guest_addr *a; + + (void)stage; + + /* Send count first */ + if (write_all_buf(fd, &addr_count, sizeof(addr_count))) + return errno;
I'd be inclined to use 32 bits here, rather than 8. Yes, 255 is probably more than enough addresses, but this gains future-proofing and consistency with other counts we send at negligible cost. Either way you can use write_u8() or write_u32() from serialise.c here.
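If the count goes to 32 bits, the transfer could look like the sketch below. The helpers are hypothetical stand-ins written with plain write()/read() and htonl()/ntohl(); they are not the serialise.c write_u32()/read_u32() API, whose exact signatures aren't shown in this thread.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Send v in network byte order, retrying short writes and EINTR.
 * Returns 0 on success, -1 with errno set on failure. */
static int sketch_write_u32(int fd, uint32_t v)
{
	uint32_t be = htonl(v);
	const char *p = (const char *)&be;
	size_t left = sizeof(be);

	while (left) {
		ssize_t n = write(fd, p, left);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		p += n;
		left -= (size_t)n;
	}
	return 0;
}

/* Receive a 32-bit value in network byte order into *v.
 * A premature end of stream is reported as EIO. */
static int sketch_read_u32(int fd, uint32_t *v)
{
	uint32_t be;
	char *p = (char *)&be;
	size_t left = sizeof(be);

	while (left) {
		ssize_t n = read(fd, p, left);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0) {
			errno = EIO;
			return -1;
		}
		p += n;
		left -= (size_t)n;
	}
	*v = ntohl(be);
	return 0;
}
```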
+ + /* Send each address in stable migration format */ + for_each_addr(a, c->addrs, c->addr_count, 0) { + struct migrate_addr_v3 migration = { + .addr = a->addr.a6, + .prefix_len = htonl(a->prefix_len), + .flags = htonl(flags_to_migration(a->flags)), + }; + + if (write_all_buf(fd, &migration, sizeof(migration))) + return errno; + } + + /* Send MAC address */ + if (write_all_buf(fd, c->guest_mac, ETH_ALEN)) + return errno; + + return 0; +} + +/** + * addrs_target_v3() - Receive addresses on target + * @c: Execution context + * @stage: Migration stage, unused + * @fd: File descriptor for state transfer + * + * Receive address entries from the stable migration format and merge only + * observed addresses into local array. Source sends all addresses for + * forward compatibility, but target only applies those marked as observed. + * + * Return: 0 on success, positive error code on failure + */ +static int addrs_target_v3(struct ctx *c, + const struct migrate_stage *stage, int fd) +{ + uint8_t addr_count, i; + + (void)stage; + + if (read_all_buf(fd, &addr_count, sizeof(addr_count))) + return errno;
read_u8() / read_u32() from serialise.c.
+ + if (addr_count > MAX_GUEST_ADDRS)
Print a warning?
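A sketch of the clamp with a warning. Note that, as far as the quoted code shows, clamping addr_count also means the surplus entries are never read, so the guest MAC and later stages would be parsed from the wrong offset; the sketch steps over them instead. It works on an in-memory buffer for brevity (with fd-based I/O, stepping over an entry would be a read into a scratch struct migrate_addr_v3), and `SK_MAX_ADDRS` is kept hypothetically small.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SK_MAX_ADDRS 2		/* hypothetical, small to keep the example short */
#define SK_ENTRY_LEN 24		/* sizeof(struct migrate_addr_v3) */

/* Walk count serialised entries in buf, keeping at most SK_MAX_ADDRS of them
 * in out. Surplus entries are still stepped over, so whatever follows them
 * in the stream is read from the correct offset. Returns the offset just
 * past the last entry; *kept receives the number of entries stored. */
static size_t sketch_take_entries(const uint8_t *buf, uint32_t count,
				  uint8_t out[][SK_ENTRY_LEN], uint32_t *kept)
{
	size_t off = 0;
	uint32_t i;

	*kept = count < SK_MAX_ADDRS ? count : SK_MAX_ADDRS;
	if (count > SK_MAX_ADDRS)
		fprintf(stderr, "Migration: got %u addresses, keeping %u\n",
			(unsigned int)count, (unsigned int)SK_MAX_ADDRS);

	for (i = 0; i < count; i++, off += SK_ENTRY_LEN) {
		if (i < *kept)
			memcpy(out[i], buf + off, SK_ENTRY_LEN);
	}

	return off;
}
```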
+ addr_count = MAX_GUEST_ADDRS; + + /* Read each address from stable migration format */ + for (i = 0; i < addr_count; i++) { + struct migrate_addr_v3 migration; + struct guest_addr addr; + + if (read_all_buf(fd, &migration, sizeof(migration))) + return errno; + + addr.addr.a6 = migration.addr; + addr.prefix_len = ntohl(migration.prefix_len); + addr.flags = flags_from_migration(ntohl(migration.flags)); + + if (addr.flags & CONF_ADDR_OBSERVED) { + fwd_set_addr(c, &addr.addr, addr.flags, + addr.prefix_len); + }
I'm assuming that sending all the addresses, but only importing the OBSERVED ones is to allow us to change the policy about which addresses are migrated in future. That seems wise.
+ } + + if (read_all_buf(fd, c->guest_mac, ETH_ALEN)) + return errno; + + return 0; +} + /* Stages for version 2 */ static const struct migrate_stage stages_v2[] = { { @@ -146,8 +304,29 @@ static const struct migrate_stage stages_v2[] = { { 0 }, };
+/* Stages for version 3 (all addresses, with flags) */ +static const struct migrate_stage stages_v3[] = { + { + .name = "addresses", + .source = addrs_source_v3, + .target = addrs_target_v3, + }, + { + .name = "prepare flows", + .source = flow_migrate_source_pre, + .target = NULL, + }, + { + .name = "transfer flows", + .source = flow_migrate_source, + .target = flow_migrate_target, + }, + { 0 }, +}; + /* Supported encoding versions, from latest (most preferred) to oldest */ static const struct migrate_version versions[] = { + { 3, stages_v3, }, { 2, stages_v2, }, /* v1 was released, but not widely used. It had bad endianness for the * MSS and omitted timestamps, which meant it usually wouldn't work. -- 2.52.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Sun, Apr 12, 2026 at 08:53:17PM -0400, Jon Maloy wrote:
We introduce a CONF_ADDR_DHCP flag to mark whether an added address is eligible for DHCP advertisement. Because eligibility is determined once and for all in fwd_set_addr(), the DHCP code only needs to check this flag to know that all criteria for advertisement are fulfilled, and we update dhcp.c accordingly.
We also let the conf_print() function use this flag to determine and print the selected address.
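The decide-once idea can be sketched as a pure function over the flag word. It follows the IPv4 rule in the fwd.c hunk: user-configured addresses are eligible, and so are host-provided addresses that are not link-local, unless DHCP is disabled. In this sketch the no_dhcp test is folded into a single condition. dhcp4_mark() and dhcp4_offerable() are hypothetical names, and the CONF_ADDR_USER and CONF_ADDR_HOST bit positions are assumed.

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n) (1U << (n))

/* USER and HOST positions are assumptions; the rest follow passt.h */
#define CONF_ADDR_USER		BIT(0)
#define CONF_ADDR_HOST		BIT(1)
#define CONF_ADDR_LINKLOCAL	BIT(3)
#define CONF_ADDR_DHCP		BIT(5)

/* Hypothetical: decide DHCP eligibility of an IPv4 address once, on insert */
static unsigned int dhcp4_mark(unsigned int flags, bool no_dhcp)
{
	bool user = flags & CONF_ADDR_USER;
	bool host_global = (flags & CONF_ADDR_HOST) &&
			   !(flags & CONF_ADDR_LINKLOCAL);

	if (!no_dhcp && (user || host_global))
		flags |= CONF_ADDR_DHCP;

	return flags;
}

/* Hypothetical consumer check: a single flag test, no criteria repeated */
static bool dhcp4_offerable(unsigned int flags)
{
	return flags & CONF_ADDR_DHCP;
}
```

A user-configured link-local address stays eligible, which matches the v7 change. A host-provided link-local address does not.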
Signed-off-by: Jon Maloy
--- v6: -Split off from a commit handling both DHCP and DHCPv6
v7: -Modified DHCP advertisement eligibility criteria for IPv4 addresses: we now permit link-local addresses to be eligible if they were configured by the user. -Adapted to previous changes in this series --- conf.c | 5 +++-- dhcp.c | 14 +++++++++----- fwd.c | 8 ++++++++ migrate.c | 5 +++++ passt.h | 1 + 5 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/conf.c b/conf.c index 3cb3553..612df07 100644 --- a/conf.c +++ b/conf.c @@ -46,6 +46,7 @@ #include "lineread.h" #include "isolation.h" #include "log.h" +#include "fwd.h" #include "vhost_user.h"
#define NETNS_RUN_DIR "/run/netns" @@ -1181,8 +1182,8 @@ static void conf_print(const struct ctx *c) inet_ntop(AF_INET, &c->ip4.map_host_loopback, buf, sizeof(buf)));
- a = fwd_get_addr(c, AF_INET, 0, 0); - if (a && !c->no_dhcp) { + a = fwd_get_addr(c, AF_INET, CONF_ADDR_DHCP, 0); + if (a) { uint32_t mask;
mask = IN4_MASK(inany_prefix_len(&a->addr, diff --git a/dhcp.c b/dhcp.c index f0fa212..0f98cfc 100644 --- a/dhcp.c +++ b/dhcp.c @@ -31,6 +31,8 @@ #include "passt.h" #include "tap.h" #include "log.h" +#include "fwd.h" +#include "conf.h" #include "dhcp.h"
/** @@ -302,19 +304,18 @@ static void opt_set_dns_search(const struct ctx *c, size_t max_len) */ int dhcp(const struct ctx *c, struct iov_tail *data) { + struct in_addr addr, mask, dst; char macstr[ETH_ADDRSTRLEN]; const struct guest_addr *a; size_t mlen, dlen, opt_len; - struct in_addr mask, dst; struct ethhdr eh_storage; struct iphdr iph_storage; struct udphdr uh_storage; + const struct udphdr *uh; const struct ethhdr *eh; const struct iphdr *iph; - const struct udphdr *uh; struct msg m_storage; struct msg const *m; - struct in_addr addr; struct msg reply; unsigned int i;
@@ -346,8 +347,11 @@ int dhcp(const struct ctx *c, struct iov_tail *data) m->op != BOOTREQUEST) return -1;
- a = fwd_get_addr(c, AF_INET, 0, 0); - assert(a); + /* Select address to offer */ + a = fwd_get_addr(c, AF_INET, CONF_ADDR_DHCP, 0); + if (!a) + return -1; + addr = *inany_v4(&a->addr);
reply.op = BOOTREPLY; diff --git a/fwd.c b/fwd.c index b177be9..39e52c4 100644 --- a/fwd.c +++ b/fwd.c @@ -293,6 +293,14 @@ void fwd_set_addr(struct ctx *c, const union inany_addr *addr, return; }
+ /* Determine advertisement eligibility */ + if (inany_v4(addr)) { + if ((flags & CONF_ADDR_USER) || + (flags & CONF_ADDR_HOST && !(flags & CONF_ADDR_LINKLOCAL))) + if (!c->no_dhcp)
Nit: the !no_dhcp test can become part of the outermost if.
+ flags |= CONF_ADDR_DHCP; + } + /* Add to head or tail, depending on flag */ if (flags & CONF_ADDR_OBSERVED) { a = &arr[0]; diff --git a/migrate.c b/migrate.c index 93f67ae..afdc8b4 100644 --- a/migrate.c +++ b/migrate.c @@ -52,6 +52,7 @@ struct migrate_seen_addrs_v2 { #define MIGRATE_ADDR_HOST BIT(1) #define MIGRATE_ADDR_LINKLOCAL BIT(2) #define MIGRATE_ADDR_OBSERVED BIT(3) +#define MIGRATE_ADDR_DHCP BIT(4)
/** * struct migrate_addr_v3 - Migration format for a single address entry @@ -83,6 +84,8 @@ static uint8_t flags_to_migration(uint8_t flags) migration |= MIGRATE_ADDR_LINKLOCAL; if (flags & CONF_ADDR_OBSERVED) migration |= MIGRATE_ADDR_OBSERVED; + if (flags & CONF_ADDR_DHCP) + migration |= MIGRATE_ADDR_DHCP;
return migration; } @@ -105,6 +108,8 @@ static uint8_t flags_from_migration(uint8_t migration) flags |= CONF_ADDR_LINKLOCAL; if (migration & MIGRATE_ADDR_OBSERVED) flags |= CONF_ADDR_OBSERVED; + if (migration & MIGRATE_ADDR_DHCP) + flags |= CONF_ADDR_DHCP;
This is a change to the migration protocol, although not exactly a breaking change to it. You can avoid the question by moving this patch before the one that introduces v3 migration.
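To see why the extra bit is additive rather than breaking, compare a decoder written before MIGRATE_ADDR_DHCP existed with one that knows about it. This is a hedged sketch. decode_old() and decode_new() are hypothetical, the CONF_ADDR_HOST position is assumed, and the remaining values come from the hunks above. An older v3 target simply drops the unknown bit, and a newer target reading an older stream sees the bit clear. Both directions still parse, but the two ends can disagree about the flag.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (1U << (n))

/* Wire bits, as in the migrate.c hunk */
#define MIGRATE_ADDR_HOST	BIT(1)
#define MIGRATE_ADDR_LINKLOCAL	BIT(2)
#define MIGRATE_ADDR_OBSERVED	BIT(3)
#define MIGRATE_ADDR_DHCP	BIT(4)

/* Internal bits: CONF_ADDR_HOST position is an assumption */
#define CONF_ADDR_HOST		BIT(1)
#define CONF_ADDR_LINKLOCAL	BIT(3)
#define CONF_ADDR_OBSERVED	BIT(4)
#define CONF_ADDR_DHCP		BIT(5)

/* Hypothetical decoder written before MIGRATE_ADDR_DHCP existed */
static uint8_t decode_old(uint8_t m)
{
	uint8_t flags = 0;

	if (m & MIGRATE_ADDR_HOST)
		flags |= CONF_ADDR_HOST;
	if (m & MIGRATE_ADDR_LINKLOCAL)
		flags |= CONF_ADDR_LINKLOCAL;
	if (m & MIGRATE_ADDR_OBSERVED)
		flags |= CONF_ADDR_OBSERVED;
	return flags;		/* wire bit 4 is silently ignored */
}

/* Hypothetical decoder that also understands the new bit */
static uint8_t decode_new(uint8_t m)
{
	uint8_t flags = decode_old(m);

	if (m & MIGRATE_ADDR_DHCP)
		flags |= CONF_ADDR_DHCP;
	return flags;
}
```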
return flags; } diff --git a/passt.h b/passt.h index 3ef84eb..9508c2a 100644 --- a/passt.h +++ b/passt.h @@ -83,6 +83,7 @@ struct guest_addr { #define CONF_ADDR_GENERATED BIT(2) /* Generated by PASST/PASTA */ #define CONF_ADDR_LINKLOCAL BIT(3) /* Link-local address */ #define CONF_ADDR_OBSERVED BIT(4) /* Seen in guest traffic */ +#define CONF_ADDR_DHCP BIT(5) /* Advertise via DHCP (IPv4) */ };
/** -- 2.52.0
On Sun, Apr 12, 2026 at 08:53:18PM -0400, Jon Maloy wrote:
We introduce a CONF_ADDR_DHCPV6 flag to mark if an added address is eligible for DHCP advertisement. By doing this once and for all
s/DHCP/DHCPv6/
in the fwd_set_addr() function, the DHCPv6 code only needs to check for this flag to know that all criteria for advertisement are fulfilled.
We update the code in dhcpv6.c both to use the new flag and to make it possible to send multiple addresses in a single reply message, per RFC 8415.
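RFC 8415 lets a single IA_NA option (code 3) carry several IA Address options (code 5). Each IA Address option has a 24-byte payload: the address plus the preferred and valid lifetimes. The IA_NA length covers its 12 fixed bytes plus all of the nested options. The standalone encoder below is a hedged illustration of that layout, not the dhcpv6.c code. ia_na_encode() and its helpers are hypothetical. T1, T2 and both lifetimes are set to 0xffffffff to mirror the (uint32_t)~0U values in resp. The caller must size buf for 16 + 28 * n bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPTION_IA_NA	3	/* RFC 8415 option codes */
#define OPTION_IAADDR	5
#define IA_NA_FIXED	12	/* IAID, T1, T2 */
#define IAADDR_PAYLOAD	24	/* address, preferred and valid lifetimes */

static size_t put16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)v;
	return 2;
}

static size_t put32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
	return 4;
}

/* Hypothetical encoder: one IA_NA holding n nested IAADDR options */
static size_t ia_na_encode(uint8_t *buf, uint32_t iaid,
			   const uint8_t (*addrs)[16], size_t n)
{
	size_t off = 0, i;

	off += put16(buf + off, OPTION_IA_NA);
	off += put16(buf + off,
		     (uint16_t)(IA_NA_FIXED + n * (4 + IAADDR_PAYLOAD)));
	off += put32(buf + off, iaid);
	off += put32(buf + off, 0xffffffff);	/* T1 */
	off += put32(buf + off, 0xffffffff);	/* T2 */

	for (i = 0; i < n; i++) {
		off += put16(buf + off, OPTION_IAADDR);
		off += put16(buf + off, IAADDR_PAYLOAD);
		memcpy(buf + off, addrs[i], 16);
		off += 16;
		off += put32(buf + off, 0xffffffff);	/* preferred lifetime */
		off += put32(buf + off, 0xffffffff);	/* valid lifetime */
	}

	return off;
}
```

Whatever follows the IA_NA, such as the client identifier, starts at the returned offset. This is the same reason the patch computes the client_id position from addr_count.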
We also let the conf_print() function use this flag to identify and print the eligible addresses.
Signed-off-by: Jon Maloy
--- v6: -Refactored the DHCPv6 response structure to use a variable-length buffer for IA_ADDR options, hopefully making this part of the code slightly clearer.
v7: -Adapted to previous changes in this series -Some minor changes based on feedback --- conf.c | 35 ++++++++++++++++---- dhcpv6.c | 96 ++++++++++++++++++++++++++++++++----------------------- fwd.c | 3 ++ migrate.c | 5 +++ passt.h | 1 + 5 files changed, 93 insertions(+), 47 deletions(-)
diff --git a/conf.c b/conf.c index 612df07..7c705de 100644 --- a/conf.c +++ b/conf.c @@ -1216,21 +1216,42 @@ static void conf_print(const struct ctx *c) }
if (c->ifi6) { + bool has_dhcpv6 = false; + const char *head; + if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_host_loopback)) info(" NAT to host ::1: %s", inet_ntop(AF_INET6, &c->ip6.map_host_loopback, buf, sizeof(buf)));
- if (!c->no_ndp && !c->no_dhcpv6) - info("NDP/DHCPv6:"); - else if (!c->no_dhcpv6) - info("DHCPv6:"); - else if (!c->no_ndp) - info("NDP:"); - else + for_each_addr(a, c->addrs, c->addr_count, AF_INET6) { + if (a->flags & CONF_ADDR_DHCPV6) + has_dhcpv6 = true; + } + + if (c->no_ndp && !has_dhcpv6) goto dns6;
a = fwd_get_addr(c, AF_INET6, 0, CONF_ADDR_LINKLOCAL); + if (!c->no_ndp && a) { + info("NDP:"); + info(" assign: %s", + inany_ntop(&a->addr, buf, sizeof(buf))); + } + + if (has_dhcpv6) { + info("DHCPv6:"); + head = "assign: "; + for_each_addr(a, c->addrs, c->addr_count, AF_INET6) { + if (!(a->flags & CONF_ADDR_DHCPV6))
Nit: the check against CONF_ADDR_DHCPV6 is redundant with the check against AF_INET6 built into the loop macro.
+ continue; + info(" %s: %s/%d", head, + inany_ntop(&a->addr, buf, sizeof(buf)), + a->prefix_len); + head = " "; + } + } + if (a) info(" assign: %s", inany_ntop(&a->addr, buf, sizeof(buf))); diff --git a/dhcpv6.c b/dhcpv6.c index 447aaba..546a3ea 100644 --- a/dhcpv6.c +++ b/dhcpv6.c @@ -31,6 +31,8 @@ #include "passt.h" #include "tap.h" #include "log.h" +#include "fwd.h" +#include "conf.h"
/** * struct opt_hdr - DHCPv6 option header @@ -202,56 +204,35 @@ struct msg_hdr { uint32_t xid:24; } __attribute__((__packed__));
+/* Maximum variable part size: ia_addrs + client_id + dns + search + fqdn */ +#define RESP_VAR_MAX (MAX_GUEST_ADDRS * sizeof(struct opt_ia_addr) + \ + sizeof(struct opt_client_id) + \ + sizeof(struct opt_dns_servers) + \ + sizeof(struct opt_dns_search) + \ + sizeof(struct opt_client_fqdn)) + /** * struct resp_t - Normal advertise and reply message * @hdr: DHCP message header * @server_id: Server Identifier option * @ia_na: Non-temporary Address option - * @ia_addr: Address for IA_NA - * @client_id: Client Identifier, variable length - * @dns_servers: DNS Recursive Name Server, here just for storage size - * @dns_search: Domain Search List, here just for storage size - * @client_fqdn: Client FQDN, variable length + * @var: Variable part: IA_ADDRs, client_id, dns, search, fqdn */ static struct resp_t { struct msg_hdr hdr;
struct opt_server_id server_id; struct opt_ia_na ia_na; - struct opt_ia_addr ia_addr; - struct opt_client_id client_id; - struct opt_dns_servers dns_servers; - struct opt_dns_search dns_search; - struct opt_client_fqdn client_fqdn; + uint8_t var[RESP_VAR_MAX]; } __attribute__((__packed__)) resp = { { 0 }, SERVER_ID,
- { { OPT_IA_NA, OPT_SIZE_CONV(sizeof(struct opt_ia_na) + - sizeof(struct opt_ia_addr) - - sizeof(struct opt_hdr)) }, + { { OPT_IA_NA, 0 }, /* Length set dynamically */ 1, (uint32_t)~0U, (uint32_t)~0U },
- { { OPT_IAAADR, OPT_SIZE(ia_addr) }, - IN6ADDR_ANY_INIT, (uint32_t)~0U, (uint32_t)~0U - }, - - { { OPT_CLIENTID, 0, }, - { 0 } - }, - - { { OPT_DNS_SERVERS, 0, }, - { IN6ADDR_ANY_INIT } - }, - - { { OPT_DNS_SEARCH, 0, }, - { 0 }, - }, - - { { OPT_CLIENT_FQDN, 0, }, - 0, { 0 }, - }, + { 0 }, /* Variable part filled dynamically */ };
static const struct opt_status_code sc_not_on_link = { @@ -540,6 +521,42 @@ static size_t dhcpv6_client_fqdn_fill(const struct iov_tail *data, return offset + sizeof(struct opt_hdr) + opt_len; }
+/** + * dhcpv6_ia_addr_fill() - Fill IA_ADDR options for all suitable addresses + * @c: Execution context + * + * Fills IA_ADDRs in resp.var with all non-linklocal host or user-provided + * addresses and updates resp.ia_na.hdr.l with the correct length. + * + * Return: number of addresses filled + */ +static int dhcpv6_ia_addr_fill(const struct ctx *c) +{ + struct opt_ia_addr *ia_addr = (struct opt_ia_addr *)resp.var; + const struct guest_addr *e; + int count = 0; + + for_each_addr(e, c->addrs, c->addr_count, AF_INET6) { + if (!(e->flags & CONF_ADDR_DHCPV6)) + continue; + + ia_addr[count].hdr.t = OPT_IAAADR; + ia_addr[count].hdr.l = htons(sizeof(struct opt_ia_addr) - + sizeof(struct opt_hdr)); + ia_addr[count].addr = e->addr.a6; + ia_addr[count].pref_lifetime = (uint32_t)~0U; + ia_addr[count].valid_lifetime = (uint32_t)~0U; + count++; + } + + /* Update IA_NA length: header fields + all IA_ADDRs */ + resp.ia_na.hdr.l = htons(sizeof(struct opt_ia_na) - + sizeof(struct opt_hdr) + + count * sizeof(struct opt_ia_addr)); + + return count; +} + /** * dhcpv6() - Check if this is a DHCPv6 message, reply as needed * @c: Execution context @@ -573,6 +590,7 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, const struct msg_hdr *mh; struct udphdr uh_storage; const struct udphdr *uh; + int addr_count; size_t mlen, n;
a = fwd_get_addr(c, AF_INET6, 0, CONF_ADDR_LINKLOCAL); @@ -618,6 +636,7 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, if (ia && ntohs(ia->hdr.l) < MIN(OPT_VSIZE(ia_na), OPT_VSIZE(ia_ta))) return -1;
+ addr_count = dhcpv6_ia_addr_fill(c); resp.hdr.type = TYPE_REPLY; switch (mh->type) { case TYPE_REQUEST: @@ -671,12 +690,14 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, if (ia) resp.ia_na.iaid = ((struct opt_ia_na *)ia)->iaid;
+ /* Client_id goes right after the used IA_ADDRs */ + n = offsetof(struct resp_t, var) + + addr_count * sizeof(struct opt_ia_addr); iov_to_buf(&client_id_base.iov[0], client_id_base.cnt, - client_id_base.off, &resp.client_id, + client_id_base.off, (char *)&resp + n, ntohs(client_id->l) + sizeof(struct opt_hdr));
- n = offsetof(struct resp_t, client_id) + - sizeof(struct opt_hdr) + ntohs(client_id->l); + n += sizeof(struct opt_hdr) + ntohs(client_id->l); n = dhcpv6_dns_fill(c, (char *)&resp, n); n = dhcpv6_client_fqdn_fill(data, c, (char *)&resp, n);
@@ -693,7 +714,6 @@ int dhcpv6(struct ctx *c, struct iov_tail *data, */ void dhcpv6_init(const struct ctx *c) { - const struct guest_addr *a; time_t y2k = 946684800; /* Epoch to 2000-01-01T00:00:00Z, no mktime() */ uint32_t duid_time;
@@ -706,8 +726,4 @@ void dhcpv6_init(const struct ctx *c) c->our_tap_mac, sizeof(c->our_tap_mac)); memcpy(resp_not_on_link.server_id.duid_lladdr, c->our_tap_mac, sizeof(c->our_tap_mac)); - - a = fwd_get_addr(c, AF_INET6, 0, CONF_ADDR_LINKLOCAL); - if (a) - resp.ia_addr.addr = a->addr.a6; } diff --git a/fwd.c b/fwd.c index 39e52c4..2b444fb 100644 --- a/fwd.c +++ b/fwd.c @@ -299,6 +299,9 @@ void fwd_set_addr(struct ctx *c, const union inany_addr *addr, (flags & CONF_ADDR_HOST && !(flags & CONF_ADDR_LINKLOCAL))) if (!c->no_dhcp) flags |= CONF_ADDR_DHCP; + } else if (!(flags & CONF_ADDR_LINKLOCAL)) { + if (!c->no_dhcpv6) + flags |= CONF_ADDR_DHCPV6; }
/* Add to head or tail, depending on flag */ diff --git a/migrate.c b/migrate.c index afdc8b4..adcbc63 100644 --- a/migrate.c +++ b/migrate.c @@ -53,6 +53,7 @@ struct migrate_seen_addrs_v2 { #define MIGRATE_ADDR_LINKLOCAL BIT(2) #define MIGRATE_ADDR_OBSERVED BIT(3) #define MIGRATE_ADDR_DHCP BIT(4) +#define MIGRATE_ADDR_DHCPV6 BIT(5)
Same comment as for previous patch.
/** * struct migrate_addr_v3 - Migration format for a single address entry @@ -86,6 +87,8 @@ static uint8_t flags_to_migration(uint8_t flags) migration |= MIGRATE_ADDR_OBSERVED; if (flags & CONF_ADDR_DHCP) migration |= MIGRATE_ADDR_DHCP; + if (flags & CONF_ADDR_DHCPV6) + migration |= MIGRATE_ADDR_DHCPV6;
return migration; } @@ -110,6 +113,8 @@ static uint8_t flags_from_migration(uint8_t migration) flags |= CONF_ADDR_OBSERVED; if (migration & MIGRATE_ADDR_DHCP) flags |= CONF_ADDR_DHCP; + if (migration & MIGRATE_ADDR_DHCPV6) + flags |= CONF_ADDR_DHCPV6;
return flags; } diff --git a/passt.h b/passt.h index 9508c2a..028eb7c 100644 --- a/passt.h +++ b/passt.h @@ -84,6 +84,7 @@ struct guest_addr { #define CONF_ADDR_LINKLOCAL BIT(3) /* Link-local address */ #define CONF_ADDR_OBSERVED BIT(4) /* Seen in guest traffic */ #define CONF_ADDR_DHCP BIT(5) /* Advertise via DHCP (IPv4) */ +#define CONF_ADDR_DHCPV6 BIT(6) /* Advertise via DHCPv6 (IPv6) */ };
/** -- 2.52.0
On Sun, Apr 12, 2026 at 08:53:19PM -0400, Jon Maloy wrote:
We extend NDP to advertise all suitable IPv6 prefixes in Router Advertisements, per RFC 4861. Observed and link-local addresses, plus addresses with a prefix length != 64, are excluded.
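RFC 4861 defines the Prefix Information option as type 3, with a length of 4 in 8-octet units (32 bytes). It also defines the on-link (L, 0x80) and autonomous (A, 0x40) flags, which is where the patch's 0xc0 comes from. The sketch below applies the exclusion rule from the commit message and emits one such option for each remaining address. struct addr6 and pio_fill() are hypothetical. One detail differs from ndp_prefix_fill() as shown here. RFC 4861 requires the sender to zero the bits after the prefix length, so the sketch copies only the first 8 bytes of the address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ND_OPT_PREFIX_INFORMATION	3	/* RFC 4861 option type */
#define PIO_FLAG_L			0x80	/* on-link */
#define PIO_FLAG_A			0x40	/* autonomous configuration */
#define PIO_LEN				32	/* bytes: 4 units of 8 */

/* Hypothetical stand-in for struct guest_addr */
struct addr6 {
	uint8_t a[16];
	int prefix_len;
	bool linklocal;
	bool observed;
};

static void put32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Append one PIO per eligible address, return the number of bytes used */
static size_t pio_fill(uint8_t *buf, const struct addr6 *addrs, size_t n)
{
	size_t off = 0, i;

	for (i = 0; i < n; i++) {
		uint8_t *p = buf + off;

		/* Exclusions named in the commit message */
		if (addrs[i].linklocal || addrs[i].observed ||
		    addrs[i].prefix_len != 64)
			continue;

		memset(p, 0, PIO_LEN);		/* also clears Reserved2 */
		p[0] = ND_OPT_PREFIX_INFORMATION;
		p[1] = PIO_LEN / 8;
		p[2] = 64;
		p[3] = PIO_FLAG_L | PIO_FLAG_A;
		put32(p + 4, 0xffffffff);	/* valid lifetime: infinity */
		put32(p + 8, 0xffffffff);	/* preferred lifetime: infinity */
		memcpy(p + 16, addrs[i].a, 8);	/* /64 prefix, IID left zero */
		off += PIO_LEN;
	}

	return off;
}
```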
Signed-off-by: Jon Maloy
--- v6: -Adapted to previous changes in series
v7: -Adapted to previous changes in series -Use struct initializer for source link-layer address option -Other minor fixes based on feedback from Stefano --- conf.c | 19 ++++++--- fwd.c | 4 ++ migrate.c | 5 +++ ndp.c | 123 +++++++++++++++++++++++++++++++++++++----------------- passt.h | 3 +- 5 files changed, 108 insertions(+), 46 deletions(-)
diff --git a/conf.c b/conf.c index 7c705de..97c66c9 100644 --- a/conf.c +++ b/conf.c @@ -1216,7 +1216,7 @@ static void conf_print(const struct ctx *c) }
if (c->ifi6) { - bool has_dhcpv6 = false; + bool has_ndp = false, has_dhcpv6 = false;
Nit: has_slaac might be a better name, since NDP has a number of functions.
const char *head;
if (!IN6_IS_ADDR_UNSPECIFIED(&c->ip6.map_host_loopback)) @@ -1225,18 +1225,25 @@ static void conf_print(const struct ctx *c) buf, sizeof(buf)));
for_each_addr(a, c->addrs, c->addr_count, AF_INET6) { + if (a->flags & CONF_ADDR_SLAAC) + has_ndp = true; if (a->flags & CONF_ADDR_DHCPV6) has_dhcpv6 = true; }
- if (c->no_ndp && !has_dhcpv6) + if (!has_ndp && !has_dhcpv6) goto dns6;
- a = fwd_get_addr(c, AF_INET6, 0, CONF_ADDR_LINKLOCAL); - if (!c->no_ndp && a) { + if (has_ndp) { info("NDP:"); - info(" assign: %s", - inany_ntop(&a->addr, buf, sizeof(buf))); + head = "assign: "; + for_each_addr(a, c->addrs, c->addr_count, AF_INET6) {
Nit: AF_INET6 is redundant with the CONF_ADDR_SLAAC check.
+ if (!(a->flags & CONF_ADDR_SLAAC)) + continue; + inany_ntop(&a->addr, buf, sizeof(buf)); + info(" %s: %s/%d", head, buf, a->prefix_len); + head = " "; + } }
if (has_dhcpv6) { diff --git a/fwd.c b/fwd.c index 2b444fb..2bc8e33 100644 --- a/fwd.c +++ b/fwd.c @@ -302,6 +302,10 @@ void fwd_set_addr(struct ctx *c, const union inany_addr *addr, } else if (!(flags & CONF_ADDR_LINKLOCAL)) { if (!c->no_dhcpv6) flags |= CONF_ADDR_DHCPV6; + + /* NDP/RA only if prefix is /64 */ + if (!c->no_ndp && prefix_len == 64) + flags |= CONF_ADDR_SLAAC; }
/* Add to head or tail, depending on flag */ diff --git a/migrate.c b/migrate.c index adcbc63..f019924 100644 --- a/migrate.c +++ b/migrate.c @@ -54,6 +54,7 @@ struct migrate_seen_addrs_v2 { #define MIGRATE_ADDR_OBSERVED BIT(3) #define MIGRATE_ADDR_DHCP BIT(4) #define MIGRATE_ADDR_DHCPV6 BIT(5) +#define MIGRATE_ADDR_SLAAC BIT(6)
Same comments again about migration protocol.
/** * struct migrate_addr_v3 - Migration format for a single address entry @@ -89,6 +90,8 @@ static uint8_t flags_to_migration(uint8_t flags) migration |= MIGRATE_ADDR_DHCP; if (flags & CONF_ADDR_DHCPV6) migration |= MIGRATE_ADDR_DHCPV6; + if (flags & CONF_ADDR_SLAAC) + migration |= MIGRATE_ADDR_SLAAC;
return migration; } @@ -115,6 +118,8 @@ static uint8_t flags_from_migration(uint8_t migration) flags |= CONF_ADDR_DHCP; if (migration & MIGRATE_ADDR_DHCPV6) flags |= CONF_ADDR_DHCPV6; + if (migration & MIGRATE_ADDR_SLAAC) + flags |= CONF_ADDR_SLAAC;
return flags; } diff --git a/ndp.c b/ndp.c index 3750fc5..bb8374a 100644 --- a/ndp.c +++ b/ndp.c @@ -32,6 +32,8 @@ #include "passt.h" #include "tap.h" #include "log.h" +#include "fwd.h" +#include "conf.h"
#define RT_LIFETIME 65535
@@ -99,6 +101,16 @@ struct opt_prefix_info { uint32_t reserved; } __attribute__((packed));
+/** + * struct ndp_prefix - Prefix Information option with prefix + * @info: Prefix Information option header + * @prefix: IPv6 prefix + */ +struct ndp_prefix { + struct opt_prefix_info info; + struct in6_addr prefix; +} __attribute__((__packed__)); + /** * struct opt_mtu - Maximum transmission unit (MTU) option * @header: Option header @@ -140,27 +152,23 @@ struct opt_dnssl { } __attribute__((packed));
/** - * struct ndp_ra - NDP Router Advertisement (RA) message + * struct ndp_ra_hdr - NDP Router Advertisement fixed header * @ih: ICMPv6 header * @reachable: Reachability time, after confirmation (ms) * @retrans: Time between retransmitted NS messages (ms) - * @prefix_info: Prefix Information option - * @prefix: IPv6 prefix - * @mtu: MTU option - * @source_ll: Target link-layer address - * @var: Variable fields */ -struct ndp_ra { +struct ndp_ra_hdr { struct icmp6hdr ih; uint32_t reachable; uint32_t retrans; - struct opt_prefix_info prefix_info; - struct in6_addr prefix; - struct opt_l2_addr source_ll; +} __attribute__((__packed__));
- unsigned char var[sizeof(struct opt_mtu) + sizeof(struct opt_rdnss) + - sizeof(struct opt_dnssl)]; -} __attribute__((packed, aligned(__alignof__(struct in6_addr)))); +/* Maximum RA message size: hdr + prefixes + source_ll + mtu + rdnss + dnssl */ +#define NDP_RA_MAX_SIZE (sizeof(struct ndp_ra_hdr) + \ + MAX_GUEST_ADDRS * sizeof(struct ndp_prefix) + \ + sizeof(struct opt_l2_addr) + \ + sizeof(struct opt_mtu) + sizeof(struct opt_rdnss) + \ + sizeof(struct opt_dnssl))
/** * struct ndp_ns - NDP Neighbor Solicitation (NS) message @@ -231,6 +239,42 @@ void ndp_unsolicited_na(const struct ctx *c, const struct in6_addr *addr) ndp_na(c, &in6addr_ll_all_nodes, addr); }
+/** + * ndp_prefix_fill() - Fill prefix options for all suitable addresses + * @c: Execution context + * @buf: Buffer to write prefix options into + * + * Fills buffer with Prefix Information options for all non-linklocal, + * non-observed addresses with prefix_len == 64 + * + * Return: number of bytes written + */ +static size_t ndp_prefix_fill(const struct ctx *c, unsigned char *buf) +{ + const struct guest_addr *a; + struct ndp_prefix *p; + size_t offset = 0; + + for_each_addr(a, c->addrs, c->addr_count, AF_INET6) { + if (!(a->flags & CONF_ADDR_SLAAC)) + continue; + + p = (struct ndp_prefix *)(buf + offset); + p->info.header.type = OPT_PREFIX_INFO; + p->info.header.len = 4; /* 4 * 8 = 32 bytes */ + p->info.prefix_len = 64; + p->info.prefix_flags = 0xc0; /* L, A flags */ + p->info.valid_lifetime = ~0U; + p->info.pref_lifetime = ~0U; + p->info.reserved = 0; + p->prefix = a->addr.a6; + + offset += sizeof(struct ndp_prefix); + } + + return offset; +} + /** * ndp_ra() - Send an NDP Router Advertisement (RA) message * @c: Execution context @@ -238,7 +282,15 @@ void ndp_unsolicited_na(const struct ctx *c, const struct in6_addr *addr) */ static void ndp_ra(const struct ctx *c, const struct in6_addr *dst) { - struct ndp_ra ra = { + unsigned char buf[NDP_RA_MAX_SIZE] + __attribute__((__aligned__(__alignof__(struct in6_addr)))); + struct ndp_ra_hdr *hdr = (struct ndp_ra_hdr *)buf; + struct opt_l2_addr *source_ll; + unsigned char *ptr; + size_t prefix_len; + + /* Build RA header */ + *hdr = (struct ndp_ra_hdr){ .ih = { .icmp6_type = RA, .icmp6_code = 0, @@ -247,31 +299,26 @@ static void ndp_ra(const struct ctx *c, const struct in6_addr *dst) .icmp6_rt_lifetime = htons_constant(RT_LIFETIME), .icmp6_addrconf_managed = 1, }, - .prefix_info = { - .header = { - .type = OPT_PREFIX_INFO, - .len = 4, - }, - .prefix_len = 64, - .prefix_flags = 0xc0, /* prefix flags: L, A */ - .valid_lifetime = ~0U, - .pref_lifetime = ~0U, - }, - .source_ll = { - .header = { - .type = 
OPT_SRC_L2_ADDR, - .len = 1, - }, - }, }; - const struct guest_addr *a = fwd_get_addr(c, AF_INET6, 0, 0); - unsigned char *ptr = NULL;
- ASSERT(a); - - ra.prefix = a->addr.a6; + /* Fill prefix options */ + prefix_len = ndp_prefix_fill(c, (unsigned char *)(hdr + 1)); + if (prefix_len == 0) { + /* No suitable prefixes to advertise */ + return;
Do we want this? I think it's still valid to present other options via NDP, even if we have no prefixes to advertise. And, I think we want to do so for at least the MTU.
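One way to act on this is to treat the prefix block as optional and still append the source link-layer address and MTU options. The hedged sketch below builds a raw RA with that shape. ra_build() and its parameters are hypothetical, and the checksum is left zero for the sender to fill in. The M flag and the 65535 s router lifetime mirror icmp6_addrconf_managed and RT_LIFETIME in ndp.c. With no prefixes, the message is the 16-byte RA header plus two 8-byte options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ND_ROUTER_ADVERT	134	/* ICMPv6 type */
#define ND_OPT_SOURCE_LLADDR	1
#define ND_OPT_MTU		5
#define RA_HDR_LEN		16

static void put32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Hypothetical RA builder: the prefix block may be empty */
static size_t ra_build(uint8_t *buf, const uint8_t *pio, size_t pio_len,
		       const uint8_t mac[6], uint32_t mtu)
{
	size_t off = RA_HDR_LEN;

	memset(buf, 0, RA_HDR_LEN);	/* code, checksum, timers left zero */
	buf[0] = ND_ROUTER_ADVERT;
	buf[5] = 0x80;			/* M: managed address configuration */
	buf[6] = 0xff;			/* router lifetime: 65535 s */
	buf[7] = 0xff;

	if (pio_len) {			/* prefixes are optional */
		memcpy(buf + off, pio, pio_len);
		off += pio_len;
	}

	buf[off] = ND_OPT_SOURCE_LLADDR;	/* 8 bytes: type, len, MAC */
	buf[off + 1] = 1;
	memcpy(buf + off + 2, mac, 6);
	off += 8;

	if (mtu) {			/* 8 bytes: type, len, reserved, MTU */
		memset(buf + off, 0, 8);
		buf[off] = ND_OPT_MTU;
		buf[off + 1] = 1;
		put32(buf + off + 4, mtu);
		off += 8;
	}

	return off;
}
```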
+ }
- ptr = &ra.var[0]; + /* Add source link-layer address option */ + ptr = (unsigned char *)(hdr + 1) + prefix_len; + source_ll = (struct opt_l2_addr *)ptr; + *source_ll = (struct opt_l2_addr) { + .header = { + .type = OPT_SRC_L2_ADDR, + .len = 1, + }, + }; + memcpy(source_ll->mac, c->our_tap_mac, ETH_ALEN); + ptr += sizeof(struct opt_l2_addr);
if (c->mtu) { struct opt_mtu *mtu = (struct opt_mtu *)ptr; @@ -345,10 +392,8 @@ static void ndp_ra(const struct ctx *c, const struct in6_addr *dst) } }
- memcpy(&ra.source_ll.mac, c->our_tap_mac, ETH_ALEN); - /* NOLINTNEXTLINE(clang-analyzer-security.PointerSub) */ - ndp_send(c, dst, &ra, ptr - (unsigned char *)&ra); + ndp_send(c, dst, buf, ptr - buf); }
/** diff --git a/passt.h b/passt.h index 028eb7c..2c633e4 100644 --- a/passt.h +++ b/passt.h @@ -84,7 +84,8 @@ struct guest_addr { #define CONF_ADDR_LINKLOCAL BIT(3) /* Link-local address */ #define CONF_ADDR_OBSERVED BIT(4) /* Seen in guest traffic */ #define CONF_ADDR_DHCP BIT(5) /* Advertise via DHCP (IPv4) */ -#define CONF_ADDR_DHCPV6 BIT(6) /* Advertise via DHCPv6 (IPv6) */ +#define CONF_ADDR_DHCPV6 BIT(6) /* Advertise via DHCPv6 */
This change belongs in the previous patch.
+#define CONF_ADDR_SLAAC BIT(7) /* Advertise via NDP/RA (/64) */ };
/** -- 2.52.0