[PATCH v7 0/8] Use true MAC address of LAN local remote hosts
Bug #120 asks us to use the true MAC addresses of LAN local remote hosts, since some programs need this information. These commits introduce this for ARP, NDP, UDP, TCP and ICMP.

---
v3: Updated according to feedback from Stefano and David:
  - Made the ARP/NDP lookup call filter out the requested address by itself, qualified by the index of the template interface
  - Moved the flow-specific MAC address from struct flowside to struct flow_common.
v4:
  - Updated according to feedback from David and Stefano
  - Added a cache table for ARP/NDP table contents
v5:
  - Updated according to feedback from David and Stefano
  - Added cache table entries to FIFO/LRU queue
  - New criteria for when to consult ARP/NDP
v6:
  - Simplified and merged MAC cache table commits
  - Other changes after feedback from David.
v7:
  - Fixes in patch #2 based on feedback from David and Stefano.

Jon Maloy (8):
  netlink: add function to extract MAC addresses from NDP/ARP table
  fwd: Added cache table for ARP/NDP contents
  arp/ndp: respond with true MAC address of LAN local remote hosts
  flow: add MAC address of LAN local remote hosts to flow
  udp: forward external source MAC address through tap interface
  tcp: forward external source MAC address through tap interface
  tap: change signature of function tap_push_l2h()
  icmp: let icmp use mac address from flowside structure

 arp.c          |   9 ++-
 conf.c         |   1 +
 flow.c         |   2 +
 flow.h         |   2 +
 fwd.c          | 184 +++++++++++++++++++++++++++++++++++++++++++++++--
 fwd.h          |   6 ++
 icmp.c         |   8 ++-
 inany.c        |   1 +
 ndp.c          |  10 ++-
 netlink.c      |  82 ++++++++++++++++++++++
 netlink.h      |   2 +
 passt.c        |   9 ++-
 passt.h        |   3 +-
 pasta.c        |   2 +-
 tap.c          |  24 ++++---
 tap.h          |   7 +-
 tcp.c          |  18 +++--
 tcp.h          |   2 +-
 tcp_buf.c      |  37 +++++-----
 tcp_internal.h |   4 +-
 tcp_vu.c       |   5 +-
 udp.c          |  57 +++++++++------
 udp.h          |   2 +-
 util.c         |  12 ++++
 util.h         |   1 +
 25 files changed, 407 insertions(+), 83 deletions(-)

-- 
2.50.1
The solution to bug https://bugs.passt.top/show_bug.cgi?id=120
requires the ability to translate an IP address into its
corresponding MAC address when that address is present in
the ARP/NDP table.
We add this feature here.
Signed-off-by: Jon Maloy
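The netlink.c code itself is not quoted in this thread. The sketch below shows only how such a lookup could check a single kernel reply. It assumes the kernel answers an RTM_GETNEIGH request with RTM_NEWNEIGH messages that carry NDA_DST and NDA_LLADDR attributes. neigh_msg_mac() is a hypothetical helper, not the patch's nl_neigh_mac_get(), and socket setup and the receive loop are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/if_ether.h>
#include <linux/neighbour.h>
#include <linux/rtnetlink.h>

/*
 * neigh_msg_mac() - Hypothetical helper (not the patch's code): if one
 * RTM_NEWNEIGH message describes @addr on interface @ifi, copy its MAC
 * address to @mac
 *
 * Return: true if the message matched and carried a MAC address
 */
static bool neigh_msg_mac(struct nlmsghdr *nh, unsigned char family,
			  const void *addr, size_t addrlen, int ifi,
			  uint8_t mac[ETH_ALEN])
{
	struct ndmsg *ndm = NLMSG_DATA(nh);
	bool dst_match = false, have_mac = false;
	uint8_t lladdr[ETH_ALEN];
	struct rtattr *rta;
	int len;

	assert(addrlen == 4 || addrlen == 16);	/* IPv4 or IPv6 address */

	if (nh->nlmsg_type != RTM_NEWNEIGH ||
	    nh->nlmsg_len < NLMSG_LENGTH(sizeof(*ndm)))
		return false;
	if (ndm->ndm_family != family || ndm->ndm_ifindex != ifi)
		return false;
	/* Assumption: unresolved entries don't count as "present" */
	if (ndm->ndm_state & (NUD_INCOMPLETE | NUD_FAILED))
		return false;

	/* Attributes start right after the (aligned) struct ndmsg */
	len = (int)(nh->nlmsg_len - NLMSG_LENGTH(sizeof(*ndm)));
	rta = (struct rtattr *)((char *)ndm + NLMSG_ALIGN(sizeof(*ndm)));

	for (; RTA_OK(rta, len); rta = RTA_NEXT(rta, len)) {
		if (rta->rta_type == NDA_DST && RTA_PAYLOAD(rta) == addrlen &&
		    !memcmp(RTA_DATA(rta), addr, addrlen)) {
			dst_match = true;
		} else if (rta->rta_type == NDA_LLADDR &&
			   RTA_PAYLOAD(rta) == ETH_ALEN) {
			memcpy(lladdr, RTA_DATA(rta), ETH_ALEN);
			have_mac = true;
		}
	}

	if (!dst_match || !have_mac)
		return false;

	memcpy(mac, lladdr, ETH_ALEN);
	return true;
}
```

A table dump returns one such message per neighbour entry, so a caller would stop at the first match. Filtering on the interface index corresponds to the "qualified by the index of the template interface" note in the v3 changelog.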
We add a cache table to keep partial contents of the kernel ARP/NDP
tables. This way, we drastically reduce the number of netlink calls
needed to read those tables.
When needed, we create undefined cache entries representing ARP/NDP
entries that don't exist, or don't exist yet. Each such entry gets a
short expiration time, so that we know when to query the kernel tables
again; early on, repeated queries are allowed at short intervals. We
also add an access counter to the entries, so that the timer grows
longer and the query frequency abates over time if no ARP/NDP entry
shows up.
For regular entries we use a much longer timer, so that the entry is
updated in the rare case that a remote host changes its MAC address.
Signed-off-by: Jon Maloy
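The expiry policy described above can be sketched in a simplified form. This is an illustration, not the fwd.c code from the patch. struct neigh_timer, lookup_due() and lookup_done() are hypothetical names, and RENEWAL_SECS mirrors the patch's MAC_CACHE_RENEWAL value of 3600. The sketch also assumes that the back-off counter advances only when a kernel lookup is actually made and fails.

```c
#include <stdbool.h>
#include <time.h>

/* Mirrors MAC_CACHE_RENEWAL in the patch: refresh known entries hourly */
#define RENEWAL_SECS	3600

/* Hypothetical per-entry timer state, for illustration only */
struct neigh_timer {
	struct timespec expiry;	/* No kernel lookup before this point */
	unsigned int failures;	/* Consecutive failed kernel lookups */
};

/* lookup_due() - Is a new ARP/NDP table lookup allowed at @now? */
static bool lookup_due(const struct neigh_timer *t, const struct timespec *now)
{
	return now->tv_sec > t->expiry.tv_sec ||
	       (now->tv_sec == t->expiry.tv_sec &&
		now->tv_nsec >= t->expiry.tv_nsec);
}

/* lookup_done() - Record a lookup result obtained at @now, schedule next */
static void lookup_done(struct neigh_timer *t, const struct timespec *now,
			bool found)
{
	t->expiry = *now;

	if (found) {
		/* Known entry: re-check only in case the host changes MAC */
		t->failures = 0;
		t->expiry.tv_sec += RENEWAL_SECS;
	} else {
		/* Linear back-off: next retry after 0 s, 1 s, 2 s, ... */
		t->expiry.tv_sec += t->failures++;
	}
}
```

Here lookup_done() is called only after a lookup that lookup_due() allowed. The retry interval therefore grows with the number of failed netlink calls, not with the packet rate.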
On Wed, Sep 10, 2025 at 11:03:49AM -0400, Jon Maloy wrote:
---
v5:
  - Moved to earlier in series to reduce rebase conflicts
v6:
  - Squashed the hash list commit and the FIFO/LRU queue commit
  - Removed hash lookup. We now only use linear lookup in a linked list
  - Eliminated dynamic memory allocation.
  - Ensured there is only one call to clock_gettime()
  - Using MAC_ZERO instead of the previously dedicated definitions
v7:
  - Now actually using MAC_ZERO where needed
  - I am still using linear back-off for empty cache entries. Even an
    incoming, flow-creating packet from a local host gives no guarantee
    that its MAC address is in the ARP table, so we must allow for a
    few new attempts at the first possible occasions. Only after
    several failed lookups can we conclude that we probably never will
    succeed. Hence the back-off.
Still not sure I'm entirely convinced, but ok.
Reviewed-by: David Gibson
  - Fixed a bug that David inadvertently made me aware of: I only
    intended to set the initial expiry value to MAC_CACHE_RENEWAL when
    an ARP/NDP table lookup was successful.
  - Improved struct and function description comments
---
 conf.c |   1 +
 fwd.c  | 189 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fwd.h  |   4 ++
 util.c |  12 ++++
 util.h |   1 +
 5 files changed, 207 insertions(+)
diff --git a/conf.c b/conf.c
index f47f48e..27b04d3 100644
--- a/conf.c
+++ b/conf.c
@@ -2122,6 +2122,7 @@ void conf(struct ctx *c, int argc, char **argv)
 		c->udp.fwd_out.mode = fwd_default;
 
 	fwd_scan_ports_init(c);
+	fwd_mac_cache_init();
 
 	if (!c->quiet)
 		conf_print(c);
diff --git a/fwd.c b/fwd.c
index 250cf56..fcd119e 100644
--- a/fwd.c
+++ b/fwd.c
@@ -19,6 +19,8 @@
 #include
 #include
 #include
+#include
+#include
 #include "util.h"
 #include "ip.h"
@@ -26,6 +28,8 @@
 #include "passt.h"
 #include "lineread.h"
 #include "flow_table.h"
+#include "inany.h"
+#include "netlink.h"
 
 /* Empheral port range: values from RFC 6335 */
 static in_port_t fwd_ephemeral_min = (1 << 15) + (1 << 14);
@@ -33,6 +37,191 @@ static in_port_t fwd_ephemeral_max = NUM_PORTS - 1;
 
 #define PORT_RANGE_SYSCTL "/proc/sys/net/ipv4/ip_local_port_range"
 
+#define MAC_CACHE_SIZE 128
+#define MAC_CACHE_RENEWAL 3600  /* Refresh entry from ARP/NDP every hour */
+
+/* Partial cache of ARP/NDP table contents */
+/**
+ * mac_cache_entry - Entry in the ARP/NDP table cache
+ * @prev:		Previous entry in LRU list
+ * @next:		Next entry in LRU list
+ * @expiry:		Point in time after which a new netlink lookup is allowed
+ * @addr:		IP address of represented host
+ * @mac:		MAC address of represented host, if known
+ * @access_count:	Access counter. Used for back-off from netlink calls
+ */
+struct mac_cache_entry {
+	struct mac_cache_entry *prev;
+	struct mac_cache_entry *next;
+	struct timespec expiry;
+	union inany_addr addr;
+	uint8_t mac[ETH_ALEN];
+	uint16_t access_count;
+};
+
+/**
+ * mac_cache_table - Partial cache of ARP/NDP table contents
+ * @head:		Least recently used entry in LRU list
+ * @tail:		Most recently used entry in LRU list
+ * @mac_table:		Array of entries, organized as FIFO/LRU list
+ */
+struct mac_cache_table {
+	struct mac_cache_entry *head;
+	struct mac_cache_entry *tail;
+	struct mac_cache_entry mac_table[MAC_CACHE_SIZE];
+};
+
+static struct mac_cache_table mac_cache;
+
+/**
+ * fwd_mac_cache_unlink() - Unlink entry from LRU queue
+ */
+static void fwd_mac_cache_unlink(struct mac_cache_entry *e)
+{
+	struct mac_cache_table *t = &mac_cache;
+
+	if (e->prev)
+		e->prev->next = e->next;
+	else
+		t->head = e->next;
+
+	if (e->next)
+		e->next->prev = e->prev;
+	else
+		t->tail = e->prev;
+
+	e->prev = e->next = NULL;
+}
+
+/**
+ * fwd_mac_cache_append_to_tail() - Add entry to tail of LRU queue
+ */
+static void fwd_mac_cache_append_to_tail(struct mac_cache_entry *e)
+{
+	struct mac_cache_table *t = &mac_cache;
+
+	e->next = NULL;
+	e->prev = t->tail;
+	t->tail->next = e;
+	t->tail = e;
+}
+
+/**
+ * fwd_mac_cache_move_to_tail() - Move entry to tail of LRU queue
+ */
+static void fwd_mac_cache_move_to_tail(struct mac_cache_entry *e)
+{
+	struct mac_cache_table *t = &mac_cache;
+
+	if (t->tail == e)
+		return;
+
+	fwd_mac_cache_unlink(e);
+	fwd_mac_cache_append_to_tail(e);
+}
+
+/**
+ * mac_entry_set_expiry() - Set the time for a cache entry to expire
+ * @e:		Cache entry
+ * @now:	Current point in time
+ * @expiry:	Expiration time, in seconds from current moment.
+ */
+static void mac_entry_set_expiry(struct mac_cache_entry *e, struct timespec *now, int expiry)
+{
+	e->expiry = *now;
+	e->expiry.tv_sec += expiry;
+}
+
+/**
+ * fwd_mac_cache_find() - Find an entry in the ARP/NDP cache table
+ * @addr:	IPv4 or IPv6 address, used as key for the lookup
+ *
+ * Return: Pointer to the entry on success, NULL on failure.
+ */
+static struct mac_cache_entry *fwd_mac_cache_find(const union inany_addr *addr)
+{
+	const struct mac_cache_table *t = &mac_cache;
+	struct mac_cache_entry *e = t->tail;
+
+	for (e = t->tail; e; e = e->prev) {
+		if (inany_equals(&e->addr, addr))
+			return e;
+	}
+	return NULL;
+}
+
+/**
+ * fwd_neigh_mac_get() - Look up MAC address in the real or cached ARP/NDP table
+ * @c:		Execution context
+ * @addr:	IPv4 or IPv6 address, used as lookup key
+ * @mac:	Buffer for Ethernet MAC to return, found or default value.
+ *
+ * Return: true if real MAC found, false if not found or if failure
+ */
+bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr,
+		       uint8_t *mac)
+{
+	struct mac_cache_entry *e = fwd_mac_cache_find(addr);
+	int ifi = inany_v4(addr) ? c->ifi4 : c->ifi6;
+	bool refresh = false;
+	struct timespec now;
+	bool found = false;
+
+	clock_gettime(CLOCK_MONOTONIC, &now);
+
+	if (e) {
+		refresh = timespec_before(&e->expiry, &now);
+	} else {
+		/* Recycle least recently used entry */
+		e = mac_cache.head;
+		e->addr = *addr;
+		memcpy(e->mac, MAC_ZERO, ETH_ALEN);
+		e->access_count = 0;
+		refresh = true;
+	}
+
+	if (!refresh) {
+		found = !MAC_IS_ZERO(e->mac);
+	} else {
+		found = nl_neigh_mac_get(nl_sock, addr, ifi, e->mac);
+		if (found)
+			mac_entry_set_expiry(e, &now, MAC_CACHE_RENEWAL);
+	}
+
+	if (found) {
+		memcpy(mac, e->mac, ETH_ALEN);
+	} else {
+		/* Back off from new netlink calls if nothing found */
+		mac_entry_set_expiry(e, &now, e->access_count++);
+		memcpy(mac, c->our_tap_mac, ETH_ALEN);
+	}
+
+	/* Set to most recently used */
+	fwd_mac_cache_move_to_tail(e);
+
+	return found;
+}
+
+/**
+ * fwd_mac_cache_init() - Initialise ARP/NDP cache table
+ */
+void fwd_mac_cache_init(void)
+{
+	struct mac_cache_table *t = &mac_cache;
+	struct mac_cache_entry *e;
+	int i;
+
+	memset(t, 0, sizeof(*t));
+
+	for (i = 0; i < MAC_CACHE_SIZE; i++) {
+		e = &t->mac_table[i];
+		e->prev = (i == 0) ? NULL : &t->mac_table[i - 1];
+		e->next = (i < (MAC_CACHE_SIZE - 1)) ? &t->mac_table[i + 1] : NULL;
+	}
+	t->head = &t->mac_table[0];
+	t->tail = &t->mac_table[MAC_CACHE_SIZE - 1];
+}
+
 /** fwd_probe_ephemeral() - Determine what ports this host considers ephemeral
  *
  * Work out what ports the host thinks are emphemeral and record it for later
diff --git a/fwd.h b/fwd.h
index 65c7c96..728601f 100644
--- a/fwd.h
+++ b/fwd.h
@@ -57,4 +57,8 @@ uint8_t fwd_nat_from_splice(const struct ctx *c, uint8_t proto,
 uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto,
 			  const struct flowside *ini, struct flowside *tgt);
 
+bool fwd_neigh_mac_get(const struct ctx *c, const union inany_addr *addr,
+		       uint8_t *mac);
+void fwd_mac_cache_init(void);
+
 #endif /* FWD_H */
diff --git a/util.c b/util.c
index c492f90..8a8846c 100644
--- a/util.c
+++ b/util.c
@@ -305,6 +305,18 @@ long timespec_diff_ms(const struct timespec *a, const struct timespec *b)
 	return timespec_diff_us(a, b) / 1000;
 }
 
+/**
+ * timespec_before() - Check the relation between two points in time
+ * @a:		Point in time to be tested
+ * @b:		Point in time to test a against
+ * Return: true if a comes before b, otherwise false
+ */
+bool timespec_before(const struct timespec *a, const struct timespec *b)
+{
+	return (a->tv_sec < b->tv_sec) ||
+	       (a->tv_sec == b->tv_sec && a->tv_nsec < b->tv_nsec);
+}
+
 /**
  * bitmap_set() - Set single bit in bitmap
  * @map:	Pointer to bitmap
diff --git a/util.h b/util.h
index 2a8c38f..5ec3f22 100644
--- a/util.h
+++ b/util.h
@@ -207,6 +207,7 @@ int sock_unix(char *sock_path);
 void sock_probe_mem(struct ctx *c);
 long timespec_diff_ms(const struct timespec *a, const struct timespec *b);
 int64_t timespec_diff_us(const struct timespec *a, const struct timespec *b);
+bool timespec_before(const struct timespec *a, const struct timespec *b);
 void bitmap_set(uint8_t *map, unsigned bit);
 void bitmap_clear(uint8_t *map, unsigned bit);
 bool bitmap_isset(const uint8_t *map, unsigned bit);
-- 
2.50.1
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson
When we receive an ARP request or NDP neighbor solicitation over
the tap interface for a host on the local network segment attached
to the template interface, we respond with that host's real MAC
address.
Signed-off-by: Jon Maloy
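For ARP, the change comes down to choosing which MAC address goes into the sender hardware address field of the reply. Below is a minimal sketch under stated assumptions. The struct layout follows RFC 826 for Ethernet/IPv4, and arp_reply_sketch() is a hypothetical helper in which a NULL neigh_mac stands for "no ARP table entry found". An NDP neighbor advertisement would carry the same choice in its target link-layer address option.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <net/if_arp.h>
#include <linux/if_ether.h>

/* ARP payload for Ethernet/IPv4 (RFC 826), fields in network byte order */
struct arp_eth_ipv4 {
	uint16_t htype;			/* Hardware type, ARPHRD_ETHER */
	uint16_t ptype;			/* Protocol type, ETH_P_IP */
	uint8_t hlen;			/* Hardware address length, ETH_ALEN */
	uint8_t plen;			/* Protocol address length, 4 */
	uint16_t oper;			/* ARPOP_REQUEST or ARPOP_REPLY */
	uint8_t sha[ETH_ALEN];		/* Sender hardware address */
	uint8_t spa[4];			/* Sender protocol address */
	uint8_t tha[ETH_ALEN];		/* Target hardware address */
	uint8_t tpa[4];			/* Target protocol address */
};

_Static_assert(sizeof(struct arp_eth_ipv4) == 28, "unexpected padding");

/*
 * arp_reply_sketch() - Hypothetical: build a reply to @req announcing
 * @neigh_mac (the LAN host's real MAC, if known) or else @own_mac
 */
static void arp_reply_sketch(const struct arp_eth_ipv4 *req,
			     const uint8_t *neigh_mac, const uint8_t *own_mac,
			     struct arp_eth_ipv4 *rep)
{
	*rep = *req;
	rep->oper = htons(ARPOP_REPLY);

	/* Sender fields describe the host the guest asked about */
	memcpy(rep->sha, neigh_mac ? neigh_mac : own_mac, ETH_ALEN);
	memcpy(rep->spa, req->tpa, sizeof(rep->spa));

	/* Target fields point back at the guest that asked */
	memcpy(rep->tha, req->sha, ETH_ALEN);
	memcpy(rep->tpa, req->spa, sizeof(rep->tpa));
}
```

Falling back to our own tap MAC keeps the previous behaviour for hosts that are not in the ARP table.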
When communicating with remote hosts on the local network, some guest
applications want to see the real MAC address of that host instead
of PASST/PASTA's own tap address. The flow_common structure is a
convenient location for storing that address, so we do that in this
commit.
Note that we don't use this address yet; that will be done in later
commits.
Signed-off-by: Jon Maloy
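The benefit of storing the address in the flow is that the ARP/NDP lookup happens once, when the flow is created. After that, every packet simply reads the stored value. The sketch below is hypothetical. struct flow_sketch and flow_sketch_init_mac() are illustrative names, and the real flow_common field is not shown in this thread.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <linux/if_ether.h>

/* Hypothetical flow entry: only the field relevant here is shown */
struct flow_sketch {
	uint8_t omac[ETH_ALEN];	/* Source MAC for frames sent to the guest */
};

/*
 * flow_sketch_init_mac() - Resolve the peer's MAC once, at flow creation
 * @f:		Flow entry
 * @found:	Whether the ARP/NDP lookup succeeded
 * @neigh_mac:	MAC from the lookup, only used if @found
 * @own_mac:	Our own tap MAC, used as fallback
 */
static void flow_sketch_init_mac(struct flow_sketch *f, bool found,
				 const uint8_t *neigh_mac,
				 const uint8_t *own_mac)
{
	memcpy(f->omac, found ? neigh_mac : own_mac, ETH_ALEN);
}
```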
We forward the source MAC address of packets received from hosts on
the local network through the tap interface.
This is a part of the solution to bug
https://bugs.passt.top/show_bug.cgi?id=120
Signed-off-by: Jon Maloy
We forward the source MAC address of packets received from hosts on
the local network through the tap interface.
This is a part of the solution to bug
https://bugs.passt.top/show_bug.cgi?id=120
Signed-off-by: Jon Maloy
In the next commit, callers of tap_push_l2h() must be able to
specify which source MAC address is added to the Ethernet header
sent over the tap interface. As preparation, we now add a new
argument to that function, still without any logical changes.
Signed-off-by: Jon Maloy
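As a hedged illustration of this kind of signature change, here is an Ethernet header writer that takes its source MAC from the caller. push_l2h_sketch() is a hypothetical name, and the actual signature of tap_push_l2h() is not shown in this thread. A caller without a flow-specific MAC would pass our own tap MAC, which keeps behaviour unchanged, as this commit intends.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <linux/if_ether.h>

/*
 * push_l2h_sketch() - Hypothetical L2 header writer with explicit source
 * @buf:	Frame buffer, at least ETH_HLEN bytes
 * @dst:	Guest MAC address
 * @src:	Source MAC: a LAN host's real MAC, or our own tap MAC
 * @proto:	EtherType in host byte order, e.g. ETH_P_IP
 *
 * Return: pointer just past the Ethernet header, where L3 starts
 */
static void *push_l2h_sketch(void *buf, const uint8_t *dst,
			     const uint8_t *src, uint16_t proto)
{
	struct ethhdr *eh = buf;	/* Packed struct, no alignment needs */

	memcpy(eh->h_dest, dst, ETH_ALEN);
	memcpy(eh->h_source, src, ETH_ALEN);
	eh->h_proto = htons(proto);

	return eh + 1;
}
```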
ICMP, too, needs to be updated to use the external MAC address
instead of our own tap address when applicable. We do that here.
Signed-off-by: Jon Maloy
participants (2)
- David Gibson
- Jon Maloy