[PATCH 00/18] Test fixes, batch 5
Here's yet another batch of fixes to make the tests more robust against different environments. With this lot, I'm now able to run the pasta, passt, passt_in_ns and two_guests tests on my Fedora system. I'm still hitting problems with the perf tests. This series (specifically 11/18) updates the demo to use socat instead of openbsd netcat. This is needed, or the change to socat in the mbuto image would break the demo. However, I'm hitting unrelated problems trying to run the demos, so the switch to socat is, alas, untested. David Gibson (18): tests: Remove no longer needed /usr/bin/bash link tests: Let Fedora find dhclient-script in /usr/sbin tests: Add rudimentary debugging to dhclient-script tests: Add some extra dhclient support directories to mbuto.img tests: More robust parsing of resolv.conf for DHCP tests tests: Handle the case of a nameserver on host localhost tests: Correctly handle domain search list in dhclient-script tests: Fix detection of empty 'hout' responses in passt{,_in_ns} tests tests: Fix creation of test file in udp passt tests valgrind needs futex tests: Use socat instead of netcat tests: Remove unnecessary ^D in passt_in_ns teardown tests: Remove unnecessary truncation of temporary files in udp tests tests: Use dhclient --no-pid for namespaces in two_guests tests tests: Clean up better after iperf tests tests: No need to retrieve host ifname in ndp/pasta tests: Correct determination of host interface name in tests demo: Use git protocol downloads Makefile | 2 +- test/README.md | 2 +- test/demo/passt | 10 +-- test/demo/pasta | 14 ++--- test/dhcp/passt | 22 +++---- test/lib/setup | 24 ++++++-- test/lib/test | 2 +- test/ndp/pasta | 5 +- test/passt.mbuto | 18 +++--- test/tcp/passt | 36 +++++------ test/tcp/passt_in_ns | 138 +++++++++++++++++++++--------------------- test/tcp/pasta | 58 +++++++++--------- test/two_guests/basic | 24 ++++---- test/udp/passt | 26 ++++---- test/udp/passt_in_ns | 82 +++++++++++-------------- test/udp/pasta | 34 +++++------ 16 files changed, 247 insertions(+), 250 deletions(-) -- 2.36.1
AFAICT the symlink we created in mbuto from /usr/bin/bash to /bin/sh was
for the benefit of a dhclient-script which used /usr/bin/bash as its
interpreter (e.g. in Fedora). That was a bit risky if the script really
did require bash and we linked it to dash or another shell.
We now supply our own custom dhclient-script, so we don't need the
link any more.
Signed-off-by: David Gibson
Modern Fedora (and RHEL) systems have /sbin as a symlink to /usr/sbin
(along with a number of similar links). Along with that it expects to
find dhclient-script in /usr/sbin/dhclient-script rather than
/sbin/dhclient-script.
Link them together in our mbuto image so that the Fedora build of dhclient
can find it.
Signed-off-by: David Gibson
We now supply a minimal dhclient-script of our own in the mbuto boot image.
There are some problems with it, so add some basic logging to help debug
it.
Signed-off-by: David Gibson
Although it can operate without them, dhclient can issue errors if it
doesn't have /var/run to write a pid file and /var/lib to write a leases
file. Create those in mbuto.img to stop it complaining.
Signed-off-by: David Gibson
To check publishing of DNS information via DHCP, we need to extract a list
of nameservers and/or search domains from resolv.conf in the test script.
The current version (usually) leaves the result with a trailing ','.
That's usually ok because it happens on both guest and host sides. However
it's kind of confusing, and might stop working if the host had a
resolv.conf without a trailing \n on the last line. It also makes some
later changes we'll need more difficult.
So, normalize the output from resolv.conf a bit further, removing any
trailing ','. It turns out we can do this with a slightly less complex
sed expression than the one we already have.
Signed-off-by: David Gibson
We previously introduced a change to passt to handle the case where the
host machine is its own nameserver - so resolv.conf points to 127.0.0.1.
In this case we advertize the gateway as the DNS server for the guest,
which in turn will be redirected back to the host by existing passt logic.
The dhcp/passt doesn't handle this case correctly, so add some logic to
account for it.
Signed-off-by: David Gibson
Currently our small custom dhclient-script only handles the 'domain-name'
option, which can just list a single domain, not the 'domain-search'
option, which can handle several. Correct it to handle both.
We also weren't emptying the resolv.conf file before we began, which
could lead to surprising contents after multiple DHCP transactions.
Signed-off-by: David Gibson
The dhcp/passt and dhcp/passt_in_ns tests at least, and maybe others
use 'hout' commands that need to be able to detect empty output.
However, we don't set PS1, which means the screen-scraping logic which
detects this may not be reliable. In addition, if the host is using a
recent bash, it will have bracketed paste mode enabled which will also
add escape codes which will mess up the empty output detection.
Set the prompt and disable bracketed paste mode from the passt and
passt_in_ns setups to avoid these problems.
Signed-off-by: David Gibson
In what looks like a copy-and-paste error from the TCP script, the
udp/passt test script creates a test file called '__TEMP_BIG__', while
the commands it use the variable __TEMP__. Correct this so that a) we
actually transfer the data we created for the purpose and b) we don't
leave a stale __TEMP_BIG__ file in the current directory.
Signed-off-by: David Gibson
Some versions of valgrind (such as the version on my Fedora laptop -
valgrind-3.19.0-3.fc36.x86_64) use futexes. But futex is currently not
allowed in the seccomp filter, even with the extra calls added for
valgrind builds. Add it, to avoid spurious valgrind failures.
Signed-off-by: David Gibson
Commit 41c02e10 ("tests: Use nmap-ncat instead of openbsd netcat for pasta
tests") updated the pasta tests to use the nmap version of ncat instead of
the openbsd version, for greater portability.
For some upcoming changes, however, we'll be wanting to use socat.
"socat" can do everything "ncat" can and more, so let's move all the
tests using host tools (either directly on the host or via mbuto
generated images) to using socat instead.
Signed-off-by: David Gibson
teardown_passt_in_ns() sends a ^D to the NS pane, which appears to be
intended to terminate the nsenter running there, leaving the namespace.
However, we've also sent a ^D to the PASST pane which will exit the pasta
instance which created the namespace. With the namespace destroyed the
nsenter in the NS pane will be killed, so it does not need to be exited
explicitly.
In fact sending the extra ^D can be harmful, since it will exit the shell
in which the nsenter was run, causing the whole pane to be closed. That
can then mean that the "pane_wait NS" hangs indefinitely. I believe this
will sometimes work, because there's a race between the various options
here, but it should be more reliable without the extra ^D.
Signed-off-by: David Gibson
All the UDP tests use :> to truncate some temporary data files. This
appears to be so that they're empty before writing data to them with tee.
However tee, by default, truncates its output file anyway (you need tee -a
to append). So drop the unnecessary truncations.
Signed-off-by: David Gibson
Before starting the guests, these tests configure addresses in a pasta
namespace using dhclient. However, because it's a user namespace, it's
not running as "real" root and can't write to the dhclient pid file.
This doesn't stop it working, but causes an ugly error message which we
can avoid by using the --no-pid option.
Signed-off-by: David Gibson
The iperf based test commands create a bunch of .bw and .pid files for
each iperf client and server. The server side .bw files are cleaned
up afterwards, but the pid files are not, and none of the client side
files are cleaned up. The latter doesn't really matter when the
client is run on ephemeral guests, but sometimes we run it in a
namespace that shares the filesystem with the host.
Clean up all of these files after the tests.
Signed-off-by: David Gibson
With pasta, the namespace interface name is generally the same as the host
interface name. We already rely on this in the dhcp/pasta tests, but for
no clear reason ndp/pasta separately determines the host interface name.
Remove this unnecessary step.
Signed-off-by: David Gibson
By default, passt itself attaches to the first host interface with a
default route. However, when determining the host interface name the tests
implicitly select the *last* host interface: they use a jq expression which
will list all interfaces with default routes, but the way output detection
works in the scripts, it will only pick up the last line.
If there are multiple interfaces with default routes on the host, and they
each have a different address, this can cause spurious test failures.
Signed-off-by: David Gibson
On Fri, Jul 15, 2022 at 03:21:40PM +1000, David Gibson wrote:
By default, passt itself attaches to the first host interface with a default route. However, when determining the host interface name the tests implicitly select the *last* host interface: they use a jq expression which will list all interfaces with default routes, but the way output detection works in the scripts, it will only pick up the last line.
If there are multiple interfaces with default routes on the host, and they each have a different address, this can cause spurious test failures.
It seems this change is not enough to always fix the tests when there are multiple default routes. I'm still sometimes getting failures, now because passt itself doesn't seem to be picking the interface with the first default route. I'm wondering if this is because ip(8) is sorting the output, not just presenting it in the same order that the underlying netlink interface does.
Signed-off-by: David Gibson
--- test/dhcp/passt | 2 +- test/two_guests/basic | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/test/dhcp/passt b/test/dhcp/passt index 3d2e939..f45227a 100644 --- a/test/dhcp/passt +++ b/test/dhcp/passt @@ -16,7 +16,7 @@ htools ip jq sed tr head
test Interface name gout IFNAME ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' -hout HOST_IFNAME ip -j -4 route show|jq -rM '.[] | select(.dst == "default").dev' +hout HOST_IFNAME ip -j -4 route show|jq -rM '[.[] | select(.dst == "default").dev] | .[0]' check [ -n "__IFNAME__" ]
test DHCP: address diff --git a/test/two_guests/basic b/test/two_guests/basic index cf0b975..f7c016d 100644 --- a/test/two_guests/basic +++ b/test/two_guests/basic @@ -18,7 +18,7 @@ htools ip jq md5sum cut test Interface names g1out IFNAME1 ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' g2out IFNAME2 ip -j link show | jq -rM '.[] | select(.link_type == "ether").ifname' -hout HOST_IFNAME ip -j -4 route show|jq -rM '.[] | select(.dst == "default").dev' +hout HOST_IFNAME ip -j -4 route show|jq -rM '[.[] | select(.dst == "default").dev] | .[0]' check [ -n "__IFNAME1__" ] check [ -n "__IFNAME2__" ]
-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On Tue, 19 Jul 2022 16:20:45 +1000
David Gibson
On Fri, Jul 15, 2022 at 03:21:40PM +1000, David Gibson wrote:
By default, passt itself attaches to the first host interface with a default route. However, when determining the host interface name the tests implicitly select the *last* host interface: they use a jq expression which will list all interfaces with default routes, but the way output detection works in the scripts, it will only pick up the last line.
If there are multiple interfaces with default routes on the host, and they each have a different address, this can cause spurious test failures.
It seems this change is not enough to always fix the tests when there are multiple default routes. I'm still sometimes getting failures, now because passt itself doesn't seem to be picking the interface with the first default route.
I'm wondering if this is because ip(8) is sorting the output, not just presenting it in the same order that the underlying netlink interface does.
I don't see that happening at least in my environment (and I also can't see any code that would sort it, that pretty much comes from rtnl_dump_filter_l() in lib/libnetlink.c). The netlink "filter", though, is slightly different. For IPv4: $ strace -e sendto ip route show >/dev/null sendto(3, [{{len=36, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=1658257027, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_UNSPEC, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNSPEC, rtm_flags=0}, {{nla_len=8, nla_type=RTA_TABLE}, RT_TABLE_MAIN}}, {len=0, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}], 156, 0, NULL, 0) = 156 $ strace -e sendto ./passt -f sendto(5, {{len=28, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=0, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0}}, 28, 0, NULL, 0) = 28 [...] and, while I don't think the FIB trie is actually descended in a different way, we might still have a slightly different result from the kernel (if I recall correctly, I didn't check right now). But letting that aside for a moment: if you have two default routes, I suppose they have different metrics. If not, what's the intended usage? If yes, we should probably implement a sorting logic in passt, so that the route with the lowest metric is picked, and then adjust the jq expression to also pick that one. -- Stefano
On Tue, Jul 19, 2022 at 09:05:52PM +0200, Stefano Brivio wrote:
On Tue, 19 Jul 2022 16:20:45 +1000 David Gibson
wrote: On Fri, Jul 15, 2022 at 03:21:40PM +1000, David Gibson wrote:
By default, passt itself attaches to the first host interface with a default route. However, when determining the host interface name the tests implicitly select the *last* host interface: they use a jq expression which will list all interfaces with default routes, but the way output detection works in the scripts, it will only pick up the last line.
If there are multiple interfaces with default routes on the host, and they each have a different address, this can cause spurious test failures.
It seems this change is not enough to always fix the tests when there are multiple default routes. I'm still sometimes getting failures, now because passt itself doesn't seem to be picking the interface with the first default route.
I'm wondering if this is because ip(8) is sorting the output, not just presenting it in the same order that the underlying netlink interface does.
I don't see that happening at least in my environment (and I also can't see any code that would sort it, that pretty much comes from rtnl_dump_filter_l() in lib/libnetlink.c).
The netlink "filter", though, is slightly different. For IPv4:
$ strace -e sendto ip route show >/dev/null sendto(3, [{{len=36, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=1658257027, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_UNSPEC, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNSPEC, rtm_flags=0}, {{nla_len=8, nla_type=RTA_TABLE}, RT_TABLE_MAIN}}, {len=0, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}], 156, 0, NULL, 0) = 156
$ strace -e sendto ./passt -f sendto(5, {{len=28, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=0, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0}}, 28, 0, NULL, 0) = 28 [...]
and, while I don't think the FIB trie is actually descended in a different way, we might still have a slightly different result from the kernel (if I recall correctly, I didn't check right now).
But letting that aside for a moment: if you have two default routes, I suppose they have different metrics. If not, what's the intended usage?
Yes, there are two different metrics. I was thinking after I sent this that we should sort by metric.
If yes, we should probably implement a sorting logic in passt, so that the route with the lowest metric is picked, and then adjust the jq expression to also pick that one.
Agreed. I'll see if I can implement this. Up to you whether you drop this patch or I just do another update on top of it. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On Wed, Jul 20, 2022 at 12:47:57PM +1000, David Gibson wrote:
On Tue, Jul 19, 2022 at 09:05:52PM +0200, Stefano Brivio wrote:
On Tue, 19 Jul 2022 16:20:45 +1000 David Gibson
wrote: On Fri, Jul 15, 2022 at 03:21:40PM +1000, David Gibson wrote:
By default, passt itself attaches to the first host interface with a default route. However, when determining the host interface name the tests implicitly select the *last* host interface: they use a jq expression which will list all interfaces with default routes, but the way output detection works in the scripts, it will only pick up the last line.
If there are multiple interfaces with default routes on the host, and they each have a different address, this can cause spurious test failures.
It seems this change is not enough to always fix the tests when there are multiple default routes. I'm still sometimes getting failures, now because passt itself doesn't seem to be picking the interface with the first default route.
I'm wondering if this is because ip(8) is sorting the output, not just presenting it in the same order that the underlying netlink interface does.
I don't see that happening at least in my environment (and I also can't see any code that would sort it, that pretty much comes from rtnl_dump_filter_l() in lib/libnetlink.c).
The netlink "filter", though, is slightly different. For IPv4:
$ strace -e sendto ip route show >/dev/null sendto(3, [{{len=36, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=1658257027, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_UNSPEC, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNSPEC, rtm_flags=0}, {{nla_len=8, nla_type=RTA_TABLE}, RT_TABLE_MAIN}}, {len=0, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}], 156, 0, NULL, 0) = 156
$ strace -e sendto ./passt -f sendto(5, {{len=28, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=0, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0}}, 28, 0, NULL, 0) = 28 [...]
and, while I don't think the FIB trie is actually descended in a different way, we might still have a slightly different result from the kernel (if I recall correctly, I didn't check right now).
So, I've been staring at this for most of the day. I don't think the difference is because of the slightly different filters. Instead it's an easy to miss subtletly in nl_get_ext_if(). In nl_get_ext_if() both first_v4 and first_v6 will be set to the interface which meets the criteria and comes first in the order of routes returned by netlink from the kernel. However, there's this early return: word = (long *)has_v4; for (i = 0; i < ARRAY_SIZE(has_v4) / sizeof(long); i++, word++) { tmp = *word; while ((n = ffsl(tmp))) { int ifi = i * sizeof(long) * 8 + n - 1; if (!first_v4) first_v4 = ifi; tmp &= ~(1UL << (n - 1)); if (bitmap_isset(has_v6, ifi)) { *v4 = *v6 = IP_VERSION_ENABLED; return ifi; <================ HERE } } } Which means that if we find an interface with both a v4 and a v6 default route, we'll return it immediately. That loop is across the has_v4 bitmap, so we'll return the first one in *interface index order*, which might not necessarily be the same as the order the routes are returned from netlink. In the other cases: one of v4 or v6 is disabled, or there's no interface with both v4 and v6 default routes, then we'll end up returning the first interface in netlink route order. Which makes passt's overall priority of selecting interfaces pretty hard to follow. I've been thinking a bunch about what the order should be. The obvious cleanup to the current logic would be: 1. If there are >0 interfaces with both default v4 and v6 routes, pick the one with the lowest metric 2. Otherwise if there are >0 interfaces with a default v4 route, pick the one with the lowest metric 3. Otherwise if there are >0 interfaces with a default v6 route, pick the one with the lowest metric 4. Otherwise fail. ..but, I think that's the wrong choice. For one thing it gets weird in the unlikely but possible case that there are two interfaces with both v4 and v6 routes, but one has a lower v4 metric and the other has a lower v6 metric. But more importantly, I think it fails the "do what you want by default" test in the case of a host that has both v4 and v6 routing, but not on the same interface. That's quite a realistic situation if the v6 connectivity is coming from a tunnel (6in4 or 6rd or 6over4 or whatever). Instead, I think we should separately pick an external interface for v4 and v6 connectivity. Yes, it's weird to have a view of two different host interfaces on the same virtual interface. It also means pasta has to arbitrarily pick one of the host interface names for the namespace, which will make the tests a bit uglier. But it makes picking the right interfaces a lot simpler and clearer, and does what you probably want for the case of a host with tunnelled ipv6 connectivity.
But letting that aside for a moment: if you have two default routes, I suppose they have different metrics. If not, what's the intended usage?
Yes, there are two different metrics. I was thinking after I sent this that we should sort by metric.
If yes, we should probably implement a sorting logic in passt, so that the route with the lowest metric is picked, and then adjust the jq expression to also pick that one.
Agreed. I'll see if I can implement this. Up to you whether you drop this patch or I just do another update on top of it.
-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
On Wed, 20 Jul 2022 17:24:18 +1000
David Gibson
On Wed, Jul 20, 2022 at 12:47:57PM +1000, David Gibson wrote:
On Tue, Jul 19, 2022 at 09:05:52PM +0200, Stefano Brivio wrote:
On Tue, 19 Jul 2022 16:20:45 +1000 David Gibson
wrote: On Fri, Jul 15, 2022 at 03:21:40PM +1000, David Gibson wrote:
By default, passt itself attaches to the first host interface with a default route. However, when determining the host interface name the tests implicitly select the *last* host interface: they use a jq expression which will list all interfaces with default routes, but the way output detection works in the scripts, it will only pick up the last line.
If there are multiple interfaces with default routes on the host, and they each have a different address, this can cause spurious test failures.
It seems this change is not enough to always fix the tests when there are multiple default routes. I'm still sometimes getting failures, now because passt itself doesn't seem to be picking the interface with the first default route.
I'm wondering if this is because ip(8) is sorting the output, not just presenting it in the same order that the underlying netlink interface does.
I don't see that happening at least in my environment (and I also can't see any code that would sort it, that pretty much comes from rtnl_dump_filter_l() in lib/libnetlink.c).
The netlink "filter", though, is slightly different. For IPv4:
$ strace -e sendto ip route show >/dev/null sendto(3, [{{len=36, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=1658257027, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_UNSPEC, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNSPEC, rtm_flags=0}, {{nla_len=8, nla_type=RTA_TABLE}, RT_TABLE_MAIN}}, {len=0, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}], 156, 0, NULL, 0) = 156
$ strace -e sendto ./passt -f sendto(5, {{len=28, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=0, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0}}, 28, 0, NULL, 0) = 28 [...]
and, while I don't think the FIB trie is actually descended in a different way, we might still have a slightly different result from the kernel (if I recall correctly, I didn't check right now).
So, I've been staring at this for most of the day. I don't think the difference is because of the slightly different filters. Instead it's an easy to miss subtletly in nl_get_ext_if().
In nl_get_ext_if() both first_v4 and first_v6 will be set to the interface which meets the criteria and comes first in the order of routes returned by netlink from the kernel. However, there's this early return:
word = (long *)has_v4; for (i = 0; i < ARRAY_SIZE(has_v4) / sizeof(long); i++, word++) { tmp = *word; while ((n = ffsl(tmp))) { int ifi = i * sizeof(long) * 8 + n - 1;
if (!first_v4) first_v4 = ifi;
tmp &= ~(1UL << (n - 1)); if (bitmap_isset(has_v6, ifi)) { *v4 = *v6 = IP_VERSION_ENABLED; return ifi; <================ HERE } } }
Which means that if we find an interface with both a v4 and a v6 default route, we'll return it immediately. That loop is across the has_v4 bitmap, so we'll return the first one in *interface index order*, which might not necessarily be the same as the order the routes are returned from netlink.
Whoops :) ...right, sorry, I forgot about that.
In the other cases: one of v4 or v6 is disabled, or there's no interface with both v4 and v6 default routes, then we'll end up returning the first interface in netlink route order. Which makes passt's overall priority of selecting interfaces pretty hard to follow.
I've been thinking a bunch about what the order should be. The obvious cleanup to the current logic would be: 1. If there are >0 interfaces with both default v4 and v6 routes, pick the one with the lowest metric 2. Otherwise if there are >0 interfaces with a default v4 route, pick the one with the lowest metric 3. Otherwise if there are >0 interfaces with a default v6 route, pick the one with the lowest metric 4. Otherwise fail.
..but, I think that's the wrong choice. For one thing it gets weird in the unlikely but possible case that there are two interfaces with both v4 and v6 routes, but one has a lower v4 metric and the other has a lower v6 metric. But more importantly, I think it fails the "do what you want by default" test in the case of a host that has both v4 and v6 routing, but not on the same interface. That's quite a realistic situation if the v6 connectivity is coming from a tunnel (6in4 or 6rd or 6over4 or whatever).
Definitely, I didn't consider the tunneled IPv6 case, but it looks still rather real even nowadays.
Instead, I think we should separately pick an external interface for v4 and v6 connectivity. Yes, it's weird to have a view of two different host interfaces on the same virtual interface. It also means pasta has to arbitrarily pick one of the host interface names for the namespace, which will make the tests a bit uglier. But it makes picking the right interfaces a lot simpler and clearer, and does what you probably want for the case of a host with tunnelled ipv6 connectivity.
I totally agree. It's still less weird than the alternative. We'll need to store two interface indices (say, "ifi4" and "ifi6"), but their usage right now is almost entirely limited to configuration and netlink functions. One additional usage is to select a given interface for sockets bound to link-local addresses by means of setting .sin6_scope_id in struct sockaddr_in6 -- that happens in tcp_conn_from_tap() and sock_l4(). There, we just need to switch to the new "ifi6" index. Let me know if I should start looking into this or if you can do it instead. -- Stefano
On Wed, Jul 20, 2022 at 10:23:22AM +0200, Stefano Brivio wrote:
On Wed, 20 Jul 2022 17:24:18 +1000 David Gibson
wrote: On Wed, Jul 20, 2022 at 12:47:57PM +1000, David Gibson wrote:
On Tue, Jul 19, 2022 at 09:05:52PM +0200, Stefano Brivio wrote:
On Tue, 19 Jul 2022 16:20:45 +1000 David Gibson
wrote: On Fri, Jul 15, 2022 at 03:21:40PM +1000, David Gibson wrote:
By default, passt itself attaches to the first host interface with a default route. However, when determining the host interface name the tests implicitly select the *last* host interface: they use a jq expression which will list all interfaces with default routes, but the way output detection works in the scripts, it will only pick up the last line.
If there are multiple interfaces with default routes on the host, and they each have a different address, this can cause spurious test failures.
It seems this change is not enough to always fix the tests when there are multiple default routes. I'm still sometimes getting failures, now because passt itself doesn't seem to be picking the interface with the first default route.
I'm wondering if this is because ip(8) is sorting the output, not just presenting it in the same order that the underlying netlink interface does.
I don't see that happening at least in my environment (and I also can't see any code that would sort it, that pretty much comes from rtnl_dump_filter_l() in lib/libnetlink.c).
The netlink "filter", though, is slightly different. For IPv4:
$ strace -e sendto ip route show >/dev/null sendto(3, [{{len=36, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=1658257027, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_UNSPEC, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNSPEC, rtm_flags=0}, {{nla_len=8, nla_type=RTA_TABLE}, RT_TABLE_MAIN}}, {len=0, type=0 /* NLMSG_??? */, flags=0, seq=0, pid=0}], 156, 0, NULL, 0) = 156
$ strace -e sendto ./passt -f sendto(5, {{len=28, type=RTM_GETROUTE, flags=NLM_F_REQUEST|NLM_F_DUMP, seq=0, pid=0}, {rtm_family=AF_INET, rtm_dst_len=0, rtm_src_len=0, rtm_tos=0, rtm_table=RT_TABLE_MAIN, rtm_protocol=RTPROT_UNSPEC, rtm_scope=RT_SCOPE_UNIVERSE, rtm_type=RTN_UNICAST, rtm_flags=0}}, 28, 0, NULL, 0) = 28 [...]
and, while I don't think the FIB trie is actually descended in a different way, we might still have a slightly different result from the kernel (if I recall correctly, I didn't check right now).
So, I've been staring at this for most of the day. I don't think the difference is because of the slightly different filters. Instead it's an easy to miss subtletly in nl_get_ext_if().
In nl_get_ext_if() both first_v4 and first_v6 will be set to the interface which meets the criteria and comes first in the order of routes returned by netlink from the kernel. However, there's this early return:
word = (long *)has_v4; for (i = 0; i < ARRAY_SIZE(has_v4) / sizeof(long); i++, word++) { tmp = *word; while ((n = ffsl(tmp))) { int ifi = i * sizeof(long) * 8 + n - 1;
if (!first_v4) first_v4 = ifi;
tmp &= ~(1UL << (n - 1)); if (bitmap_isset(has_v6, ifi)) { *v4 = *v6 = IP_VERSION_ENABLED; return ifi; <================ HERE } } }
Which means that if we find an interface with both a v4 and a v6 default route, we'll return it immediately. That loop is across the has_v4 bitmap, so we'll return the first one in *interface index order*, which might not necessarily be the same as the order the routes are returned from netlink.
Whoops :) ...right, sorry, I forgot about that.
In the other cases: one of v4 or v6 is disabled, or there's no interface with both v4 and v6 default routes, then we'll end up returning the first interface in netlink route order. Which makes passt's overall priority of selecting interfaces pretty hard to follow.
I've been thinking a bunch about what the order should be. The obvious cleanup to the current logic would be: 1. If there are >0 interfaces with both default v4 and v6 routes, pick the one with the lowest metric 2. Otherwise if there are >0 interfaces with a default v4 route, pick the one with the lowest metric 3. Otherwise if there are >0 interfaces with a default v6 route, pick the one with the lowest metric 4. Otherwise fail.
..but, I think that's the wrong choice. For one thing it gets weird in the unlikely but possible case that there are two interfaces with both v4 and v6 routes, but one has a lower v4 metric and the other has a lower v6 metric. But more importantly, I think it fails the "do what you want by default" test in the case of a host that has both v4 and v6 routing, but not on the same interface. That's quite a realistic situation if the v6 connectivity is coming from a tunnel (6in4 or 6rd or 6over4 or whatever).
Definitely, I didn't consider the tunneled IPv6 case, but it looks still rather real even nowadays.
Yeah, I gather IPv6 is finally fairly standard in Germany, but here in Australia I'm with basically the only residential ISP which provides v6.
Instead, I think we should separately pick an external interface for v4 and v6 connectivity. Yes, it's weird to have a view of two different host interfaces on the same virtual interface. It also means pasta has to arbitrarily pick one of the host interface names for the namespace, which will make the tests a bit uglier. But it makes picking the right interfaces a lot simpler and clearer, and does what you probably want for the case of a host with tunnelled ipv6 connectivity.
I totally agree. It's still less weird than the alternative.
Excellent.
We'll need to store two interface indices (say, "ifi4" and "ifi6"), but their usage right now is almost entirely limited to configuration and netlink functions.
Yes, I looked through all the uses of c->ifi, and it looks fairly straightforward.
One additional usage is to select a given interface for sockets bound to link-local addresses by means of setting .sin6_scope_id in struct sockaddr_in6 -- that happens in tcp_conn_from_tap() and sock_l4(). There, we just need to switch to the new "ifi6" index.
Let me know if I should start looking into this or if you can do it instead.
I'll work on it. Hoping I can get that done before heading on holiday next week. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson
For some reason using https to clone from the passt git repo is very slow,
at least from network-distant places. Use git protocol in the demo instead
to avoid a tedious wait to get the source.
Signed-off-by: David Gibson
On Fri, 15 Jul 2022 15:21:23 +1000
David Gibson
Here's yet another batch of fixes to make the tests more robust against different environments. With this lot, I'm now able to run the pasta, passt, passt_in_ns and two_guests tests on my Fedora system. I'm still hitting problems with the perf tests.
This series (specifically 11/18) updates the demo to use socat instead of openbsd netcat. This is needed, or the change to socat in the mbuto image would break the demo. However, I'm hitting unrelated problems trying to run the demos, so the switch to socat is, alas, untested.
I'm testing this now (excluding 17/18) together with the demo fix-up series I just sent, hopefully some of the issues you found there are also gone. -- Stefano
participants (2)
-
David Gibson
-
Stefano Brivio