On 15/03/2024 12:24, Stefano Brivio wrote:
Martin reports that, with Fedora Linux kernel version kernel-core-6.9.0-0.rc0.20240313gitb0546776ad3f.4.fc41.x86_64, including commit 87d381973e49 ("genetlink: fit NLMSG_DONE into same read() as families"), pasta doesn't exit once the network namespace is gone.
Actually, pasta is completely non-functional, at least with default options, because nl_route_dup(), which duplicates routes from the parent namespace into the target namespace at start-up, is stuck on a second receive operation for RTM_GETROUTE.
However, with that commit, the kernel is now able to fit the whole response, including the NLMSG_DONE message, into a single datagram, so no further messages will be received.
It turns out that commit 4d6e9d0816e2 ("netlink: Always process all responses to a netlink request") accidentally relied on the fact that we would always get at least two datagrams as a response to RTM_GETROUTE.
That is, the test to check whether we expect another datagram is based on the 'status' variable, which is 0 if we just parsed NLMSG_DONE, but we also expect another datagram if NLMSG_OK on the last message is false. But NLMSG_OK with a zero length is always false.
The problem is that we don't distinguish whether status is zero because we got an NLMSG_DONE message, or because we processed all the available datagram bytes.
Introduce an explicit check on NLMSG_DONE. We should probably refactor this slightly, for example by introducing a special return code from nl_status(), but this is probably the least invasive fix for the issue at hand.
Reported-by: Martin Pitt
Link: https://github.com/containers/podman/issues/22052
Fixes: 4d6e9d0816e2 ("netlink: Always process all responses to a netlink request")
Signed-off-by: Stefano Brivio
---
 netlink.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/netlink.c b/netlink.c
index 9e7cccb..20de9b3 100644
--- a/netlink.c
+++ b/netlink.c
@@ -525,7 +525,8 @@ int nl_route_dup(int s_src, unsigned int ifi_src,
 		}
 	}
 
-	if (!NLMSG_OK(nh, status) || status > 0) {
+	if (nh->nlmsg_type != NLMSG_DONE &&
+	    (!NLMSG_OK(nh, status) || status > 0)) {
 		/* Process any remaining datagrams in a different
 		 * buffer so we don't overwrite the first one.
 		 */

I was about to add my tested-by when I noticed a weird thing, but that happens only on the new kernel as well:
On the host

$ ip route
default via 192.168.122.1 dev enp1s0 proto dhcp src 192.168.122.92 metric 100
192.168.122.0/24 dev enp1s0 proto kernel scope link src 192.168.122.92 metric 100

./pasta --config-net
ip route
default via 192.168.122.1 dev enp1s0 proto dhcp metric 100
192.168.122.0/24 dev enp1s0 proto kernel scope link src 192.168.122.92
192.168.122.0/24 dev enp1s0 proto kernel scope link metric 100

It seems we now have the same local route duplicated for some reason? I am not sure if it is caused by this patch, as I cannot test versions without this patch on a newer kernel. I can however confirm that this patch works and it no longer hangs.