[PATCH] RFC: Remove unusable --netns-only option
The intended semantics of --netns-only are pretty unclear to me. It's
intended for pasta, but it's not clear whether it's saying the spawned shell
should only enter the target netns, or that the passt/pasta packet
forwarding process should only sandbox itself in a network namespace, not
a user namespace.
In any case, as far as I can tell there's not actually any case in which
the --netns-only option will work. If nothing else, we will always fail
in sandbox(), because it attempts a number of operations which require
CAP_SYS_ADMIN in our current user namespace. We drop all capabilities in
our initial user namespace when we start, so the only way we can have
CAP_SYS_ADMIN at this point is if we've joined a new user namespace, which
we won't do with --netns-only.
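As a rough illustration of the capability condition described above (not part of the patch), the effective capability set of a process can be read from the CapEff line of /proc/<pid>/status and tested for CAP_SYS_ADMIN, which is bit 21 in <linux/capability.h>. The helper name has_cap is made up for this sketch, and it only inspects the calling shell:

```shell
#!/bin/sh
# Sketch only: has_cap is a hypothetical helper, not something passt provides.
# Print "yes" if capability bit $2 is set in the hexadecimal mask $1
# (the format used by the CapEff line of /proc/<pid>/status), else "no".
has_cap() {
	if [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]; then
		echo yes
	else
		echo no
	fi
}

CAP_SYS_ADMIN=21	# bit number from <linux/capability.h>

# Report on the shell running this script, if /proc is available.
if [ -r "/proc/$$/status" ]; then
	cap_eff=$(awk '/^CapEff:/ { print $2 }' "/proc/$$/status")
	echo "CapEff=$cap_eff CAP_SYS_ADMIN: $(has_cap "$cap_eff" "$CAP_SYS_ADMIN")"
fi
```

Run as an ordinary user in the initial user namespace, this would typically report "no"; in a shell started with unshare -Ur it would typically report "yes", since that shell holds capabilities in its new user namespace.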
For pasta joining an existing namespace (the apparently intended use case),
we'll actually fail before we get to that point: in conf_ns_check() we'll attempt
to join the target network namespace. This also requires CAP_SYS_ADMIN in
both our current user namespace and the user namespace which owns the
target network namespace. Again, since we've dropped capabilities in our
original namespace this will never be the case.
For pasta creating its own network namespace we'll fail for a similar
reason in yet another place. This time we'll fail in nl_sock_init(), again
because we attempt to enter the new network ns via NS_CALL without having
regained CAP_SYS_ADMIN by joining a new user namespace. Because this
happens after spawning the shell, it results in a weird failure mode, where
the pasta spawned shell is running, but pasta isn't actually handling
packets. Exiting the shell will lead to a hang until the process is
explicitly killed.
Since there's no way to invoke it, remove this feature.
Signed-off-by: David Gibson
On Tue, 19 Jul 2022 16:23:10 +1000
David Gibson wrote:
> The intended semantics of --netns-only are pretty unclear to me. It's
> intended for pasta, but it's not clear whether it's saying the spawned
> shell should only enter the target netns, or that the passt/pasta
> packet forwarding process should only sandbox itself in a network
> namespace, not a user namespace.
The latter. I think this is marginally clearer in the man page, but it
indeed needs a better explanation.
> In any case, as far as I can tell there's not actually any case in
> which the --netns-only option will work. If nothing else, we will
> always fail in sandbox(), because it attempts a number of operations
> which require CAP_SYS_ADMIN in our current user namespace. We drop all
> capabilities in our initial user namespace when we start, so the only
> way we can have CAP_SYS_ADMIN at this point is if we've joined a new
> user namespace, which we won't do with --netns-only.
>
> For pasta joining an existing namespace (the apparently intended use
> case), we'll actually fail before we get to that point: in
> conf_ns_check() we'll attempt to join the target network namespace.
> This also requires CAP_SYS_ADMIN in both our current user namespace
> and the user namespace which owns the target network namespace. Again,
> since we've dropped capabilities in our original namespace this will
> never be the case.
...however, we can also have UID 0 in a non-init user namespace, and
that will work. This is what happens in the Podman integration case.
Unfortunately the demo is broken at the moment (I had to rebase the
patch with a bit of care, I'll publish the updated one soon).
> For pasta creating its own network namespace we'll fail for a similar
> reason in yet another place. This time we'll fail in nl_sock_init(),
> again because we attempt to enter the new network ns via NS_CALL
> without having regained CAP_SYS_ADMIN by joining a new user namespace.
> Because this happens after spawning the shell, it results in a weird
> failure mode, where the pasta spawned shell is running, but pasta
> isn't actually handling packets. Exiting the shell will lead to a hang
> until the process is explicitly killed.
Ouch, I didn't think of this.
Anyway, let me get back to you in a couple of days on the whole issue.
The usage is there, albeit poorly documented, with a broken demo, and
no handling of (kind of) corner cases.
--
Stefano
On Tue, Jul 19, 2022 at 10:39:25PM +0200, Stefano Brivio wrote:
> On Tue, 19 Jul 2022 16:23:10 +1000
> David Gibson wrote:
> > The intended semantics of --netns-only are pretty unclear to me. It's
> > intended for pasta, but it's not clear whether it's saying the spawned
> > shell should only enter the target netns, or that the passt/pasta
> > packet forwarding process should only sandbox itself in a network
> > namespace, not a user namespace.
> The latter. I think this is marginally clearer in the man page, but it
> indeed needs a better explanation.
Definitely. At present it also appears to affect the spawned shell, in
a rather counter-intuitive way.
> > In any case, as far as I can tell there's not actually any case in
> > which the --netns-only option will work. If nothing else, we will
> > always fail in sandbox(), because it attempts a number of operations
> > which require CAP_SYS_ADMIN in our current user namespace. We drop all
> > capabilities in our initial user namespace when we start, so the only
> > way we can have CAP_SYS_ADMIN at this point is if we've joined a new
> > user namespace, which we won't do with --netns-only.
> >
> > For pasta joining an existing namespace (the apparently intended use
> > case), we'll actually fail before we get to that point: in
> > conf_ns_check() we'll attempt to join the target network namespace.
> > This also requires CAP_SYS_ADMIN in both our current user namespace
> > and the user namespace which owns the target network namespace. Again,
> > since we've dropped capabilities in our original namespace this will
> > never be the case.
> ...however, we can also have UID 0 in a non-init user namespace, and
> that will work.
Hrm.. I thought being UID 0 just meant we started with all the
capabilities, so once we've explicitly dropped them we still won't be
able to do this. That seemed to be what happened when I tried running
it as root.
> This is what happens in the Podman integration case. Unfortunately the
> demo is broken at the moment (I had to rebase the patch with a bit of
> care, I'll publish the updated one soon).
Can you explain a bit more about what the podman use case is, and why
it requires the netns-only logic?
> > For pasta creating its own network namespace we'll fail for a similar
> > reason in yet another place. This time we'll fail in nl_sock_init(),
> > again because we attempt to enter the new network ns via NS_CALL
> > without having regained CAP_SYS_ADMIN by joining a new user namespace.
> > Because this happens after spawning the shell, it results in a weird
> > failure mode, where the pasta spawned shell is running, but pasta
> > isn't actually handling packets. Exiting the shell will lead to a hang
> > until the process is explicitly killed.
> Ouch, I didn't think of this.
>
> Anyway, let me get back to you in a couple of days on the whole issue.
> The usage is there, albeit poorly documented, with a broken demo, and
> no handling of (kind of) corner cases.
--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
On Wed, 20 Jul 2022 12:45:26 +1000
David Gibson wrote:
> On Tue, Jul 19, 2022 at 10:39:25PM +0200, Stefano Brivio wrote:
> > On Tue, 19 Jul 2022 16:23:10 +1000
> > David Gibson wrote:
> > > The intended semantics of --netns-only are pretty unclear to me. It's
> > > intended for pasta, but it's not clear whether it's saying the spawned
> > > shell should only enter the target netns, or that the passt/pasta
> > > packet forwarding process should only sandbox itself in a network
> > > namespace, not a user namespace.
> > The latter. I think this is marginally clearer in the man page, but it
> > indeed needs a better explanation.
> Definitely. At present it also appears to affect the spawned shell, in
> a rather counter-intuitive way.
Right, in that case we should restrict the conditions under which we
can spawn a shell to having UID 0 in a non-init namespace. See working
example below.
> > > In any case, as far as I can tell there's not actually any case in
> > > which the --netns-only option will work. If nothing else, we will
> > > always fail in sandbox(), because it attempts a number of operations
> > > which require CAP_SYS_ADMIN in our current user namespace. We drop all
> > > capabilities in our initial user namespace when we start, so the only
> > > way we can have CAP_SYS_ADMIN at this point is if we've joined a new
> > > user namespace, which we won't do with --netns-only.
> > >
> > > For pasta joining an existing namespace (the apparently intended use
> > > case), we'll actually fail before we get to that point: in
> > > conf_ns_check() we'll attempt to join the target network namespace.
> > > This also requires CAP_SYS_ADMIN in both our current user namespace
> > > and the user namespace which owns the target network namespace. Again,
> > > since we've dropped capabilities in our original namespace this will
> > > never be the case.
> > ...however, we can also have UID 0 in a non-init user namespace, and
> > that will work.
> Hrm.. I thought being UID 0 just meant we started with all the
> capabilities, so once we've explicitly dropped them we still won't be
> able to do this. That seemed to be what happened when I tried running
> it as root.
If you run it as root, it will drop to nobody (or the user passed via
--runas), and it drops capabilities anyway, so it won't be able to do
that. If you run it as UID 0 in a non-init namespace, it won't change
the UID, though, and even after dropping capabilities, it will be able
to join a network namespace.
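The two situations distinguished above (UID 0 in the initial user namespace versus UID 0 in a non-init one) can be approximated from a shell by looking at the UID map. This is only a heuristic sketch, since a child namespace could in principle be given a full identity mapping, and is_init_userns_map is a made-up name rather than the check passt itself performs:

```shell
#!/bin/sh
# Sketch only: is_init_userns_map is a hypothetical helper.
# Succeed if the uid_map file $1 looks like the initial user namespace's
# map, i.e. its first line is the identity mapping "0 0 4294967295".
is_init_userns_map() {
	# Fields: first ID inside the namespace, first ID outside, range length.
	read -r inside outside count < "$1" || return 1
	[ "$inside" = 0 ] && [ "$outside" = 0 ] && [ "$count" = 4294967295 ]
}

if [ "$(id -u)" = 0 ] && ! is_init_userns_map /proc/self/uid_map; then
	echo "UID 0 in a (probably) non-init user namespace"
else
	echo "not UID 0, or in the initial user namespace"
fi
```

In a shell started with unshare -Ur by an unprivileged user, the map would normally be a single-ID mapping such as "0 1000 1", so the first branch is taken.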
> > This is what happens in the Podman integration case. Unfortunately the
> > demo is broken at the moment (I had to rebase the patch with a bit of
> > care, I'll publish the updated one soon).
> Can you explain a bit more about what the podman use case is, and why
> it requires the netns-only logic?
Podman creates a network namespace (with a filesystem handle), then starts
slirp4netns (or pasta, in the integration draft) as UID 0 in a new user
namespace, pointing it to that network namespace:
# ps aux|grep pasta
sbrivio 2283703 0.0 0.0 2070672 56468 pts/10 Sl+ Jul19 0:40 ./bin/podman run --net=pasta:-T,5213-5214,-U,5213-5214 -p 5203-5204:5203-5204/tcp -p 5203-5204:5203-5204/udp --rm -ti alpine sh
sbrivio 2283760 0.1 0.0 85300 51120 ? Ss Jul19 0:57 /usr/bin/pasta --config-net -u 5203:5203 -t 5203:5203 -T 5213-5214 -U 5213-5214 /run/user/1000/netns/netns-3b6147d8-34e1-a516-87c3-631938a1973e
# readlink /proc/2283703/ns/net
net:[4026531992]
# readlink /proc/2283760/ns/net
net:[4026531992]
# readlink /proc/2283703/ns/user
user:[4026533032]
# readlink /proc/2283760/ns/user
user:[4026533032]
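The readlink comparisons above can be wrapped in a small helper. This is a sketch with a made-up function name (same_ns); it compares the namespace links of two processes and needs permission to read /proc/<pid>/ns/ for both:

```shell
#!/bin/sh
# Sketch only: same_ns is a hypothetical helper.
# same_ns PID1 PID2 TYPE: return 0 if both processes are in the same
# namespace of the given type (net, user, ...), 1 if they differ, and
# 2 if either /proc/<pid>/ns/<type> link cannot be read.
same_ns() {
	a=$(readlink "/proc/$1/ns/$3") || return 2
	b=$(readlink "/proc/$2/ns/$3") || return 2
	[ "$a" = "$b" ]
}

# Example: compare this shell with its parent process.
for ns_type in net user; do
	rc=0
	same_ns "$$" "$PPID" "$ns_type" || rc=$?
	case $rc in
	0) echo "$ns_type: same namespace as parent" ;;
	1) echo "$ns_type: different namespace from parent" ;;
	*) echo "$ns_type: could not read namespace links" ;;
	esac
done
```

With the PIDs from the listing above, same_ns 2283703 2283760 net and same_ns 2283703 2283760 user would both succeed, matching the identical inode numbers shown.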
It's equivalent to this example (for convenience, with PIDs instead of
filesystem handles):
---
[TTY #0]
$ unshare -Ur
# echo $$
4117948
[TTY #1]
$ nsenter --preserve-credentials -U -t 4117948
# unshare -n
# ip li sh
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
# echo $$
4126920
[TTY #0]
# ./pasta -f --netns-only 4126920
Outbound interface: enp9s0, namespace interface: enp9s0
ARP:
address: a8:a1:59:8e:d7:b6
DHCP:
assign: 88.198.0.164
mask: 255.255.255.224
router: 88.198.0.161
DNS:
185.12.64.1
185.12.64.2
NDP/DHCPv6:
assign: 2a01:4f8:222:904::2
router: fe80::1
our link-local: fe80::aaa1:59ff:fe8e:d7b6
DNS:
2a01:4ff:ff00::add:2
2a01:4ff:ff00::add:1
[TTY #1]
# ip li sh
1: lo: