[PATCH v2 00/11] Fixes and cleanups for capability handling
Our current handling of capabilities isn't quite right. In particular, drop_caps() attempts to remove capabilities from the bounding set, which usually won't work, and even if it does won't have the effect we want. This series corrects that, as well as making some other fixes and cleanups in adjacent code. Changes since v1: * A number of minor style fixes * Should now correctly handle the case where we join an existing userns/netns, or just an existing netns (--netns-only) David Gibson (11): test: Move slower tests to end of test run pasta: More general way of starting spawned shell as a login shell pasta_start_ns() always ends in parent context Remove unhelpful drop_caps() call in pasta_start_ns() isolation: Clarify various self-isolation steps Replace FWRITE with a function isolation: Refactor isolate_user() to allow for a common exit path isolation: Replace drop_caps() with a version that actually does something isolation: Prevent any child processes gaining capabilities isolation: Only configure UID/GID mappings in userns when spawning shell Rename pasta_setup_ns() to pasta_spawn_cmd() conf.c | 5 +- isolation.c | 255 +++++++++++++++++++++++++++++++++++++++++++++------- isolation.h | 9 +- passt.c | 8 +- pasta.c | 72 +++++++++------ pasta.h | 3 +- test/run | 20 ++--- util.c | 33 +++++++ util.h | 13 +-- 9 files changed, 324 insertions(+), 94 deletions(-) -- 2.37.3
The distro and performance tests are by far the slowest part of the passt
testsuite. Move them to the end of the testsuite run, so that it's easier
to do a quick test during development by letting the other tests run then
interrupting the test runner.
Signed-off-by: David Gibson
When invoked so as to spawn a shell, pasta checks explicitly for the
shell being bash and if so, adds a "-l" option to make it a login shell.
This is not ideal, since this is a bash specific option and requires pasta
to know about specific shell variants.
There's a general convention for starting a login shell, which is to
prepend a "-" to argv[0]. Use this approach instead, so we don't need bash
specific logic.
Signed-off-by: David Gibson
The end of pasta_start_ns() has a test against pasta_child_pid, testing
if we're in the parent or the child. However we started the child running
the pasta_setup_ns function which always exec()s or exit()s, so if we
return from the clone() we are always in the parent, making that test
unnecessary.
Signed-off-by: David Gibson
drop_caps() has a number of bugs which mean it doesn't do what you'd
expect. However, even if we fixed those, the call in pasta_start_ns()
doesn't do anything useful:
* In the common case, we're UID 0 at this point. In this case drop_caps()
doesn't accomplish anything, because even with capabilities dropped, we
are still privileged.
* When attaching to an existing namespace with --userns or --netns-only
we might not be UID 0. In this case it's too early to drop all
capabilities: we need at least CAP_NET_ADMIN to configure the
tap device in the namespace.
Remove this call - we will still drop capabilities a little later in
sandbox().
Signed-off-by: David Gibson
We have a number of steps of self-isolation scattered across our code.
Improve function names and add comments to make it clearer what the self
isolation model is, what the steps do, and why they happen at the points
they happen.
Signed-off-by: David Gibson
In a few places we use the FWRITE() macro to open a file, replace it's
contents with a given string and close it again. There's no real
reason this needs to be a macro rather than just a function though.
Turn it into a function 'write_file()' and make some ancillary
cleanups while we're there:
- Add a return code so the caller can handle giving a useful error message
- Handle the case of short write()s (unlikely, but possible)
- Add O_TRUNC, to make sure we replace the existing contents entirely
Signed-off-by: David Gibson
Currently, isolate_user() exits early if the --netns-only option is given.
That works for now, but shortly we're going to want to add some logic to
go at the end of isolate_user() that needs to run in all cases: joining a
given userns, creating a new userns, or staying in our original userns
(--netns-only).
To avoid muddying those changes, here we reorganize isolate_user() to have
a common exit path for all cases.
Signed-off-by: David Gibson
The current implementation of drop_caps() doesn't really work because it
attempts to drop capabilities from the bounding set. That's not the set
that really matters, it's about limiting the abilities of things we might
later exec() rather than our own capabilities. It also requires
CAP_SETPCAP which we won't usually have.
Replace it with a new version which uses setcap(2) to drop capabilities
from the effective and permitted sets. For now we leave the inheritable
set as is, since we don't want to preclude the user from passing
inheritable capabilities to the command spawed by pasta.
Correctly dropping caps reveals that we were relying on some capabilities
we'd supposedly dropped. Re-divide the dropping of capabilities between
isolate_initial(), isolate_user() and isolate_prefork() to make this work.
Signed-off-by: David Gibson
We drop our own capabilities, but it's possible that processes we exec()
could gain extra privilege via file capabilities. It shouldn't be possible
for us to exec() anyway due to seccomp() and our filesystem isolation. But
just in case, zero the bounding and inheritable capability sets to prevent
any such child from gainin privilege.
Note that we do this *after* spawning the pasta shell/command (if any),
because we do want the user to be able to give that privilege if they want.
Signed-off-by: David Gibson
When in passt mode, or pasta mode spawning a command, we create a userns
for ourselves. This is used both to isolate the pasta/passt process itself
and to run the spawned command, if any.
Since eed17a47 "Handle userns isolation and dropping root at the same time"
we've handled both cases the same, configuring the UID and GID mappings in
the new userns to map whichever UID we're running as to root within the
userns.
This mapping is desirable when spawning a shell or other command, so that
the user gets a root shell with reasonably clear abilities within the
userns and netns. It's not necessarily essential, though. When not
spawning a shell, it doesn't really have any purpose: passt itself doesn't
need to be root and can operate fine with an unmapped user (using some of
the capabilities we get when entering the userns instead).
Configuring the uid_map can cause problems if passt is running with any
capabilities in the initial namespace, such as CAP_NET_BIND_SERVICE to
allow it to forward low ports. In this case the kernel makes files in
/proc/pid owned by root rather than the starting user to prevent the user
from interfering with the operation of the capability-enhanced process.
This includes uid_map meaning we are not able to write to it.
Whether this behaviour is correct in the kernel is debatable, but in any
case we might as well avoid problems by only initializing the user mappings
when we really want them.
Signed-off-by: David Gibson
pasta_setup_ns() no longer has much to do with setting up a namespace.
Instead it's really about starting the shell or other command we want to
run with pasta connectivity. Rename it and its argument structure to be
less misleading.
Signed-off-by: David Gibson
participants (1)
-
David Gibson