On Wed, 7 Aug 2024 11:34:34 +0200
Paul Holzinger
On 07/08/2024 10:27, Stefano Brivio wrote:
If a parent accidentally or due to implementation reasons leaks any open file, we don't want to have access to them, except for the file passed via --fd, if any.
This is the case for Podman when Podman's parent leaks files into Podman: it's not practical for Podman to close unrelated files before starting pasta, as reported by Paul.
Use close_range(2) to close all open files except for standard streams and the one from --fd.
Given that parts of conf() depend on other files to be already opened, such as the epoll file descriptor, we can't easily defer this to a more convenient point, where --fd was already parsed. Introduce a minimal, duplicate version of --fd parsing to keep this simple.
Suggested-by: Paul Holzinger
Signed-off-by: Stefano Brivio --- v2: Move call to close_open_files() to isolate_initial() conf.c | 1 + isolation.c | 12 +++++++++--- isolation.h | 2 +- passt.c | 2 +- util.c | 36 ++++++++++++++++++++++++++++++++++++ util.h | 1 + 6 files changed, 49 insertions(+), 5 deletions(-)
diff --git a/conf.c b/conf.c index 14d8ece..89f5b3d 100644 --- a/conf.c +++ b/conf.c @@ -1260,6 +1260,7 @@ void conf(struct ctx *c, int argc, char **argv) c->tcp.fwd_in.mode = c->tcp.fwd_out.mode = FWD_UNSET; c->udp.fwd_in.mode = c->udp.fwd_out.mode = FWD_UNSET;
+ optind = 1; do { name = getopt_long(argc, argv, optstring, options, NULL);
diff --git a/isolation.c b/isolation.c index 4956d7e..45fba1e 100644 --- a/isolation.c +++ b/isolation.c @@ -29,7 +29,8 @@ * * Executed immediately after startup, drops capabilities we don't * need at any point during execution (or which we gain back when we - * need by joining other namespaces). + * need by joining other namespaces), and closes any leaked file we + * might have inherited from the parent process. * * 2. isolate_user() * ================= @@ -166,14 +167,17 @@ static void clamp_caps(void) }
/** - * isolate_initial() - Early, config independent self isolation + * isolate_initial() - Early, mostly config independent self isolation + * @argc: Argument count + * @argv: Command line options: only --fd (if present) is relevant here * * Should: * - drop unneeded capabilities + * - close all open files except for standard streams and the one from --fd * Musn't: * - remove filesytem access (we need to access files during setup) */ -void isolate_initial(void) +void isolate_initial(int argc, char **argv) { uint64_t keep;
@@ -207,6 +211,8 @@ void isolate_initial(void) keep |= BIT(CAP_SETFCAP) | BIT(CAP_SYS_PTRACE);
drop_caps_ep_except(keep); + + close_open_files(argc, argv); }
/** diff --git a/isolation.h b/isolation.h index 846b2af..80bb68d 100644 --- a/isolation.h +++ b/isolation.h @@ -7,7 +7,7 @@ #ifndef ISOLATION_H #define ISOLATION_H
-void isolate_initial(void); +void isolate_initial(int argc, char **argv); void isolate_user(uid_t uid, gid_t gid, bool use_userns, const char *userns, enum passt_modes mode); int isolate_prefork(const struct ctx *c); diff --git a/passt.c b/passt.c index ea5bece..4b3c306 100644 --- a/passt.c +++ b/passt.c @@ -211,7 +211,7 @@ int main(int argc, char **argv)
arch_avx2_exec(argv);
- isolate_initial(); + isolate_initial(argc, argv);
c.pasta_netns_fd = c.fd_tap = c.pidfile_fd = -1;
diff --git a/util.c b/util.c index 07fb21c..bd65b5a 100644 --- a/util.c +++ b/util.c @@ -26,6 +26,7 @@ #include
#include #include +#include #include "util.h" #include "iov.h" @@ -694,3 +695,38 @@ const char *str_ee_origin(const struct sock_extended_err *ee)
return "<invalid>"; } + +/** + * close_open_files() - Close leaked files, but not --fd, stdin, stdout, stderr + * @argc: Argument count + * @argv: Command line options, as we need to skip any file given via --fd + */ +void close_open_files(int argc, char **argv) +{ + const struct option optfd[] = { { "fd", required_argument, NULL, 'F' }, + { 0 }, + }; + long fd = -1; + int name; + + do { + name = getopt_long(argc, argv, ":F", optfd, NULL); + + if (name == 'F') { + errno = 0; + fd = strtol(optarg, NULL, 0); + + if (fd < 0 || errno) + die("Invalid --fd: %s", optarg); + } + } while (name != -1); + + if (fd == -1) { + if (close_range(3, ~0U, CLOSE_RANGE_UNSHARE)) + die_perror("Failed to close files leaked by parent"); + } else { + if (close_range(3, fd - 1, CLOSE_RANGE_UNSHARE) || + close_range(fd + 1, ~0U, CLOSE_RANGE_UNSHARE)) + die_perror("Failed to close files leaked by parent");
This will fail for fd == 3 as the first range is 3 - 2 in that case which causes EINVAL which could be a common choice for the extra passed fd.
Whoops.
On the other end of the range it is the same issue, you seem to parse the fd as long so allows values > 4294967295 which then overflows when passed to close_range as uint which causes issues and even allows us to close stderr streams. Less likely that users would pass such a number but no reason to allow it in the first place.
Of course. Nice catch as well. Thanks.
$ strace -e close_range ./pasta --config-net --fd 4294967296 echo test close_range(3, 4294967295, CLOSE_RANGE_UNSHARE) = 0 close_range(1, 4294967295, CLOSE_RANGE_UNSHARE) = 0
Hey, that's a cool feature! So much easier to type than >/dev/null. Sending v3. -- Stefano