[PATCH 0/2] Don't expose container loopback services to the host
podman issue #24045 pointed out that pasta's spliced forwarding logic can expose services within the namespace bound only to 127.0.0.1 or ::1 to the host. However, the namespace probably expects those to only be accessible to itself, so that's probably not what we want. This changes our forwarding logic so as not to do this. Note that the podman tests will currently fail with this series, I've submitted podman PR https://github.com/containers/podman/pull/24064 to fix that. Link: https://github.com/containers/podman/issues/24045 David Gibson (2): test: Clarify test for spliced inbound transfers fwd: Direct inbound spliced forwards to the guest's external address fwd.c | 24 +++++++++++++++--------- test/passt_in_ns/tcp | 8 ++++---- test/passt_in_ns/udp | 4 ++-- test/pasta/tcp | 16 ++++++++-------- test/pasta/udp | 8 ++++---- 5 files changed, 33 insertions(+), 27 deletions(-) -- 2.46.1
The tests in pasta/tcp and pasta/udp for inbound transfers have the server
listening within the namespace explicitly bound to 127.0.0.1 or ::1. This
only works because of the behaviour of inbound splice connections, which
always appear with both source and destination addresses as loopback in
the namespace. That's not an inherent property for "spliced" connections
and arguably an undesirable one. Also update the test names to make it
clearer that these tests are expecting to exercise the "splice" path.
Interestingly this was already correct for the equivalent passt_in_ns/*,
although we also update the test names for clarity there.
Note that there are similar issues in some of the podman tests, addressed
in https://github.com/containers/podman/pull/24064
Signed-off-by: David Gibson
In pasta mode, where addressing permits we "splice" connections, forwarding
directly from host socket to guest/container socket without any L2 or L3
processing. This gives us a very large performance improvement when it's
possible.
Since the traffic is from a local socket within the guest, it will go over
the guest's 'lo' interface, and accordingly we set the guest side address
to be the loopback address. However this has a surprising side effect:
sometimes guests will run services that are only supposed to be used within
the guest and are therefore bound to only 127.0.0.1 and/or ::1. pasta's
forwarding exposes those services to the host, which isn't generally what
we want.
Correct this by instead forwarding inbound "splice" flows to the guest's
external address.
Link: https://github.com/containers/podman/issues/24045
Signed-off-by: David Gibson
On Wed, 25 Sep 2024 16:54:36 +1000
David Gibson
In pasta mode, where addressing permits we "splice" connections, forwarding directly from host socket to guest/container socket without any L2 or L3 processing. This gives us a very large performance improvement when it's possible.
Since the traffic is from a local socket within the guest, it will go over the guest's 'lo' interface, and accordingly we set the guest side address to be the loopback address. However this has a surprising side effect: sometimes guests will run services that are only supposed to be used within the guest and are therefore bound to only 127.0.0.1 and/or ::1. pasta's forwarding exposes those services to the host, which isn't generally what we want.
Correct this by instead forwarding inbound "splice" flows to the guest's external address.
Link: https://github.com/containers/podman/issues/24045
Signed-off-by: David Gibson
--- fwd.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/fwd.c b/fwd.c index a505098..d5149db 100644 --- a/fwd.c +++ b/fwd.c @@ -447,20 +447,26 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, (proto == IPPROTO_TCP || proto == IPPROTO_UDP)) { /* spliceable */
- /* Preserve the specific loopback adddress used, but let the - * kernel pick a source port on the target side + /* The traffic will go over the guest's 'lo' interface, but use + * its external address, so we don't inadvertendly expose
inadvertently
+ * services that listen only on the guest's loopback address. + * + * Let the kernel pick our address on PIF_SPLICE */ - tgt->oaddr = ini->eaddr; + if (inany_v4(&ini->eaddr)) { + tgt->eaddr = inany_from_v4(c->ip4.addr_seen); + tgt->oaddr = inany_any4; + } else { + tgt->eaddr.a6 = c->ip6.addr_seen; + tgt->oaddr = inany_any6; + } + + /* Let the kernel pick port */ tgt->oport = 0; if (proto == IPPROTO_UDP) - /* But for UDP preserve the source port */ + /* Except for UDP, preserve the source port */
It means the same thing, but it's less clear now: does it mean "Except for UDP: in that case preserve the source port" or "Except for UDP: in all other cases preserve the source port"? I would keep the original version unless I'm missing something subtle that this patch would change regarding ports.
tgt->oport = ini->eport;
- if (inany_v4(&ini->eaddr)) - tgt->eaddr = inany_loopback4; - else - tgt->eaddr = inany_loopback6; - return PIF_SPLICE; }
The series looks good to me (yes, I know I still have to review one pending series from you), except for the concern I mentioned as a comment to the cover letter. -- Stefano
On Wed, Sep 25, 2024 at 10:29:44AM +0200, Stefano Brivio wrote:
On Wed, 25 Sep 2024 16:54:36 +1000 David Gibson
wrote: In pasta mode, where addressing permits we "splice" connections, forwarding directly from host socket to guest/container socket without any L2 or L3 processing. This gives us a very large performance improvement when it's possible.
Since the traffic is from a local socket within the guest, it will go over the guest's 'lo' interface, and accordingly we set the guest side address to be the loopback address. However this has a surprising side effect: sometimes guests will run services that are only supposed to be used within the guest and are therefore bound to only 127.0.0.1 and/or ::1. pasta's forwarding exposes those services to the host, which isn't generally what we want.
Correct this by instead forwarding inbound "splice" flows to the guest's external address.
Link: https://github.com/containers/podman/issues/24045
Signed-off-by: David Gibson
--- fwd.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/fwd.c b/fwd.c index a505098..d5149db 100644 --- a/fwd.c +++ b/fwd.c @@ -447,20 +447,26 @@ uint8_t fwd_nat_from_host(const struct ctx *c, uint8_t proto, (proto == IPPROTO_TCP || proto == IPPROTO_UDP)) { /* spliceable */
- /* Preserve the specific loopback adddress used, but let the - * kernel pick a source port on the target side + /* The traffic will go over the guest's 'lo' interface, but use + * its external address, so we don't inadvertendly expose
inadvertently
Oops, fixed.
+ * services that listen only on the guest's loopback address. + * + * Let the kernel pick our address on PIF_SPLICE */ - tgt->oaddr = ini->eaddr; + if (inany_v4(&ini->eaddr)) { + tgt->eaddr = inany_from_v4(c->ip4.addr_seen); + tgt->oaddr = inany_any4; + } else { + tgt->eaddr.a6 = c->ip6.addr_seen; + tgt->oaddr = inany_any6; + } + + /* Let the kernel pick port */ tgt->oport = 0; if (proto == IPPROTO_UDP) - /* But for UDP preserve the source port */ + /* Except for UDP, preserve the source port */
It means the same thing, but it's less clear now: does it mean "Except for UDP: in that case preserve the source port" or "Except for UDP: in all other cases preserve the source port"?
Good point. I was trying to make it a follow on from the earlier "Let the kernel pick port" comment, but you're right the old version was still clearer. Reverted.
I would keep the original version unless I'm missing something subtle that this patch would change regarding ports.
tgt->oport = ini->eport;
- if (inany_v4(&ini->eaddr)) - tgt->eaddr = inany_loopback4; - else - tgt->eaddr = inany_loopback6; - return PIF_SPLICE; }
The series looks good to me (yes, I know I still have to review one pending series from you), except for the concern I mentioned as a comment to the cover letter.
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Wed, 25 Sep 2024 16:54:34 +1000
David Gibson
podman issue #24045 pointed out that pasta's spliced forwarding logic can expose services within the namespace bound only to 127.0.0.1 or ::1 to the host. However, the namespace probably expects those to only be accessible to itself, so that's probably not what we want.
...that's what I thought would be desirable as you see from patch 1/2 and https://github.com/containers/podman/pull/24064. I think you're right in general but I would feel more confident applying this if 1. we briefly documented this in the man page and 2. we added an option to enable the current behaviour back (1. can be documented as part of documenting 2., then). The new fwd_nat_from_host() implementation seems to make this relatively trivial, but I'm not really familiar with it yet so I might be wrong. -- Stefano
participants (2)
-
David Gibson
-
Stefano Brivio