[PATCH 00/11] Simplify handling of "spliced" UDP forwarding
This series make some substantial simplifications to how we handle the forwarding of "spliced" UDP packets (those that don't go over the tuntap device, but instead are forwarded from one L4 socket to another). This doesn't yet change the existing (arguably broken) assumption that UDP communications are from one port to one port within the pasta namespace, not one to many or many to one. However, the simplifications made here will make it easier to correct that in future. Based on the earlier series for dual stack TCP sockets. David Gibson (11): udp: Also bind() connected ports for "splice" forwarding udp: Separate tracking of inbound and outbound packet flows udp: Always use sendto() rather than send() for forwarding spliced packets udp: Don't connect "forward" sockets for spliced flows udp: Remove the @bound field from union udp_epoll_ref udp: Split splice field in udp_epoll_ref into (mostly) independent bits udp: Don't create double sockets for -U port udp: Re-use fixed bound sockets for packet forwarding when possible udp: Don't explicitly track originating socket for spliced "connections" udp: Update UDP "connection" timestamps in both directions udp: Simplify udp_sock_handler_splice passt.h | 2 + udp.c | 344 ++++++++++++++++++++++---------------------------------- udp.h | 16 ++- 3 files changed, 144 insertions(+), 218 deletions(-) -- 2.38.1
pasta handles "spliced" port forwarding by resending datagrams received on
a bound socket in the init namespace to a connected socket in the guest
namespace. This means there are actually three ports associated with each
"connection". First there's the source and destination ports of the
originating datagram. That's also the destination port of the forwarded
datagram, but the source port of the forwarded datagram is the kernel
allocated bound address of the connected socket.
However, by bind()ing as well as connect()ing the forwarding socket we can
choose the source port of the forwarded datagrams. By choosing it to match
the original source port we remove that surprising third port number and
no longer need to store port numbers in struct udp_splice_port.
As a bonus this means that the recipient of the packets will see the
original source port if they call getpeername(). This rarely matters, but
it can't hurt.
Signed-off-by: David Gibson
Each entry udp_splice_map[v6][N] keeps information about two essentially
unrelated packet flows. @ns_conn_sock, @ns_conn_ts and @init_bound_sock
track a packet flow from port N in the host init namespace to some other
port in the pasta namespace (the one @ns_conn_sock is connected to).
@init_conn_sock, @init_conn_ts and @ns_bound_sock track packet flow from
port N in the pasta namespace to some other port in the host init namespace
(the one @init_conn_sock is connected to).
Split udp_splice_map[][] into two separate tables for the two directions.
Each entry in each table is a 'struct udp_splice_flow' with @orig_sock
(previously the bound socket), @target_sock (previously the connected
socket) and @ts (the timeout for the target socket).
Signed-off-by: David Gibson
udp_sock_handler_splice() has two different ways of sending out packets
once it has determined the correct destination socket. For the originating
sockets (which are not connected) it uses sendto() to specify a specific
address. For the forward socket (which is connected) we use send().
However we know the correct destination address even for the forward socket
we do also know the correct destination address. We can use this to use
sendto() instead of send(), removing the need for two different paths and
some staging data structures.
Signed-off-by: David Gibson
Currently we connect() the socket we use to forward spliced UDP flows.
However, we now only ever use sendto() rather than send() on this socket
so there's not actually any need to connect it. Don't do so.
Rename a number of things that referred to "connect" or "conn" since that
would now be misleading.
Signed-off-by: David Gibson
We set this field, but nothing ever checked it.
Signed-off-by: David Gibson
The @splice field in union udp_epoll_ref can have a number of values for
different types of "spliced" packet flows. Split it into several single
bit fields with more or less independent meanings. The new @splice field
is just a boolean indicating whether the socket is associated with a
spliced flow, making it identical to the @splice fiend in tcp_epoll_ref.
The new bit @orig, indicates whether this is a socket which can originate
new udp packet flows (created with -u or -U) or a socket created on the
fly to handle reply socket. @ns indicates whether the socket lives in the
init namespace or the pasta namespace.
Making these bits more orthogonal to each other will simplify some future
cleanups.
Signed-off-by: David Gibson
For each IP version udp_socket() has 3 possible calls to sock_l4(). One
is for the "non-spliced" bound socket in the init namespace, one for the
"spliced" bound socket in the init namespace and one for the "spliced"
bound socket in the pasta namespace.
However when this is called to create a socket in the pasta namspeace there
is a logic error which causes it to take the path for the init side spliced
socket as well as the ns socket. This essentially tries to create two
identical sockets on the ns side. Unsurprisingly the second bind() call
fails according to strace.
Correct this to only attempt to open one socket within the ns.
Signed-off-by: David Gibson
When we look up udp_splice_to_ns[v6][src].target_sock in
udp_sock_handler_splice, all we really require of the socket is that it
be bound to port src in the pasta guest namespace. Similarly for
udp_splice_to_init but bound in the init namespace.
Usually these sockets are created temporarily by udp_splice_connect() and
cleaned up by udp_timer(). However, depending on the -u and -U options its
possible we have a permanent socket bound to the relevant port created by
udp_sock_init(). If such a socket exists, we could use it instead of
creating a temporary one. In fact we *must* use it, because we'll fail
trying to bind() a temporary one to the same port.
So allow this, store permanently bound sockets into udp_splice_to_{ns,init}
in udp_sock_init(). These won't get incorrectly removed by the timer
because we don't put a corresponding entry in the udp_act[] structure
which directs the timer what to clean up.
Signed-off-by: David Gibson
When we look up udp_splice_to_ns[][].orig_sock in udp_sock_handler_splice()
we're finding the socket on which the originating packet for the
"connection" was received on. However, we don't specifically need this
socket to be the originating one - we just need one that's bound to the
the source port of this reply packet in the init namespace. We can look
this up in udp_splice_to_init[v6][src].target_sock, whose defining
characteristic is exactly that. The same applies with init and ns swapped.
In practice, of course, the port we locate this way will always be the
originating port, since we couldn't have started this "connection" if it
wasn't.
Change this, and we no longer need the @orig_sock field at all. That leaves just @target_sock which we rename to simply @sock. The
whole udp_splice_flow structure now more represents a single bound port
than a "flow" per se, so rename and recomment it accordingly. Likewise the
udp_splice_to_{ns,init} names are now misleading, since the ports in those
maps are used in both directions. Rename them to udp_splice_{ns,init}
indicating the location where the described socket is bound.
Signed-off-by: David Gibson
A UDP pseudo-connection between port A in the init namespace and port B in
the pasta guest namespace involves two sockets: udp_splice_init[v6][B]
and udp_splice_ns[v6][A]. The socket which originated this "connection"
will be permanent but the other one will be closed on a timeout.
When we get a packet from the originating socket, we update the timeout on
the other socket, but we don't do the same when we get a reply packet from
the other socket. However any activity on the "connection" probably
indicates that it's still in use. Without this we could incorrectly time
out a "connection" if it's using a protocol which involves a single
initiating packet, but which then gets continuing replies from the target.
Correct this by updating the timeout on both sockets for a packet in either
direction. This also updates the timestamps for the permanent originating
sockets which is unnecessary, but harmless.
Signed-off-by: David Gibson
Previous cleanups mean that we can now rework some complex ifs in
udp_sock_handler_splice() into a simpler set.
Signed-off-by: David Gibson
On Tue, 22 Nov 2022 14:43:51 +1100
David Gibson
This series make some substantial simplifications to how we handle the forwarding of "spliced" UDP packets (those that don't go over the tuntap device, but instead are forwarded from one L4 socket to another).
This doesn't yet change the existing (arguably broken) assumption that UDP communications are from one port to one port within the pasta namespace, not one to many or many to one. However, the simplifications made here will make it easier to correct that in future.
I'm not done reviewing this yet, but it all makes sense to me -- I just need a bit longer to convince myself of 1/10 to 3/10 despite the fact they're obviously simpler than the original implementation (i.e. of why I didn't implement it like that in the first place ;)). -- Stefano
participants (2)
-
David Gibson
-
Stefano Brivio