[PATCH v2 00/11] tcp_splice: Better exploit symmetry between sides of connection
For spliced connections, both "sides" are sockets, and for many purposes how we deal with each side is symmetric. Currently, however, we track the information for each side in independent fields in the structure, meaning we can't easily exploit that symmetry. This makes a number of reorganizations of the tcp splice code so that we can explot that symmetry to reduce code size. This will have some additional advantages when we come to integrate with the in-progress unified flow table. Based on top of the interface identifiers and automatic forwarding cleanup series. Changes since v1: * Small updates to comments and commit messages David Gibson (11): tcp_splice: Remove redundant tcp_splice_epoll_ctl() tcp_splice: Correct error handling in tcp_splice_epoll_ctl() tcp_splice: Don't handle EPOLL_CTL_DEL as part of tcp_splice_epoll_ctl() tcp_splice: Remove unnecessary forward declaration tcp_splice: Avoid awkward temporaries in tcp_splice_epoll_ctl() tcp_splice: Don't pool pipes in pairs tcp_splice: Rename sides of connection from a/b to 0/1 tcp_splice: Exploit side symmetry in tcp_splice_timer() tcp_splice: Exploit side symmetry in tcp_splice_connect_finish() tcp_splice: Exploit side symmetry in tcp_splice_destroy() tcp_splice: Simplify selection of socket and pipe sides in socket handler tcp_conn.h | 45 +++--- tcp_splice.c | 389 +++++++++++++++++++++------------------------------ 2 files changed, 179 insertions(+), 255 deletions(-) -- 2.41.0
tcp_splice_conn_update() calls tcp_splice_epoll_ctl() twice: first ignoring
the return value, then checking it. This serves no purpose. If the first
call succeeds, the second call will do exactly the same thing again, since
nothing has changed in conn. If the first call fails, then
tcp_splice_epoll_ctl() itself will EPOLL_CTL_DEL both fds, meaning when
the second call tries to EPOLL_CTL_MOD them it will necessarily fail.
It appears that this duplication was introduced by accident in an
otherwise unrelated patch.
Fixes: bb708111 ("treewide: Packet abstraction with mandatory boundary checks")
Signed-off-by: David Gibson
If we get an error from epoll_ctl() in tcp_splice_epoll_ctl() we goto the
'delete' path where we remove both sockets from the epoll set and return
an error. There are several problems with this:
- We 'return -errno' after the EPOLL_CTL_DEL operations, which means the
deleting epoll_ctl() calls may have overwritten the errno values which
actually triggered the failures.
- The call from conn_flag_do() occurs when the CLOSING flag is set, in
which case we go do the delete path regardless of error. In that case
the 'return errno' is meaningless since we don't expect the EPOLL_CTL_DEL
operations to fail and we ignore the return code anyway.
- All other calls to tcp_splice_epoll_ctl() check the return code and if
non-zero immediately call conn_flag(..., CLOSING) which will call
tcp_splice_epoll_ctl() again explicitly to remove the sockets from epoll.
That means removing them when the error first occurs is redundant.
- We never specifically report an error on the epoll_ctl() operations. We
just set the connection to CLOSING, more or less silently killing it.
This could make debugging difficult in the unlikely even that we get a
failure here.
Re-organise tcp_splice_epoll_ctl() to just log a message then return in the
error case, and only EPOLL_CTL_DEL when explicitly asked to with the
CLOSING flag.
Signed-off-by: David Gibson
tcp_splice_epoll_ctl() removes both sockets from the epoll set if called
when conn->flags & CLOSING. This will always happen immediately after
setting that flag, since conn_flag_do() makes the call itself. That's also
the _only_ time it can happen: we perform the EPOLL_CTL_DEL without
clearing the conn->in_epoll flag, meaning that any further calls to
tcp_splice_epoll_ctl() would attempt EPOLL_CTL_MOD, which would necessarily
fail since the fds are no longer in the epoll.
The EPOLL_CTL_DEL path in tcp_splice_epoll_ctl() has essentially zero
overlap with anything else the function does, so just move them to be
open coded in conn_flag_do().
This does require kernel 2.6.9 or later, in order to pass NULL as the
event structure for epoll_ctl(). However, we already require at least
3.13 to allow unprivileged user namespaces.
Given that, simply directly perform the EPOLL_CTL_DEL operations from
conn_flag_do() rather than unnecessarily multiplexini
Signed-off-by: David Gibson
In tcp_splice.c we forward declare tcp_splice_epoll_ctl() then define it
later on. However, there are no circular dependencies which prevent us
from simply having the full definition in place of the forward declaration.
Signed-off-by: David Gibson
We initialise the events_a and events_b variables with
tcp_splice_conn_epoll_events() function, then immediately copy the values
into ev_a.events and ev_b.events. We can't simply pass &ev_[ab].events to
tcp_splice_conn_epoll_events(), because struct epoll_event is packed,
leading to 'pointer may be unaligned' warnings if we attempt that.
We can, however, make tcp_splice_conn_epoll_events() take struct
epoll_event pointers rather than raw u32 pointers, avoiding the awkward
temporaries.
Signed-off-by: David Gibson
To reduce latencies, the tcp splice code maintains a pool of pre-opened
pipes to use for new connections. This is structured as an array of pairs
of pipes, with each pipe, of course, being a pair of fds. Thus when we
use the pool, a single pool "slot" provides both the a->b and b->a pipes.
There's no strong reason to store the pool in pairs, though - we can
with not much difficulty instead take the a->b and b->a pipes for a new
connection independently from separate slots in the pool, or even take one
from the the pool and create the other as we need it, if there's only one
pipe left in the pool.
This marginally increases the length of code, but simplifies the structure
of the pipe pool. We should be able to re-shrink the code with later
changes, too.
In the process we also fix some minor bugs:
- If we both failed to find a pipe in the pool and to create a new one, we
didn't log an error and would silently drop the connection. That could
make debugging such a situation difficult. Add in an error message for
that case
- When refilling the pool, if we were only able to open a single pipe in
the pair, we attempted to rollback, but instead of closing the opened
pipe, we instead closed the pipe we failed to open (probably leading to
some ignored EBADFD errors).
Signed-off-by: David Gibson
Each spliced connection has two mostly, although not entirely, symmetric
sides. We currently call those "a" and "b" and have different fields in
the connection structure for each one.
We can better exploit that symmetry if we use two element arrays rather
thatn separately named fields. Do that in the places we can, and for the
others change the "a"/"b" terminology to 0/1 to match.
Signed-off-by: David Gibson
tcp_splice_timer() has two very similar blocks one after another that
handle the SO_RCVLOWAT flags for the two sides of the connection. We can
deduplicate this with a loop across the two sides.
Signed-off-by: David Gibson
tcp_splice_connect_finish() has two very similar blocks opening the two
pipes for each direction of the connection. We can deduplicate this with
a loop across the two sides.
Signed-off-by: David Gibson
tcp_splice_destroy() has some close-to-duplicated logic handling closing of
the socket and pipes for each side of the connection. We can use a loop
across the sides to reduce the duplication.
Signed-off-by: David Gibson
tcp_splice_sock_handler() uses the tcp_splice_dir() helper to select
which of the socket, pipe and counter fields to use depending on which
side of the connection the socket event is coming from.
Now that we are using arrays for the two sides, rather than separate named
fields, we can instead just use a variable indicating the side and use
that to index the arrays whever we need a particular side's field.
Signed-off-by: David Gibson
On Tue, 7 Nov 2023 13:42:39 +1100
David Gibson
For spliced connections, both "sides" are sockets, and for many purposes how we deal with each side is symmetric. Currently, however, we track the information for each side in independent fields in the structure, meaning we can't easily exploit that symmetry.
This makes a number of reorganizations of the tcp splice code so that we can explot that symmetry to reduce code size. This will have some additional advantages when we come to integrate with the in-progress unified flow table.
Based on top of the interface identifiers and automatic forwarding cleanup series.
Changes since v1: * Small updates to comments and commit messages
David Gibson (11): tcp_splice: Remove redundant tcp_splice_epoll_ctl() tcp_splice: Correct error handling in tcp_splice_epoll_ctl() tcp_splice: Don't handle EPOLL_CTL_DEL as part of tcp_splice_epoll_ctl() tcp_splice: Remove unnecessary forward declaration tcp_splice: Avoid awkward temporaries in tcp_splice_epoll_ctl() tcp_splice: Don't pool pipes in pairs tcp_splice: Rename sides of connection from a/b to 0/1 tcp_splice: Exploit side symmetry in tcp_splice_timer() tcp_splice: Exploit side symmetry in tcp_splice_connect_finish() tcp_splice: Exploit side symmetry in tcp_splice_destroy() tcp_splice: Simplify selection of socket and pipe sides in socket handler
Applied. -- Stefano
participants (2)
-
David Gibson
-
Stefano Brivio