[PATCH v4 00/13] Manage more flow related things from generic flow code
There are a number of things that are more-or-less general to flows which are still explicitly handled in tcp.c and tcp_splice.c including allocation and freeing of flow entries, and dispatch of deferred and timer functions. Even without adding more fields to the common flow structure, we can handle a number of these in a more flow-centric way. Unlike v1 this version is based on the hash table rework series. Changes since v3: * Rebased on current tree, plus patches for some test failures * Renamed "free blocks" to "free clusters" in flow allocation scheme * Fixed bug where we wouldn't properly merge adjacent free clusters together during deferred handling * Improved theory of operation comments about flow allocation scheme Changes since v2: * Realised the prealloc/commit functions where confusing and worked poorly for some future stuff. Replaced with alloc/alloc_cancel * Fixed a bug where newly allocated flow entries might not be 0-filled, because of the free tracking information in there. This could cause very subtle problems. Changes since v1: * Store the timestamp of last flow timers run in a global, rather than a ctx field * Rebased on the TCP hash table rework * Add patches 9..13/13 with changes to allocation and freeing of flow entries. David Gibson (13): flow: Make flow_table.h #include the protocol specific headers it needs treewide: Standardise on 'now' for current timestamp variables tcp, tcp_splice: Remove redundant handling from tcp_timer() tcp, tcp_splice: Move per-type cleanup logic into per-type helpers flow, tcp: Add flow-centric dispatch for deferred flow handling flow, tcp: Add handling for per-flow timers epoll: Better handling of number of epoll types tcp, tcp_splice: Avoid double layered dispatch for connected TCP sockets flow: Move flow_log_() to near top of flow.c flow: Move flow_count from context structure to a global flow: Abstract allocation of new flows with helper function flow: Enforce that freeing of closed flows must happen in deferred handlers flow: Avoid moving flow entries to compact table flow.c | 235 +++++++++++++++++++++++++++++++++++++++++++-------- flow.h | 5 +- flow_table.h | 20 +++++ icmp.c | 12 +-- icmp.h | 2 +- log.c | 34 ++++---- passt.c | 20 +++-- passt.h | 9 +- tcp.c | 143 +++++++++---------------------- tcp.h | 2 +- tcp_conn.h | 8 +- tcp_splice.c | 49 +++++------ tcp_splice.h | 4 +- udp.c | 16 ++-- udp.h | 2 +- 15 files changed, 339 insertions(+), 222 deletions(-) -- 2.43.0
flow_table.h, the lower level flow header relies on having the struct
definitions for every protocol specific flow type - so far that means
tcp_conn.h. It doesn't include it itself, so tcp_conn.h must be included
before flow_table.h.
That's ok for now, but as we use the flow table for more things,
flow_table.h will need the structs for all of them, which means the
protocol specific .c files would need to include tcp_conn.h _and_ the
equivalents for every other flow type before flow_table.h every time,
which is weird.
So, although we *mostly* lean towards the include style where .c files need
to handle the include dependencies, in this case it makes more sense to
have flow_table.h include all the protocol specific headers it needs.
Signed-off-by: David Gibson
In a number of places we pass around a struct timespec representing the
(more or less) current time. Sometimes we call it 'now', and sometimes we
call it 'ts'. Standardise on the more informative 'now'.
Signed-off-by: David Gibson
tcp_timer() scans the connection table, expiring "tap" connections and
calling tcp_splice_timer() for "splice" connections. tcp_splice_timer()
expires spliced connections and then does some other processing.
However, tcp_timer() is always called shortly after tcp_defer_handler()
(from post_handler()), which also scans the flow table expiring both tap
and spliced connections. So remove the redundant handling, and only do
the extra tcp_splice_timer() work from tcp_timer().
Signed-off-by: David Gibson
tcp_conn_destroy() and tcp_splice_destroy() are always called conditionally
on the connection being closed or closing. Move that logic into the
"destroy" functions themselves, renaming them tcp_flow_defer() and
tcp_splice_flow_defer().
Signed-off-by: David Gibson
tcp_defer_handler(), amongst other things, scans the flow table and does
some processing for each TCP connection. When we add other protocols to
the flow table, they're likely to want some similar scanning. It makes
more sense for cache friendliness to perform a single scan of the flow
table and dispatch to the protocol specific handlers, rather than having
each protocol separately scan the table.
To that end, add a new flow_defer_handler() handling all flow-linked
deferred operations.
Signed-off-by: David Gibson
tcp_timer() scans the flow table so that it can run tcp_splice_timer() on
each spliced connection. More generally, other flow types might want to
run similar timers in future.
We could add a flow_timer() analagous to tcp_timer(), udp_timer() etc.
However, this would need to scan the flow table, which we would have just
done in flow_defer_handler(). We'd prefer to just scan the flow table
once, dispatching both per-flow deferred events and per-flow timed events
if necessary.
So, extend flow_defer_handler() to do this. For now we use the same timer
interval for all flow types (1s). We can make that more flexible in future
if we need to.
Signed-off-by: David Gibson
As we already did for flow types, use an "EPOLL_NUM_TYPES" isntead of
EPOLL_TYPE_MAX, which is a little bit safer and clearer. Add a static
assert on the size of the matching names array.
Signed-off-by: David Gibson
Currently connected TCP sockets have the same epoll type, whether they're
for a "tap" connection or a spliced connection. This means that
tcp_sock_handler() has to do a secondary check on the type of the
connection to call the right function. We can avoid this by adding a new
epoll type and dispatching directly to the right thing.
Signed-off-by: David Gibson
flow_log_() is a very basic widely used function that many other functions
in flow.c will end up needing. At present it's below flow_table_compact()
which happens not to need it, but that's likely to change. Move it to
near the top of flow.c to avoid forward declarations.
Code motion only, no changes.
Signed-off-by: David Gibson
In general, the passt code is a bit haphazard about what's a true global
variable and what's in the quasi-global 'context structure'. The
flow_count field is one such example: it's in the context structure,
although it's really part of the same data structure as flowtab[], which
is a genuine global.
Move flow_count to be a regular global to match. For now it needs to be
public, rather than static, but we expect to be able to change that in
future.
Signed-off-by: David Gibson
Currently tcp.c open codes the process of allocating a new flow from the
flow table: twice, in fact, once for guest to host and once for host to
guest connections. This duplication isn't ideal and will get worse as we
add more protocols to the flow table. It also makes it harder to
experiment with different ways of handling flow table allocation.
Instead, introduce a function to allocate a new flow: flow_alloc(). In
some cases we currently check if we're able to allocate, but delay the
actual allocation. We now handle that slightly differently with a
flow_alloc_cancel() function to back out a recent allocation. We have that
separate from a flow_free() function, because future changes we have in
mind will need to handle this case a little differently.
Signed-off-by: David Gibson
Currently, flows are only evern finally freed (and the table compacted)
from the deferred handlers. Some future ways we want to optimise managing
the flow table will rely on this, so enforce it: rather than having the
TCP code directly call flow_table_compact(), add a boolean return value to
the per-flow deferred handlers. If true, this indicates that the flow
code itself should free the flow.
This forces all freeing of flows to occur during the flow code's scan of
the table in flow_defer_handler() which opens possibilities for future
optimisations.
Signed-off-by: David Gibson
Currently we always keep the flow table maximally compact: that is all the
active entries are contiguous at the start of the table. Doing this
sometimes requires moving an entry when one is freed. That's kind of
fiddly, and potentially expensive: it requires updating the hash table for
the new location, and depending on flow type, it may require EPOLL_CTL_MOD,
system calls to update epoll tags with the new location too.
Implement a new way of managing the flow table that doesn't ever move
entries. It attempts to maintain some compactness by always using the
first free slot for a new connection, and mitigates the effect of non
compactness by cheaply skipping over contiguous blocks of free entries.
See the "theory of operation" comment in flow.c for details.
Signed-off-by: David Gibson
On Tue, 16 Jan 2024 11:50:30 +1100
David Gibson
There are a number of things that are more-or-less general to flows which are still explicitly handled in tcp.c and tcp_splice.c including allocation and freeing of flow entries, and dispatch of deferred and timer functions.
Even without adding more fields to the common flow structure, we can handle a number of these in a more flow-centric way.
Unlike v1 this version is based on the hash table rework series.
Applied. -- Stefano
participants (2)
-
David Gibson
-
Stefano Brivio