Re: [PATCH v3 13/13] flow: Avoid moving flow entries to compact table

5 Jan 2024

      On Fri, Jan 05, 2024 at 09:33:35AM +0100, Stefano Brivio wrote:
...
On Thu, 4 Jan 2024 21:02:19 +1100
David Gibson  wrote:
...
On Tue, Jan 02, 2024 at 07:13:41PM +0100, Stefano Brivio wrote:
...
On Mon, 1 Jan 2024 23:01:17 +1100
David Gibson  wrote:
...
On Sat, Dec 30, 2023 at 11:33:04AM +0100, Stefano Brivio wrote:
...
On Thu, 28 Dec 2023 19:25:25 +0100
Stefano Brivio  wrote:
...
> On Thu, 21 Dec 2023 17:15:49 +1100
> David Gibson  wrote:
> 
> [...]
[...]
I wonder if we really have to keep track of the number of (non-)entries
in the free "block", and if we have to be explicit about the two cases.
I'm trying to find out if we can simplify the whole thing with slightly
different variants, for example:
So... I think the version with (explicit) blocks has this fundamental
advantage, on deletion:
...
> +	flow->f.type = FLOW_TYPE_NONE;
> +	/* Put it back in a length 1 free block, don't attempt to fully reverse
> +	 * flow_alloc()s steps.  This will get folded together the next time
> +	 * flow_defer_handler runs anyway() */
> +	flow->free.n = 1;
> +	flow->free.next = flow_first_free;
> +	flow_first_free = FLOW_IDX(flow);
which is doable even without explicit blocks, but much harder to
follow.
Remember this is not a general deletion, only a "cancel" of the most
recent allocation.
Oh, I thought that was only the case for this series and you would use
that as actual deletion in another pending series (which I haven't
finished reviewing yet).
No.  Not allowing deletion of any entry at any time is what I'm
trading off to get both O(1) allocation and (effectively) O(1)
deletion.
...
But now I'm not sure anymore why I was thinking this...
Anyway... do we really need it, then? Can't we just mark the "failed"
flows as whatever means "closed" for a specific protocol, and clean
them up later, instead of calling cancel() right away?
We could, but I'm not sure we want to.  For starters, that requires
protocol-specific behaviour whenever we need to back out an allocation
like this.  Not a big deal, since that's in protocol specific code
already, but I think it's uglier than calling cancel.
It also requires that the protocol specific deferred cleanup functions
(e.g. tcp_flow_defer()) handle partially initialised entries.  With
'cancel' we can back out just the initialisation steps we've already
done (because we know where we've failed during init), then remove the
entry.  The deferred cleanup function only needs to deal with
"complete" entries.  Again, certainly possible, but IMO uglier than
having 'cancel'.
Okay, yes, I see now.
Another doubt that comes to me now is: if you don't plan to use this
alloc_cancel() thing anywhere else, the only reason why you are adding
it is to replace the (flow_count >= FLOW_MAX) check with a flow_alloc()
version that can fail.
But at this point, speaking of ugliness, couldn't we just have a
bool flow_may_alloc() { return flow_first_free < FLOW_MAX }; the caller
can use to decide to abort earlier? To me it looks so much simpler and
more robust.
Well, we could, but there are a couple of reasons I don't love it.
The first is abstraction: this returns explicit handling of the layout
of the table to the protocol specific callers.  It's not a huge deal
right now, but once we have 4 or 5 protocols doing this, having to
change all of them if we make any tiny change to the semantics of
flow_first_free isn't great.

The other issue is that to do this (without a bunch of fairly large
and ugly temporaries) means we'd populate at least some of the fields
in flow_common before we have officially "allocated" the entry.  At
that point it becomes a bit fuzzy as to when that allocation really
occurs.  Is it when we do the FLOW_MAX tesT?  Is it when we write to
f.type?  Is it when we update flow_first_free?  If we fail somewhere
in the middle of that, what steps do we need to reverse?

For those reasons I prefer the scheme presented.  Fwiw, in an earlier
draft I did this differently with a "flow_prealloc()", which was
essentially the check against FLOW_MAX, then a later
flow_alloc_commit().  I thought it turned out pretty confusing
compared to the alloc/cancel approach.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson