On Thu, 7 Dec 2023 12:04:18 +1100 David Gibson wrote:
On Wed, Dec 06, 2023 at 08:37:27PM +0100, Stefano Brivio wrote:
On Mon, 4 Dec 2023 14:16:10 +1100 David Gibson wrote:
We implement our hash table with pointers to the entry for each bucket (or NULL). However, the entries are always allocated within the flow table, meaning that a flow index will suffice, halving the size of the hash table.
For TCP, just a flow index would be enough, but future uses will want to expand the hash table to cover indexing either side of a flow, so use a flow_sidx_t as the type for each hash bucket.
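The saving comes from each hash bucket shrinking from a pointer to a packed index-plus-side value, roughly as in the sketch below; the field names and widths there are illustrative assumptions rather than the actual flow.h definitions.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define FLOW_INDEX_BITS	17	/* assumed index width, for illustration only */

/* Sketch of a packed flow/side reference: it fits in 32 bits, versus a
 * 64-bit pointer per bucket on common 64-bit hosts.
 */
typedef struct flow_sidx {
	unsigned side :1;
	unsigned flow :FLOW_INDEX_BITS;
} flow_sidx_t;

int main(void)
{
	static_assert(sizeof(flow_sidx_t) <= sizeof(uint32_t),
		      "flow_sidx_t must fit in 32 bits");
	printf("bucket as pointer: %zu bytes, as flow_sidx_t: %zu bytes\n",
	       sizeof(void *), sizeof(flow_sidx_t));
	return 0;
}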
Signed-off-by: David Gibson
---
 flow.h | 11 +++++++++++
 tcp.c  | 34 +++++++++++++++++++++++-----------
 2 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/flow.h b/flow.h
index c2a5190..959b461 100644
--- a/flow.h
+++ b/flow.h
@@ -53,6 +53,17 @@ static_assert(sizeof(flow_sidx_t) <= sizeof(uint32_t),
#define FLOW_SIDX_NONE ((flow_sidx_t){ .flow = FLOW_MAX })
In hindsight, while reviewing the functions below: FLOW_MAX should probably be MAX_FROM_BITS(FLOW_INDEX_BITS) - 1 instead (and those >= comparisons would happily become >), so that we don't need a "maximum" value that is itself not allowed (the reserved value would then, strictly speaking, be more than the maximum).
Right, either that or name the variable MAX_NUM_FLOWS or something. Eh, whatever.
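To make the suggestion concrete, here is a minimal sketch of the two variants; the MAX_FROM_BITS() expansion and the 17-bit index width are assumptions for illustration, not the actual definitions.

#define MAX_FROM_BITS(n)	((1U << (n)) - 1)	/* assumed helper */
#define FLOW_INDEX_BITS		17			/* assumed index width */

/* As things stand, the "maximum" index FLOW_MAX doubles as the
 * reserved FLOW_SIDX_NONE marker, so FLOW_MAX itself is not an
 * allowed index and checks read "idx >= FLOW_MAX means invalid".
 */
#define FLOW_MAX		MAX_FROM_BITS(FLOW_INDEX_BITS)

/* Suggested: make FLOW_MAX the largest allowed index and let the
 * reserved marker be FLOW_MAX + 1, strictly more than the maximum,
 * so the checks become "idx > FLOW_MAX means invalid".
 *
 * #define FLOW_MAX	(MAX_FROM_BITS(FLOW_INDEX_BITS) - 1)
 */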
+/**
+ * flow_sidx_eq() - Test if two sidx values are equal
+ * @a, @b:	sidx values
+ *
+ * Return: true iff @a and @b refer to the same side of the same flow
+ */
+static inline bool flow_sidx_eq(flow_sidx_t a, flow_sidx_t b)
+{
+	return (a.flow == b.flow) && (a.side == b.side);
+}
+
 union flow;
 void flow_table_compact(struct ctx *c, union flow *hole);

diff --git a/tcp.c b/tcp.c
index 09acf7f..7e438b7 100644
--- a/tcp.c
+++ b/tcp.c
@@ -574,7 +574,7 @@ static unsigned int tcp6_l2_flags_buf_used;
 #define CONN(idx) (&(FLOW(idx)->tcp))
 /* Table for lookup from remote address, local port, remote port */
-static struct tcp_tap_conn *tc_hash[TCP_HASH_TABLE_SIZE];
+static flow_sidx_t tc_hash[TCP_HASH_TABLE_SIZE];
 static_assert(ARRAY_SIZE(tc_hash) >= FLOW_MAX,
	      "Safe linear probing requires hash table larger than connection table");
@@ -1197,10 +1197,13 @@ static unsigned int tcp_conn_hash(const struct ctx *c,
 static inline unsigned tcp_hash_probe(const struct ctx *c,
				       const struct tcp_tap_conn *conn)
 {
+	flow_sidx_t sidx = FLOW_SIDX(conn, TAPSIDE);
 	unsigned b;
 	/* Linear probing */
-	for (b = tcp_conn_hash(c, conn); tc_hash[b] && tc_hash[b] != conn;
+	for (b = tcp_conn_hash(c, conn);
+	     !flow_sidx_eq(tc_hash[b], FLOW_SIDX_NONE) &&
Do we actually need to check for FLOW_SIDX_NONE explicitly? That is, the sidx we get as input here should never be FLOW_SIDX_NONE.
Yes: we need to stop when we reach something matching @sidx *or* we hit an empty entry. Otherwise we'll never terminate if the entry isn't in there.
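For illustration, a tiny self-contained sketch of that termination argument, using generic linear probing with an integer sentinel rather than the passt types: the loop stops either on a match or on an empty bucket, and dropping the empty-bucket check would spin forever when looking up an entry that was never inserted (given the table is never completely full, as the static_assert above guarantees for tc_hash).

#include <stdio.h>

#define TABLE_SIZE	8
#define NONE		(-1)	/* sentinel marking an empty bucket */

static int table[TABLE_SIZE] = { NONE, 42, 7, NONE, NONE, NONE, NONE, NONE };

/* Probe from @start until we find @key or hit an empty bucket.  Both
 * conditions are needed: without the NONE check, a lookup for a key
 * that was never inserted would cycle through the table forever.
 */
static unsigned probe(unsigned start, int key)
{
	unsigned b;

	for (b = start; table[b] != NONE && table[b] != key;
	     b = (b + 1) % TABLE_SIZE)
		;

	return b;
}

int main(void)
{
	printf("present key 7 found at bucket %u\n", probe(1, 7));
	printf("absent key 99 stops at empty bucket %u\n", probe(1, 99));
	return 0;
}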
Ah, right, sorry, for a moment I read this as:

	!flow_sidx_eq(tc_hash[b], FLOW_SIDX_NONE) &&
	flow_sidx_eq(tc_hash[b], sidx);

where sidx != FLOW_SIDX_NONE would have made the first comparison redundant. But that's not the case, of course.

-- 
Stefano