[PATCH v2 00/10] siphash: cleanups and fixes
While working on unifying the hashing for the flow table, I noticed some awkwardness in the siphash functions. While looking into that I noticed some bugs. So.. here we are. Changes since v1: * Don't accidentally increase the alignment of union inany_addr David Gibson (10): siphash: Make siphash functions consistently return 64-bit results siphash: Make sip round calculations an inline function rather than macro siphash: Add siphash_feed() helper siphash: Clean up hash finalisation with posthash_final() function siphash: Fix bug in state initialisation siphash: Use more hygienic state initialiser siphash: Use specific structure for internal state siphash: Make internal helpers public siphash, checksum: Move TBAA explanation to checksum.c siphash: Use incremental rather than all-at-once siphash functions Makefile | 2 +- checksum.c | 19 ++-- inany.h | 17 +++- siphash.c | 243 --------------------------------------------------- siphash.h | 119 +++++++++++++++++++++++-- tcp.c | 37 +++----- tcp_splice.c | 1 + 7 files changed, 159 insertions(+), 279 deletions(-) delete mode 100644 siphash.c -- 2.41.0
Some of the siphas_*b() functions return 64-bit results, others 32-bit
results, with no obvious pattern. siphash_32b() also appears to do this
incorrectly - taking the 64-bit hash value and simply returning it
truncated, rather than folding the two halves together.
Since SipHash proper is defined to give a 64-bit hash, make all of them
return 64-bit results. In the one caller which needs a 32-bit value,
tcp_seq_init() do the fold down to 32-bits ourselves.
Signed-off-by: David Gibson
The SIPROUND(n) macro implements n rounds of SipHash shuffling. It relies
on 'v' and '__i' variables being available in the context it's used in
which isn't great hygeine. Replace it with an inline function instead.
Signed-off-by: David Gibson
We have macros or inlines for a number of common operations in the siphash
functions. However, in a number of places we still open code feeding
another 64-bits of data into the hash function: an xor, followed by 2
rounds of shuffling, followed by another xor.
Implement an inline function for this, which results in somewhat shortened
code.
Signed-off-by: David Gibson
The POSTAMBLE macro implements the finalisation steps of SipHash. It
relies on some variables in the environment, including returning the final
hash value that way. That isn't great hygeine.
In addition the PREAMBLE macro takes a length parameter which is used only
to initialize the 'b' value that's not used until the finalisation and is
also sometimes modified in a non-obvious way by the callers.
The 'b' value is always composed from the total length of the hash input
plus up to 7 bytes of "tail" data - that is the remainder of the hash input
after a multiple of 8 bytes has been consumed.
Simplify all this by replacing the POSTAMBLE macro with a siphash_final()
function which takes the length and tail data as parameters and returns the
final hash value.
Signed-off-by: David Gibson
The SipHash algorithm starts with initializing the 32 bytes of internal
state with some magic numbers XORed with the hash key. However, our
implementation has a bug - rather than XORing the hash key, it *sets* the
initial state to copies of the key.
I don't know if that affects any of the cryptographic properties of SipHash
but it's not what we should be doing. Fix it.
Signed-off-by: David Gibson
The PREAMBLE macro sets up the SipHash initial internal state. It also
defines that state as a variable, which isn't macro hygeinic. With
previous changes simplifying this premable, it's now possible to replace it
with a macro which simply expands to a C initialisedrfor that state.
Signed-off-by: David Gibson
To improve type safety, encapsulate the internal state of the SipHash
algorithm into a dedicated structure type.
Signed-off-by: David Gibson
Move a bunch of code from siphash.c to siphash.h, making it available to
other modules. This will allow places which need hashes of more complex
objects to construct them incrementally.
Signed-off-by: David Gibson
A number of checksum and hash functions require workarounds for the odd
behaviour of Type-Baased Alias Analysis. We have a detailed comment about
this on siphash_8b() and other functions reference that.
Move the main comment to csume_16b() instead, because we're going to
reorganise things in siphash.c.
Signed-off-by: David Gibson
We have a bunch of variants of the siphash functions for different data
sizes. The callers, in tcp.c, need to pack the various values they want to
hash into a temporary structure, then call the appropriate version. We can
avoid the copy into the temporary by directly using the incremental
siphash functions.
The length specific hash functions also have an undocumented constraint
that the data pointer they take must, in fact, be aligned to avoid
unaligned accesses, which may cause crashes on some architectures.
So, prefer the incremental approach and remove the length-specific
functions.
Signed-off-by: David Gibson
On Thu, 28 Sep 2023 11:20:52 +1000
David Gibson
While working on unifying the hashing for the flow table, I noticed some awkwardness in the siphash functions. While looking into that I noticed some bugs. So.. here we are.
Changes since v1: * Don't accidentally increase the alignment of union inany_addr
David Gibson (10): siphash: Make siphash functions consistently return 64-bit results siphash: Make sip round calculations an inline function rather than macro siphash: Add siphash_feed() helper siphash: Clean up hash finalisation with posthash_final() function siphash: Fix bug in state initialisation siphash: Use more hygienic state initialiser siphash: Use specific structure for internal state siphash: Make internal helpers public siphash, checksum: Move TBAA explanation to checksum.c siphash: Use incremental rather than all-at-once siphash functions
Applied, thanks. -- Stefano
participants (2)
-
David Gibson
-
Stefano Brivio