On Fri, 11 Apr 2025 14:54:50 +1000
David Gibson
Hi Stefano,
When debugging the splice EINTR bug I fixed the other day, I found the whole tcp_splice_sock_handler() pretty confusing to follow. So, I was working on some cleanups. But then I noticed something more specifically odd here.
We've discussed the use of SO_RCVLOWAT previously. AIUI, you found it essential to achieve reasonable throughput and load for spliced connections.
From my tests back then (never on what I ended up committing, it seems) it wasn't needed in general, it helped only with bulk transfers that never feel the pipe for some reason. With iperf3, I needed to play with parameters quite a bit to reproduce something like that. You would need (at least) to disable Nagle's algorithm (-N) and send small messages (say, -l 4k instead of -l 1M).
I think we've agreed before that it's not entirely the right tool for the job; just the only one available.
Except... as far as I can tell, it's never invoked. AFAICT the only place we enable the RCVLOWAT stuff is in a block under this if:
if (!(conn->flags & lowat_set_flag) && readlen > (long)c->tcp.pipe_size / 10) {
But... this occurs immediately after: if (readlen >= (long)c->tcp.pipe_size * 10 / 100) continue;
.. which is a strictly more inclusive condition, so we'll never reach the RCVLOWAT block.
Right, yes, I think we noticed a while ago a bit after trying to restore the functionality with 01b6a164d94f ("tcp_splice: A typo three years ago and SO_RCVLOWAT is gone"). That wasn't sufficient.
To confirm, I tried putting an ASSERT(0) in that block, and didn't hit it with spliced iperf3 runs.
Am I missing something?
No, not really, and that error has been there forever, since I "added" (not really) the feature in 904b86ade7db ("tcp: Rework window handling, timers, add SO_RCVLOWAT and pools for sockets/pipes"). My intention was actually: if (read >= (long)c->tcp.pipe_size * 90 / 100) continue; or something like that, perhaps 50% even. The idea behind it was: if we already have good pipe utilisation, there's no need for SO_RCVLOWAT, and we should retry calling splice() right away. But at this point we should try again with iperf3 and smaller messages. Perhaps even limiting the throughput (-b) with multiple flows... -- Stefano