Patch 1/8 is the most relevant fix here: we currently advertise a
window that might be too big for what we can write to the socket,
causing retransmissions right away and occasional high latency on
short transfers to non-local peers.

Mostly as a consequence of fixing that, we now need several
improvements and small fixes, including, most notably, an adaptive
approach to pick the interval between checks for socket-side ACKs
(patch 2/8), and several tricks to reliably trigger TCP buffer size
auto-tuning as implemented by the Linux kernel (patches 4/8 and 6/8).

These changes make some existing issues more relevant. The other
patches fix those.

With this series, I'm getting the expected (wire-speed) throughput for
transfers between peers with varying non-local RTTs: I checked
different guests bridged on the same machine (~600 us) and hosts with
increasing distance (approximately 100 to 600 km, ~4 to ~35 ms), using
iperf3 as well as HTTP transfers.

For short transfers, we strictly stick to the available sending buffer
size to (almost) make sure we avoid local retransmissions, and
significantly decrease transfer time as a result: from 1.2 s to 60 ms
for a 5 MB HTTP transfer from a container hosted in a virtual machine
to another guest.
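
As a rough illustration of the idea behind patch 1/8, and not the code
from the patch itself, the sketch below derives a window from the
space still available in the sending buffer rather than from its total
size. The helper names wnd_from_sndbuf() and sock_wnd() are
hypothetical, and using SO_SNDBUF together with SIOCOUTQ to estimate
the available space is an assumption made for the example. Rounding
anything smaller than one MSS down to zero follows the title of patch
5/8.

```c
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h>	/* SIOCOUTQ */

/* Hypothetical helper: window to advertise given the total sending
 * buffer size, the bytes already queued in it, and the MSS.
 */
uint32_t wnd_from_sndbuf(uint32_t sndbuf, uint32_t queued, uint32_t mss)
{
	/* Space left in the buffer, never negative */
	uint32_t avail = queued < sndbuf ? sndbuf - queued : 0;

	/* Less than one segment: advertise zero, not a tiny window */
	return avail < mss ? 0 : avail;
}

/* Hypothetical helper: query socket s and derive the window from it.
 * Returns 0 on success, -1 if the socket can't be queried.
 *
 * Assumption: SO_SNDBUF as reported by Linux includes bookkeeping
 * overhead, so this is an upper bound on what can actually be queued.
 */
int sock_wnd(int s, uint32_t mss, uint32_t *wnd)
{
	socklen_t sl = sizeof(int);
	int sndbuf, queued;

	if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, &sndbuf, &sl))
		return -1;

	/* Bytes sitting in the socket send queue */
	if (ioctl(s, SIOCOUTQ, &queued))
		return -1;

	if (sndbuf < 0 || queued < 0)
		return -1;

	*wnd = wnd_from_sndbuf((uint32_t)sndbuf, (uint32_t)queued, mss);
	return 0;
}
```

With a 64 KiB buffer holding 16 KiB of queued data, this would
advertise 48 KiB instead of the full 64 KiB, so the peer can't send
more than the socket is able to accept at that moment.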

Stefano Brivio (8):
  tcp: Limit advertised window to available, not total sending buffer
    size
  tcp: Adaptive interval based on RTT for socket-side acknowledgement
    checks
  tcp: Don't clear ACK_TO_TAP_DUE if we're advertising a zero-sized
    window
  tcp: Acknowledge everything if sending buffer is less than SNDBUF_BIG
  tcp: Don't limit window to less-than-MSS values, use zero instead
  tcp: Allow exceeding the available sending buffer size in window
    advertisements
  tcp: Send a duplicate ACK also on complete sendmsg() failure
  tcp: Skip redundant ACK on partial sendmsg() failure

 README.md  |  2 +-
 tcp.c      | 85 ++++++++++++++++++++++++++++++++++++++++++------------
 tcp_conn.h |  9 ++++++
 util.c     | 14 +++++++++
 util.h     |  1 +
 5 files changed, 92 insertions(+), 19 deletions(-)

-- 
2.43.0