On 01/09/2025 23:02, Stefano Brivio wrote:
On Mon, 1 Sep 2025 19:36:18 +0200 Paul Holzinger wrote:
Hi,
On 29/08/2025 22:11, Stefano Brivio wrote:
Starting from Linux kernel commit 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks"), window limits are enforced more aggressively, with a larger number of zero-window updates compared to what happened with e2142825c120 ("net: tcp: send zero-window ACK when no memory") alone, and occasional duplicate ACKs can now also be seen for local transfers with default (208 KiB) socket buffer sizes.
Paul reports that, with 6.17-rc1-ish kernels, Podman tests for the pasta integration occasionally fail on the "TCP/IPv4 large transfer, tap" case.
While playing with a reproducer that seems to be matching those failures:
while true; do ./pasta --trace -l /tmp/pasta.log -p /tmp/pasta.pcap --config-net -t 5555 -- socat TCP-LISTEN:5555 OPEN:/tmp/large.rcv,trunc & (sleep 0.3; socat -T2 OPEN:large.bin TCP:88.198.0.164:5555; ); wait; diff large.bin /tmp/large.rcv || break; done
and a kernel including that commit, I hit a few different failures, that should be fixed by this series.
Paul tested v1 of this series and found an additional failure (transfer timeout), which I could reproduce with a slightly different command:
while true; do ./pasta --trace -l /tmp/pasta.log -p /tmp/pasta.pcap --config-net -t 5555 -- socat TCP-LISTEN:5555 EXEC:./write.sh & (sleep 0.3; socat -T2 OPEN:large.bin TCP:88.198.0.164:5555; ); wait; diff large.bin /tmp/large.rcv || break; done
where write.sh is simply:
  #!/bin/sh
  cat > /tmp/large.rcv
so that the connection is not half-closed starting from the beginning, because socat can't make assumptions about the unidirectional nature of the traffic. This should now be fixed as well by the new version of patch 3/7.
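For readers less familiar with the term, a half-closed stream is one where one side has finished sending but can still receive. The following is a minimal sketch of that state, assuming a Linux system and using a UNIX domain socket pair instead of TCP to keep it self-contained. The function name half_close_demo() is made up for illustration and is not part of passt.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Illustrative only: returns 0 if the half-close behaves as expected */
int half_close_demo(void)
{
	int sv[2];
	char buf[16];
	ssize_t n;
	int ret = -1;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv))
		return -1;

	/* sv[0] is done sending: the stream is now half-closed */
	if (shutdown(sv[0], SHUT_WR))
		goto out;

	/* The peer sees end-of-file on its read side... */
	n = read(sv[1], buf, sizeof(buf));
	if (n != 0)
		goto out;

	/* ...but can still send data in the other direction */
	if (write(sv[1], "ok", 2) != 2)
		goto out;

	n = read(sv[0], buf, sizeof(buf));
	if (n == 2 && !memcmp(buf, "ok", 2))
		ret = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

With TCP, the shutdown() call is what causes a FIN to be sent while the other direction stays open.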
v3:
- add patch 6/7
- in 7/7, check dlen <= 1 for keep-alive segments, instead of len <= 1

v2: in 3/6, rewind sequence also if the zero-window update comes in the middle of a batch with non-zero window updates
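As background for the keep-alive check mentioned in the v3 notes: RFC 9293 describes a keep-alive probe as a segment with sequence number SND.NXT - 1 that may carry zero or one garbage octet. A rough sketch of such a test follows, assuming that dlen means the TCP payload length (not the length of the whole segment). The helper name and signature are hypothetical and are not taken from tcp.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: name and signature are illustrative only */
int looks_like_keepalive(uint32_t seg_seq, size_t dlen, uint32_t rcv_nxt)
{
	/* Probe sequence is one before what we expect next, modulo 2^32 */
	if (seg_seq != (uint32_t)(rcv_nxt - 1))
		return 0;

	/* Payload is empty or a single garbage octet */
	return dlen <= 1;
}
```

Checking the payload length rather than a length that includes headers matters here, because a segment with headers is never one byte long.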
Stefano Brivio (7):
  tcp: FIN flags have to be retransmitted as well
  tcp: Factor sequence rewind for retransmissions into a new function
  tcp: Rewind sequence when guest shrinks window to zero
  tcp: Fix closing logic for half-closed connections
  tcp: Don't try to transmit right after the peer shrank the window to zero
  tcp: Cast operands of sequence comparison macros to uint32_t before using them
  tcp: Fast re-transmit if half-closed, make TAP_FIN_RCVD path consistent

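To illustrate why sequence comparison macros benefit from explicit uint32_t casts: TCP sequence numbers wrap modulo 2^32, so comparisons are usually done on the 32-bit difference. The macros below are a sketch with made-up names (SEQ_LT_EX, SEQ_GE_EX), not the ones from tcp_internal.h. They assume the usual two's complement conversion from uint32_t to int32_t.

```c
#include <stdint.h>

/*
 * Illustrative macros, not the passt ones. Casting each operand to
 * uint32_t first makes the subtraction wrap modulo 2^32 regardless of
 * the type the caller passes (int, long, uint64_t, ...).
 */
#define SEQ_LT_EX(a, b) \
	((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
#define SEQ_GE_EX(a, b) \
	((int32_t)((uint32_t)(a) - (uint32_t)(b)) >= 0)

/* Function form of the same comparison, for callers that prefer it */
int seq_before(uint32_t a, uint32_t b)
{
	return SEQ_LT_EX(a, b);
}
```

For example, 0xfffffff0 counts as "before" 0x10 here, because the distance between them across the wrap is only 0x20.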
 tcp.c          | 181 ++++++++++++++++++++++++++++++++++---------------
 tcp_internal.h |  12 ++--
 2 files changed, 136 insertions(+), 57 deletions(-)

On 01/09/2025 12:02, Paul Holzinger wrote:
I am afraid I have bad news: it is still broken. My reproducer failed after 70 minutes (without logs), which means it took longer this time, but I only have one run so far, so it is hard to tell. I can enable logs again and see how long it takes then.

Ok, my reproducer with logs has now been running for well over 7 hours without triggering the issue, so this series improves the situation a lot. I will keep trying, but I think this is more than enough to convince me that this series is good.

Tested-by: Paul Holzinger

Thanks for testing and re-testing. Just one question before I go ahead and merge this: what did the original failure from earlier on Tuesday look like? Was that again a timeout?

Yes, from the podman test all failures have looked the same so far: the podman logs --follow command times out because the container did not exit. That happens because socat in the container did not exit, as the TCP stream seems to hang or stay open.

Another thing worth trying: captures without logs, which should add much less overhead (hence the difference in timing).

I will try that then.

I should be able to figure out issues of this sort with captures and no logs (it's much harder the other way around).
-- Paul Holzinger