On 09/09/2025 20:16, Stefano Brivio wrote:
Starting from Linux kernel commit 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks"), window limits are enforced more aggressively, with more zero-window updates than with e2142825c120 ("net: tcp: send zero-window ACK when no memory") alone, and occasional duplicate ACKs can now also be seen for local transfers with default (208 KiB) socket buffer sizes.
Paul reports that, with 6.17-rc1-ish kernels, Podman tests for the pasta integration occasionally fail on the "TCP/IPv4 large transfer, tap" case.
While playing with a reproducer that seems to match those failures:
  while true; do
      ./pasta --trace -l /tmp/pasta.log -p /tmp/pasta.pcap --config-net -t 5555 -- \
          socat TCP-LISTEN:5555 OPEN:/tmp/large.rcv,trunc &
      (sleep 0.3; socat -T2 OPEN:large.bin TCP:88.198.0.164:5555; )
      wait
      diff large.bin /tmp/large.rcv || break
  done
and a kernel including that commit, I hit a few different failures that should be fixed by this series.
Paul tested v1 of this series and found an additional failure (transfer timeout), which I could reproduce with a slightly different command:
  while true; do
      ./pasta --trace -l /tmp/pasta.log -p /tmp/pasta.pcap --config-net -t 5555 -- \
          socat TCP-LISTEN:5555 EXEC:./write.sh &
      (sleep 0.3; socat -T2 OPEN:large.bin TCP:88.198.0.164:5555; )
      wait
      diff large.bin /tmp/large.rcv || break
  done
where write.sh is simply:
  #!/bin/sh
  cat > /tmp/large.rcv
so that the connection is not half-closed right from the beginning, because socat can't make assumptions about the unidirectional nature of the traffic. This should now be fixed as well by the new version of patch 3/8.
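For readers less familiar with half-closed connections, here is a minimal sketch of what the term means at the socket level. It is not code from this series: half_close_demo() is a hypothetical helper, and it uses an AF_UNIX socket pair as a stand-in for a TCP connection so that it runs without any network setup. One side shuts down only its sending direction, the other side sees end-of-file but can still send data back.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo, not pasta code: AF_UNIX stands in for TCP here.
 * Returns 0 on success, -1 if the socket pair can't be created;
 * unexpected behaviour trips an assert().
 */
int half_close_demo(void)
{
	char buf[16];
	ssize_t n;
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv))
		return -1;

	/* Side A sends its payload, then closes its sending direction only
	 * (on a TCP socket, this is what causes a FIN to be sent).
	 */
	n = write(sv[0], "data", 4);
	assert(n == 4);
	n = shutdown(sv[0], SHUT_WR);
	assert(n == 0);

	/* Side B gets the payload first, then end-of-file */
	n = read(sv[1], buf, sizeof(buf));
	assert(n == 4 && !memcmp(buf, "data", 4));
	n = read(sv[1], buf, sizeof(buf));
	assert(n == 0);

	/* The other direction is still open: B can send, A can receive */
	n = write(sv[1], "ok", 2);
	assert(n == 2);
	n = read(sv[0], buf, sizeof(buf));
	assert(n == 2 && !memcmp(buf, "ok", 2));

	close(sv[0]);
	close(sv[1]);
	return 0;
}
```

In the first reproducer the receiving socat writes straight to a file, whereas with EXEC:./write.sh it has to keep both directions open for as long as the script might produce output.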
v4:
  - add patch 8/8

v3:
  - add patch 6/7
  - in 7/7, check dlen <= 1 for keep-alive segments, instead of len <= 1

v2:
  - in 3/6, rewind sequence also if the zero-window update comes in the middle of a batch with non-zero window updates
Stefano Brivio (8):
  tcp: FIN flags have to be retransmitted as well
  tcp: Factor sequence rewind for retransmissions into a new function
  tcp: Rewind sequence when guest shrinks window to zero
  tcp: Fix closing logic for half-closed connections
  tcp: Don't try to transmit right after the peer shrank the window to zero
  tcp: Cast operands of sequence comparison macros to uint32_t before using them
  tcp: Fast re-transmit if half-closed, make TAP_FIN_RCVD path consistent
  tcp: Don't send FIN segment to guest yet if we have pending unacknowledged data
 tcp.c          | 142 ++++++++++++++++++++++++++++++++++++++-----------
 tcp_buf.c      |   5 +-
 tcp_internal.h |  12 +++--
 3 files changed, 122 insertions(+), 37 deletions(-)
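To illustrate why casting the operands of a sequence comparison macro to uint32_t can matter (the patch with that title in the list above), here is a minimal sketch. It does not reproduce the macros from the series: SKETCH_MAX_WINDOW, NAIVE_SEQ_LE, CAST_SEQ_LE and seq_selfcheck() are names made up for this example, and the bound of half the sequence space is an assumption. The point is only that TCP sequence numbers wrap modulo 2^32, so the subtraction must be done in 32-bit unsigned arithmetic even if one operand happens to have a wider type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bound for this sketch only: half of the sequence space */
#define SKETCH_MAX_WINDOW	(1U << 31)

/* Naive form: the subtraction is done in whatever type the operands are
 * converted to, which is 64-bit if either operand is 64-bit.
 */
#define NAIVE_SEQ_LE(a, b)	((b) - (a) < SKETCH_MAX_WINDOW)

/* Cast form: both operands are reduced to 32 bits first, so that the
 * subtraction always wraps modulo 2^32.
 */
#define CAST_SEQ_LE(a, b)						\
	((uint32_t)(b) - (uint32_t)(a) < SKETCH_MAX_WINDOW)

/* Trips an assert() if the macros don't behave as described above */
void seq_selfcheck(void)
{
	uint32_t before_wrap = 0xfffffff0U;	/* just before wrapping */
	uint64_t after_wrap = 0x10;		/* 0x20 bytes later, wider type */

	/* 64-bit subtraction gives 0xffffffff00000020: wrongly "not <=" */
	assert(!NAIVE_SEQ_LE(before_wrap, after_wrap));

	/* 32-bit subtraction gives 0x20: correctly "<=" */
	assert(CAST_SEQ_LE(before_wrap, after_wrap));

	/* ...and the reverse comparison is correctly rejected */
	assert(!CAST_SEQ_LE(after_wrap, before_wrap));
}
```

With both operands declared as uint32_t the two forms agree. The difference only shows up once an expression of a wider type ends up in one of the arguments.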
The reproducer has now been running for over 14 hours without a failure,
so it looks like we have found all the problems.
Tested-by: Paul Holzinger