Re: [PATCH v3 0/7] tcp: Fixes for issues uncovered by tests with 6.17-rc1 kernels

2 Sep 2025


On 01/09/2025 23:02, Stefano Brivio wrote:
On Mon, 1 Sep 2025 19:36:18 +0200
Paul Holzinger wrote:
Hi,
On 29/08/2025 22:11, Stefano Brivio wrote:
Starting from Linux kernel commit 1d2fbaad7cd8 ("tcp: stronger
sk_rcvbuf checks"), window limits are enforced more aggressively, with
a larger number of zero-window updates compared to what happened with
e2142825c120 ("net: tcp: send zero-window ACK when no memory") alone,
and occasional duplicate ACKs can now also be seen for local transfers
with default (208 KiB) socket buffer sizes.
Paul reports that, with 6.17-rc1-ish kernels, Podman tests for the
pasta integration occasionally fail on the "TCP/IPv4 large transfer,
tap" case.
While playing with a reproducer that seems to be matching those
failures:
   while true; do
       ./pasta --trace -l /tmp/pasta.log -p /tmp/pasta.pcap --config-net \
           -t 5555 -- socat TCP-LISTEN:5555 OPEN:/tmp/large.rcv,trunc &
       (sleep 0.3; socat -T2 OPEN:large.bin TCP:88.198.0.164:5555; )
       wait
       diff large.bin /tmp/large.rcv || break
   done
and a kernel including that commit, I hit a few different failures,
which this series should fix.
Paul tested v1 of this series and found an additional failure
(transfer timeout), which I could reproduce with a slightly different
command:
   while true; do
       ./pasta --trace -l /tmp/pasta.log -p /tmp/pasta.pcap --config-net \
           -t 5555 -- socat TCP-LISTEN:5555 EXEC:./write.sh &
       (sleep 0.3; socat -T2 OPEN:large.bin TCP:88.198.0.164:5555; )
       wait
       diff large.bin /tmp/large.rcv || break
   done
where write.sh is simply:
   #!/bin/sh
   cat > /tmp/large.rcv
so that the connection is not half-closed starting from the beginning,
because socat can't make assumptions about the unidirectional nature
of the traffic. This should now be fixed as well by the new version of
patch 3/7.
v3:
    - add patch 6/7
    - in 7/7, check dlen <= 1 for keep-alive segments, instead of
      len <= 1
v2: in 3/6, rewind sequence also if the zero-window update comes in
      the middle of a batch with non-zero window updates
Stefano Brivio (7):
    tcp: FIN flags have to be retransmitted as well
    tcp: Factor sequence rewind for retransmissions into a new function
    tcp: Rewind sequence when guest shrinks window to zero
    tcp: Fix closing logic for half-closed connections
    tcp: Don't try to transmit right after the peer shrank the window to
      zero
    tcp: Cast operands of sequence comparison macros to uint32_t before
      using them
    tcp: Fast re-transmit if half-closed, make TAP_FIN_RCVD path
      consistent
 tcp.c          | 181 ++++++++++++++++++++++++++++++++++---------------
 tcp_internal.h |  12 ++--
 2 files changed, 136 insertions(+), 57 deletions(-)
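As background for the subject of patch 6/7, here is a minimal sketch of
why casting operands to uint32_t matters when comparing TCP sequence
numbers. This is not the code from the series: the EX_* names and the
EX_MAX_WINDOW bound are hypothetical, chosen only for illustration.
Sequence numbers wrap modulo 2^32, so the difference has to be computed
in 32 bits, even if the arguments happen to be of a wider type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bound: two sequence numbers are considered ordered if
 * their wrapped distance is below this value. */
#define EX_MAX_WINDOW	(1U << 30)

/* Wrap-safe comparisons. Casting each operand, and the result, to
 * uint32_t forces the arithmetic to happen modulo 2^32 regardless of
 * the types of the arguments. */
#define EX_SEQ_LE(a, b)							\
	((uint32_t)((uint32_t)(b) - (uint32_t)(a)) < EX_MAX_WINDOW)
#define EX_SEQ_LT(a, b)							\
	((uint32_t)((uint32_t)(b) - (uint32_t)(a) - 1) < EX_MAX_WINDOW)

/* Same comparison as EX_SEQ_LT, but without casts: shown only to
 * illustrate what goes wrong with 64-bit operands. */
#define EX_SEQ_LT_NOCAST(a, b)	((b) - (a) - 1 < EX_MAX_WINDOW)

static inline void ex_seq_selfcheck(void)
{
	/* 0x10 comes "after" 0xfffffff0 once the sequence has wrapped */
	uint64_t before_wrap = 0xfffffff0u, after_wrap = 0x10u;

	/* With casts, the wrapped distance is 0x20: ordering is correct */
	assert(EX_SEQ_LT(before_wrap, after_wrap));

	/* Without casts, the 64-bit difference is huge: wrong answer */
	assert(!EX_SEQ_LT_NOCAST(before_wrap, after_wrap));
}
```

With 32-bit unsigned operands both variants agree. The difference only
shows up when a wider (or signed) type reaches the macro, which is the
kind of case the casts guard against.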
On 01/09/2025 12:02, Paul Holzinger wrote:
I am afraid I have bad news: it is still broken. My reproducer failed
after 70 minutes (without logs), which means it took longer this time,
but I only have one run so far, so it is hard to tell. I can enable
logs again and see how long it takes then.
Ok, my reproducer with logs has now been running for well over 7 hours
without triggering the issue, so this series improves the situation a
lot. I will keep trying, but I think this is more than enough to
convince me that this series is good.
Tested-by: Paul Holzinger 
Thanks for testing and re-testing.
Just one question before I go ahead and merge this: what did the
original failure from earlier on Tuesday look like? Was that again a
timeout?
Yes, in the Podman tests all failures have looked the same so far: the
podman logs --follow command times out because the container did not
exit. That happens because socat in the container did not exit, as the
TCP stream seems to hang or stay open.
Another thing worth trying: captures without logs, which should add much
less overhead (hence the difference in timing).
I will try that then.
I should be able to figure out issues of this sort with captures and no
logs (it's much harder the other way around).
-- 
Paul Holzinger