Re: [PATCH v3 4/4] tcp: Update data retransmission timeout

29 Oct 2025


On Wed, 29 Oct 2025 11:35:29 +1100
David Gibson wrote:
...
On Wed, Oct 29, 2025 at 12:13:30AM +0100, Stefano Brivio wrote:
...
On Mon, 20 Oct 2025 20:17:10 +1100
David Gibson wrote:
...
On Mon, Oct 20, 2025 at 07:11:07AM +0200, Stefano Brivio wrote:
...
On Mon, 20 Oct 2025 11:20:19 +1100
David Gibson wrote:
[snip]
...
Rather than the local link I was thinking of whatever monitor or
liveness probe in KubeVirt which might have a 60-second period, or some
firewall agent, or how long it typically takes for guests to stop and
resume again in KubeVirt.

Right, I hadn't considered those.  Although.. do those actually re-use
a single connection?  I would have guessed they use a new connection
each time, making the timeouts here irrelevant.

It depends on the definition of "each time", because we don't time out
host-side connections immediately.

Hm, ok.  Is your concern that getting a negative answer from the probe
will take too long?

More like getting a positive answer taking too long, because we retry
so infrequently.

Right, but it will only be slow if we lose the first probe, which
should be very rare.

No, because again, that might be due to the guest doing something with
its firewall or stopping/resuming/getting online etc. It's not
necessarily rare.

If that situation persists for at least 1 + 2 + 4 + 8 + 16 + 32 = 63
seconds, without a clamp, we'll wait a further 64 seconds next, and 128
seconds after that (127 and 255 seconds in total). In this case, to me,
it looks more reasonable to retry every minute instead.
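
A minimal sketch of the backoff arithmetic above, assuming the timeout
starts at 1 second and doubles on every retry, with an optional
60-second clamp standing in for "retry every minute". The function and
parameter names are hypothetical and not taken from passt.

```python
def retry_schedule(retries, initial=1, clamp=None):
    """Return (wait, elapsed) pairs, in seconds, for doubling timeouts.

    Hypothetical helper for illustration only, not passt code.
    """
    schedule = []
    elapsed = 0
    for attempt in range(retries):
        wait = initial << attempt        # 1, 2, 4, 8, ...
        if clamp is not None:
            wait = min(wait, clamp)      # never wait longer than the clamp
        elapsed += wait
        schedule.append((wait, elapsed))
    return schedule


if __name__ == "__main__":
    for label, clamp in (("no clamp", None), ("60 s clamp", 60)):
        print(label)
        for wait, elapsed in retry_schedule(8, clamp=clamp):
            print(f"  wait {wait:3d} s, {elapsed:3d} s since the first attempt")
```

Under these assumptions the two schedules are identical for the first
six retries and only diverge from the seventh one on, which is where
the clamp keeps the retry period at one minute.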
...
Pretending passt isn't there, the timeout would come from the default
values for TCP connections. It looks like there's no specific
SO_SNDTIMEO value set for those probes, and you can't configure the
timeout, at least according to:
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-...

My guess would be that the probe would probably time out at the
application level long before the TCP layer times out, but I don't
know for sure.

I don't think so. What I was pointing out is that I couldn't find any
place in the implementation of those probes where a particular
*handshake timeout* (not probe interval) is set on top of Linux's
defaults, so timeouts at TCP layer and application level should be the
same (no additional timeout in application logic).

Huh, that's mildly surprising to me.
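
To put a rough number on "Linux's defaults" for the handshake: a
minimal sketch, assuming an initial SYN timeout of 1 second that
doubles on each retransmission and the stock net.ipv4.tcp_syn_retries
value of 6. The function name is hypothetical, and a host with
different sysctl settings would give a different result.

```python
def handshake_timeout(syn_retries=6, initial_rto=1):
    """Seconds until connect() gives up, under the assumptions above.

    The initial SYN plus each of the syn_retries retransmissions waits
    one timeout period, and each period is double the previous one.
    """
    return sum(initial_rto << attempt for attempt in range(syn_retries + 1))


if __name__ == "__main__":
    print(f"connect() would give up after {handshake_timeout()} s")
```

With no shorter timeout set by the application, that value would be
what the application sees as well.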
-- 
Stefano