On Fri, Oct 24, 2025 at 10:37:17AM +0200, Stefano Brivio wrote:
On Fri, 24 Oct 2025 14:30:09 +1100 David Gibson wrote:
On Fri, Oct 24, 2025 at 01:04:31AM +0200, Stefano Brivio wrote:
On Fri, 17 Oct 2025 14:28:37 +0800 Yumei Huang wrote:

[snip]

@@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
 		tcp_timer_ctl(c, conn);
 	} else if (conn->flags & ACK_FROM_TAP_DUE) {
 		if (!(conn->events & ESTABLISHED)) {
-			flow_dbg(conn, "handshake timeout");
-			tcp_rst(c, conn);
+			if (conn->retries >= TCP_MAX_RETRIES ||
+			    conn->retries >= (c->tcp.tcp_syn_retries +
+					      c->tcp.syn_linear_timeouts)) {
+				flow_dbg(conn, "handshake timeout");
+				tcp_rst(c, conn);
+			} else {
+				flow_trace(conn, "SYN timeout, retry");
+				tcp_send_flag(c, conn, SYN);
+				conn->retries++;
I think I already raised this point on a previous revision: this needs to be zeroed as the connection is established, but I don't see that in the current version.
Yes, you raised that, but then I realised it's already handled. I think I put that in the thread, not just direct to Yumei, but maybe not? Or it just got lost in the minutiae.
Yes, here:
https://archives.passt.top/passt-dev/aOxFRfJjPWy0ZW0M@zatzit
this is another example of what I meant about (potential) advantages of a fully threaded (email) workflow.
In this case, I didn't review v2, which came before you could post this reply to my comment on v1, but in the normal case we could have settled this earlier, once and for all.
Ah, right, that'd do it.
When we receive a SYN-ACK, it will have th->ack_seq advanced by one byte, acknowledging the SYN. tcp_tap_handler() calls tcp_update_seqack_from_tap() in the !ESTABLISHED case, which will see the new ack_seq and clear retries (called retrans before this series).
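To make that mechanism concrete, here is a minimal, self-contained sketch of the idea. It is not passt's code: struct toy_conn, toy_update_ack() and toy_handshake_retries() are hypothetical names made up for illustration, and the real tcp_update_seqack_from_tap() may well differ in detail. The only assumption carried over from the explanation above is that an acknowledgment which advances past the last acknowledged sequence number clears the retry counter, and that a SYN-ACK acknowledges the SYN by advancing one byte past the initial sequence number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a connection entry. This is NOT
 * passt's struct tcp_tap_conn, just enough state to show the idea.
 */
struct toy_conn {
	uint32_t seq_ack_from_tap;	/* Highest sequence acked by the peer */
	unsigned retries;		/* Retransmissions since last progress */
};

/* Assumed behaviour: if the peer's ACK advances (compared modulo 2^32),
 * record it and clear the retry counter. Returns 1 on progress, 0 otherwise.
 */
static int toy_update_ack(struct toy_conn *conn, uint32_t ack_seq)
{
	assert(conn);

	/* Wraparound-safe "ack_seq is after seq_ack_from_tap" check */
	if ((int32_t)(ack_seq - conn->seq_ack_from_tap) <= 0)
		return 0;

	conn->seq_ack_from_tap = ack_seq;
	conn->retries = 0;
	return 1;
}

/* Model a handshake: the SYN (initial sequence isn) times out 'timeouts'
 * times, each timeout bumping retries, then a SYN-ACK arrives acknowledging
 * isn + 1. Returns the retry counter left after the SYN-ACK is processed.
 */
static unsigned toy_handshake_retries(uint32_t isn, unsigned timeouts)
{
	struct toy_conn conn = { .seq_ack_from_tap = isn, .retries = 0 };
	unsigned i;

	for (i = 0; i < timeouts; i++)
		conn.retries++;		/* SYN timeout, SYN resent */

	toy_update_ack(&conn, isn + 1);	/* SYN-ACK acks the SYN's one byte */

	return conn.retries;
}
```

In this toy model the counter is cleared by the generic ACK handling alone, with no handshake-specific reset, which matches the point that SYN retransmits need no separate treatment from data retransmits here. A duplicate ACK (one that does not advance) leaves the counter untouched.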
It doesn't look obvious at all to me.
Oh, it's definitely not obvious, but I'm pretty confident it's correct. Fwiw, I spotted this because I thought the explicit handling in v2 wasn't at quite the right point logically (though close enough to be fine in practice). I went looking for the precise right point - when we receive the SYN-ACK - and there it was, already handled. It does make a kind of logical sense. The RFCs don't generally treat SYN (or SYN-ACK, or FIN) retransmits any differently from data retransmits. We do treat them differently, but less so after this series, which is a good thing, I think.
We're unlikely to break it in the future, so I don't think it's fragile in the long term, but... can one of you double check that it's actually the case with a manual one-off test?
Yeah, I guess that's wise. Easiest way is probably to add a temporary debug message here, and try it against a qemu guest that's temporarily suspended. Yumei, I can walk you through this, too.

-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson