On Wed, 29 Oct 2025 13:11:48 +0800
Yumei Huang
On Wed, Oct 29, 2025 at 12:38 PM Stefano Brivio
wrote: On Wed, 29 Oct 2025 11:06:44 +0800 Yumei Huang
wrote: On Tue, Oct 28, 2025 at 7:44 PM Stefano Brivio
wrote: On Tue, 28 Oct 2025 16:09:03 +0800 Yumei Huang
wrote: On Fri, Oct 24, 2025 at 7:04 AM Stefano Brivio
wrote: On Fri, 17 Oct 2025 14:28:38 +0800 Yumei Huang
wrote: > Use an exponential backoff timeout for data retransmission according > to RFC 2988 and RFC 6298. Set the initial RTO to one second as discussed > in Appendix A of RFC 6298. > > Also combine the macros defining the initial RTO for both SYN and ACK. > > Signed-off-by: Yumei Huang
> Reviewed-by: David Gibson > --- > tcp.c | 27 ++++++++++++--------------- > 1 file changed, 12 insertions(+), 15 deletions(-) > > diff --git a/tcp.c b/tcp.c > index 9385132..dc0ec6c 100644 > --- a/tcp.c > +++ b/tcp.c > @@ -179,16 +179,14 @@ > * > * Timeouts are implemented by means of timerfd timers, set based on flags: > * > - * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake > - * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend > - * SYN. It's the starting timeout for the first SYN retry. If this persists > - * for more than TCP_MAX_RETRIES or (tcp_syn_retries + > - * tcp_syn_linear_timeouts) times in a row, reset the connection > - * > - * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending > - * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the > - * socket and reset sequence to what was acknowledged. If this persists for > - * more than TCP_MAX_RETRIES times in a row, reset the connection > + * - RTO_INIT: if no ACK segment was received from tap/guest, either during > + * handshake (flag ACK_FROM_TAP_DUE without ESTABLISHED event) or after > + * sending data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data > + * from the socket and reset sequence to what was acknowledged. This is the > + * timeout for the first retry, in seconds. If this persists too many times > + * in a row, reset the connection: TCP_MAX_RETRIES for established > + * connections, or (tcp_syn_retries + tcp_syn_linear_timeouts) during the > + * handshake. > * > * - FIN_TIMEOUT: if a FIN segment was sent to tap/guest (flag ACK_FROM_TAP_DUE > * with TAP_FIN_SENT event), and no ACK is received within this time, reset > @@ -342,8 +340,7 @@ enum { > #define WINDOW_DEFAULT 14600 /* RFC 6928 */ > > #define ACK_INTERVAL 10 /* ms */ > -#define SYN_TIMEOUT_INIT 1 /* s */ > -#define ACK_TIMEOUT 2 > +#define RTO_INIT 1 /* s, RFC 6298 */ > #define FIN_TIMEOUT 60 > #define ACT_TIMEOUT 7200 > > @@ -588,13 +585,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) > } else if (conn->flags & ACK_FROM_TAP_DUE) { > if (!(conn->events & ESTABLISHED)) { > if (conn->retries < c->tcp.syn_linear_timeouts) > - it.it_value.tv_sec = SYN_TIMEOUT_INIT; > + it.it_value.tv_sec = RTO_INIT; > else > - it.it_value.tv_sec = SYN_TIMEOUT_INIT << > + it.it_value.tv_sec = RTO_INIT << > (conn->retries - c->tcp.syn_linear_timeouts); > } > else > - it.it_value.tv_sec = ACK_TIMEOUT; > + it.it_value.tv_sec = RTO_INIT << conn->retries; Same as on 3/4, but here it's clearly more convenient: just assign RTO_INIT, and multiply as needed in the if / else clauses.
I guess we can't just assign RTO_INIT. Maybe assign it only when retries==0, otherwise multiply as it.it_value.tv_sec <<=1.
Why can't you do that? Say:
it.it_value.tv_sec = RTO_INIT if (!(conn->events & ESTABLISHED)) { if (conn->retries >= c->tcp.syn_linear_timeouts) it.it_value.tv_sec <<= (conn->retries - c->tcp.syn_linear_timeouts);
but anyway, see below.
But it seems more complicated. What do you think?
Or maybe, building on my latest comment to 3/4:
int factor = conn->retries;
if (!(conn->events & ESTABLISHED)) factor -= c->tcp.syn_linear_timeouts;
it.it_value.tv_sec = RTO_INIT << MAX(factor, 0);
?
Yeah, I understand this part now.
> } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { > it.it_value.tv_sec = FIN_TIMEOUT; > } else {
The rest of the series looks good to me.
It might be slightly more practical to factor in directly the RTO clamp, and I don't think it's complicated now that you have the helper from 2/4, but it's not a strong preference from my side, as the series makes sense in any case.
Reading tcp_rto_max_ms can be easy with the helper. My concern is about the way we get the total time for retries.
I used to do it like this in v2, https://archives.passt.top/passt-dev/20251010074700.22177-4-yuhuang@redhat.c...:
+#define RETRY_ELAPSED(timeout_init, retries) \ + ((timeout_init) * ((1 << ((retries) + 1)) - 2))
Though the formula is not quite right, we could refine it as below:
#define RETRY_ELAPSED(retries) ((RTO_INIT) * ((1 << ((retries) + 1)) - 1))
Does it make sense to get the time this way?
Well, it also depends on c->tcp.syn_linear_timeouts, right?
Not really, it's only used for data retransmission, so syn_linear_timeouts is not relevant.
Hmm, no, why? RFC 6298 covers SYN retries as well, and that's the one stating:
I meant RETRY_ELAPSED was only used for data retransmission, which uses exponential backoff timeout directly, so syn_linear_timeouts was not relevant.
Ah, okay.
(2.5) A maximum value MAY be placed on RTO provided it is at least 60 seconds.
For SYN retries, as we used linear backoff + exponential backoff, and also limited by TCP_MAX_RETRIES, the possible max RTO is far less than 60s. So we didn't clamp it. Do you think we need to clamp it as well?
If syn_linear_timeouts is 0 and tcp_syn_retries is 7, I guess we'll reach 2^0 + 2^1 + 2^2 + 2^3 + 2^4 + 2^5 + 2^6 + 2^7 = 247 seconds? Or just up to ... + 2^6, that is, 119 seconds? In any case, what is the difference compared to data retransmissions? Don't we have 3 bits to store the retry count as well, so we're limited by TCP_MAX_RETRIES anyway? Looking at patch 1/4 I'd say it's the same counter.
...the only thing that I don't see implemented in this version of the patch is paragraph 5.7:
(5.7) If the timer expires awaiting the ACK of a SYN segment and the TCP implementation is using an RTO less than 3 seconds, the RTO MUST be re-initialized to 3 seconds when data transmission begins (i.e., after the three-way handshake completes).
I missed that while reviewing earlier versions. I guess we need to use a MAX(x, 3) clamp if (c->conn->events & ESTABLISHED). I think it's simpler than re-introducing separate starting values (one second and three seconds).
I'm not sure I understand here. If the timer expires, didn't we reset the connection directly? We would never get to the data transmission phase?
In the RFC, "the timer expires" indicates one iteration of the timeout algorithm, that is, it's the same as our timer (tcp_timer_handler()) triggering. The paragraph says "after the three-way handshake completes", so it looks like the connection wasn't reset.
Probably I should name it more clearly.
But in any case, do you really need to calculate that explicitly? I was thinking that you can just clamp the value when you use it in tcp_timer_ctl().
If c->tcp.rto_max is DIV_ROUND_CLOSEST(x, 1000), where 'x' is the value you read from the kernel, then I guess it's just:
--- it.it_value.tv_sec = MIN(it.it_value.tv_sec, c->tcp.rto_max);
After reading the comments in v3 when tcp_rto_max_ms was first mentioned again, I realized I got something wrong again. I thought it was for the total timeout for all retries, so I need to calculate that and decide to reset the connection as in v2.
I think it actually applies to all the retries.
Anyway, you are right. We don't need to do that. Thanks for your patience.
} ...
if (timerfd_settime(conn->timer, 0, &it, NULL)) flow_perror(conn, "failed to set timer"); ---
-- Stefano