On Tue, 28 Oct 2025 15:43:25 +0800
Yumei Huang
On Fri, Oct 24, 2025 at 7:04 AM Stefano Brivio wrote:
On Fri, 17 Oct 2025 14:28:37 +0800 Yumei Huang wrote:
If a client connects while guest is not connected or ready yet, resend SYN instead of just resetting connection after 10 seconds.
Use the same backoff calculation for the timeout as linux kernel.
Linux.
Link: https://bugs.passt.top/show_bug.cgi?id=153
Signed-off-by: Yumei Huang
---
 tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++-------
 tcp.h | 5 +++++
 2 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/tcp.c b/tcp.c
index 2ec4b0c..9385132 100644
--- a/tcp.c
+++ b/tcp.c
@@ -179,9 +179,11 @@
  *
  * Timeouts are implemented by means of timerfd timers, set based on flags:
  *
- * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag
- *   ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the
- *   connection
+ * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
+ *   (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend
+ *   SYN. It's the starting timeout for the first SYN retry. If this persists
"If this persists" makes sense for the existing ACK_TIMEOUT description but not here, because it looks like it refers to "starting timeout".
Coupled with the next patch, it becomes increasingly difficult to understand what "this" persisting thing is.
Maybe directly say "Retry for ..., then reset the connection"? It's shorter and clearer.
+ *   for more than TCP_MAX_RETRIES or (tcp_syn_retries +
+ *   tcp_syn_linear_timeouts) times in a row, reset the connection
  *
  * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending
  *   data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the
@@ -340,7 +342,7 @@ enum {
 #define WINDOW_DEFAULT 14600 /* RFC 6928 */
 #define ACK_INTERVAL 10 /* ms */
-#define SYN_TIMEOUT 10 /* s */
+#define SYN_TIMEOUT_INIT 1 /* s */
Maybe mention the RFC, as done above for RFC 6928? This value comes from RFC 6298.
I just noticed you do that in 4/4, so it's slightly nicer if you do that right away here for ease of future reference, but not really needed.
 #define ACK_TIMEOUT 2
 #define FIN_TIMEOUT 60
 #define ACT_TIMEOUT 7200
@@ -365,6 +367,9 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];

 #define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */

+#define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries"
+#define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" \
This uses 121 columns. I'm not sure where all those tabs and \ come from.
My bad. I renamed the macro and it fits in 80 columns but I forgot to remove the '\' and the spaces before that.
+
 /* "Extended" data (not stored in the flow table) for TCP flow migration */
 static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];

@@ -581,8 +586,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn)
 	if (conn->flags & ACK_TO_TAP_DUE) {
 		it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000;
 	} else if (conn->flags & ACK_FROM_TAP_DUE) {
-		if (!(conn->events & ESTABLISHED))
-			it.it_value.tv_sec = SYN_TIMEOUT;
+		if (!(conn->events & ESTABLISHED)) {
+			if (conn->retries < c->tcp.syn_linear_timeouts)
+				it.it_value.tv_sec = SYN_TIMEOUT_INIT;
+			else
+				it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
+					(conn->retries - c->tcp.syn_linear_timeouts);
Probably more readable, but I haven't tried: always start from SYN_TIMEOUT_INIT, then multiply/shift if conn->retries >= c->tcp.syn_linear_timeouts.
I guess you meant:
	if (!(conn->events & ESTABLISHED)) {
		it.it_value.tv_sec = SYN_TIMEOUT_INIT;
		if (conn->retries >= c->tcp.syn_linear_timeouts)
			it.it_value.tv_sec = SYN_TIMEOUT_INIT <<
				(conn->retries - c->tcp.syn_linear_timeouts);
Well, yes, this, but without repeating the assignment, so <<= (conn->retries - c->tcp.syn_linear_timeouts).
other than something like:
it.it_value.tv_sec <<= 1
Hmm, no, why 1?
in the second if block above, since the latter would always give it.it_value.tv_sec = 2 when retries >= syn_linear_timeouts, as it.it_value.tv_sec is set to SYN_TIMEOUT_INIT before the if condition.
If I'm correct, I'm not sure if it's more readable to change to that. I feel the if/else clause quite matches the way the kernel handles it. What do you think?
Sorry, I took this for granted, but I rather meant:

	it.it_value.tv_sec = SYN_TIMEOUT_INIT;
	if (conn->retries >= c->tcp.syn_linear_timeouts) {
		it.it_value.tv_sec <<= (conn->retries - c->tcp.syn_linear_timeouts);
	}

because in both cases we're starting from SYN_TIMEOUT_INIT. And now that I wrote that, I'll pretend I actually meant this:

	if (!(conn->events & ESTABLISHED)) {
		int factor = conn->retries - c->tcp.syn_linear_timeouts;

		it.it_value.tv_sec = SYN_TIMEOUT_INIT << MAX(factor, 0);
	} else {
		...
	}

...where we make the conditional implicit (but I think intuitive) by adding 0 as a lower bound. Am I missing something...?
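The variant with the implicit lower bound can be checked in isolation. What follows is only a minimal sketch: syn_timeout() is a hypothetical helper that does not exist in passt, MAX() is a local stand-in for the project's own macro, and SYN_TIMEOUT_INIT takes the value from the patch.

```c
#include <assert.h>

#define SYN_TIMEOUT_INIT	1	/* s, value from the patch */
#define MAX(a, b)		((a) > (b) ? (a) : (b))	/* local stand-in */

/* Hypothetical helper: seconds to wait before the next SYN retry */
static long syn_timeout(int retries, int linear_timeouts)
{
	int factor = retries - linear_timeouts;

	assert(retries >= 0 && linear_timeouts >= 0);

	/* Linear phase: factor <= 0, so no shift. After that, the timeout
	 * doubles with each further retry.
	 */
	return (long)SYN_TIMEOUT_INIT << MAX(factor, 0);
}
```

With linear_timeouts set to 4, retries 0 through 4 give 1 s, and retries 5, 6 and 7 give 2 s, 4 s and 8 s, which matches the if/else version in the patch.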
+		} else
 			it.it_value.tv_sec = ACK_TIMEOUT;
 	} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
@@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref)
 			tcp_timer_ctl(c, conn);
 	} else if (conn->flags & ACK_FROM_TAP_DUE) {
 		if (!(conn->events & ESTABLISHED)) {
-			flow_dbg(conn, "handshake timeout");
-			tcp_rst(c, conn);
+			if (conn->retries >= TCP_MAX_RETRIES ||
+			    conn->retries >= (c->tcp.tcp_syn_retries +
+					      c->tcp.syn_linear_timeouts)) {
+				flow_dbg(conn, "handshake timeout");
+				tcp_rst(c, conn);
+			} else {
+				flow_trace(conn, "SYN timeout, retry");
+				tcp_send_flag(c, conn, SYN);
+				conn->retries++;
I think I already raised this point on a previous revision: this needs to be zeroed as the connection is established, but I don't see that in the current version.
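To illustrate the point about zeroing the counter, here is a rough sketch of the retry decision together with the reset on establishment. It rests on assumptions: struct conn_sketch and both functions are hypothetical and much simpler than passt's struct tcp_tap_conn, and the value 7 for TCP_MAX_RETRIES is only a guess derived from the 3-bit counter mentioned elsewhere in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TCP_MAX_RETRIES	7	/* assumption: largest value of a 3-bit counter */

/* Hypothetical, simplified connection state */
struct conn_sketch {
	uint8_t retries;
	bool established;
};

/* Return true if SYN should be resent, false if the connection should be
 * reset, following the condition used in the patch.
 */
static bool syn_retry_allowed(const struct conn_sketch *conn,
			      uint8_t syn_retries, uint8_t linear_timeouts)
{
	assert(conn);

	return conn->retries < TCP_MAX_RETRIES &&
	       conn->retries < syn_retries + linear_timeouts;
}

/* On establishment, clear the counter so that later data retransmissions
 * don't start from the count reached during the handshake.
 */
static void conn_established(struct conn_sketch *conn)
{
	assert(conn);

	conn->established = true;
	conn->retries = 0;
}
```

Without the reset, a connection that needed several SYN retries would have fewer retransmissions left once it carries data.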
+				tcp_timer_ctl(c, conn);
+			}
 		} else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) {
 			flow_dbg(conn, "FIN timeout");
 			tcp_rst(c, conn);
@@ -2766,6 +2785,24 @@ static socklen_t tcp_probe_tcp_info(void)
 	return sl;
 }

+/**
+ * tcp_syn_params_init() - Get initial SYN parameters for inbound connection
They're not initial, they'll be used for all the connections if I understand correctly.
Maybe "Get SYN retries sysctl values"? I think the _init() in the function name is also somewhat misleading.
Do you have a better idea for the function name? I named it just like other functions called in tcp_init(), such as tcp_sock_iov_init(), tcp_splice_init().
Given that David was suggesting "Get host kernel RTO parameters" as a comment to this function, maybe tcp_get_rto_params()? The only similar function we have is tcp_get_sndbuf(), so I would make it consistent with that. The most peculiar task of this function is that it gets parameters ("get"), I think, rather than assigning them to variables ("init").
+ * @c:		Execution context
+*/
+void tcp_syn_params_init(struct ctx *c)
+{
+	intmax_t tcp_syn_retries, syn_linear_timeouts;
+
+	tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES, 8);
Why 8? Perhaps a #define would help?
The 8 actually comes from your suggestion in v1: "(optional, and might be in another series, but keeping it together with the rest might be more convenient): read tcp_syn_retries, limit to 8, and also read tcp_syn_linear_timeouts"
That was a *limit* of 8, coming from the fact that we have 3 bits to store the retry count. This is a default instead.
I guess it can be replaced with 6 as it's the default value according to Documentation/networking/ip-sysctl.rst.
I'd rather check the code because sometimes people forget to update that documentation but yes, it matches: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/incl...
We already have TCP_SYN_RETRIES for the sysctl file, maybe have another TCP_SYN_RETRIES_DEFAULT for the number?
It sounds reasonable to me.
+	syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS, 1);
The default value is 4 according to Documentation/networking/ip-sysctl.rst. I don't quite remember where the 1 came from. Should we update it as well and add a similar macro as above?
I would, yes. By the way, it's hardcoded here:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/...

but defined here for thin streams (connections where observed throughput is low: https://docs.kernel.org/networking/tcp-thin.html):

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/incl...

I'd suggest to define a constant with value '4' and use it.
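The distinction between a default (used when the sysctl can't be read) and a limit (imposed by the storage type) might look like the sketch below. Everything in it is illustrative: read_int_or_default() and clamp_u8() are hypothetical helpers, not passt's read_file_integer(), and TCP_SYN_LINEAR_TIMEOUTS_DEFAULT is a made-up name following the TCP_SYN_RETRIES_DEFAULT proposal above. The values 6 and 4 are the defaults discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TCP_SYN_RETRIES_DEFAULT		6	/* name proposed in this thread */
#define TCP_SYN_LINEAR_TIMEOUTS_DEFAULT	4	/* hypothetical name */

/* Hypothetical helper: read one integer from @path, or return @fallback if
 * the file can't be opened or doesn't start with a number.
 */
static intmax_t read_int_or_default(const char *path, intmax_t fallback)
{
	intmax_t value;
	FILE *f;

	assert(path);

	f = fopen(path, "r");
	if (!f)
		return fallback;

	if (fscanf(f, "%jd", &value) != 1)
		value = fallback;

	fclose(f);
	return value;
}

/* Hypothetical helper: limit a value to what fits in a uint8_t field */
static uint8_t clamp_u8(intmax_t value)
{
	if (value < 0)
		return 0;

	return value > UINT8_MAX ? UINT8_MAX : (uint8_t)value;
}
```

A caller would combine the two, for example clamp_u8(read_int_or_default(TCP_SYN_RETRIES, TCP_SYN_RETRIES_DEFAULT)), so that the fallback and the upper bound stay visibly separate.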
+	c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX);
+	c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX);
+
+	debug("TCP SYN parameters: retries=%"PRIu8", linear_timeouts=%"PRIu8,
Similar to the comment above: these are not parameters of SYN segments (which would seem to imply TCP options, such as the MSS).
We typically don't print C assignments, rather human-readable messages, so that could be "Read sysctl values tcp_syn_retries: ..., syn_linear_timeouts: ...".
Got it.
+	      c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts);
+}
+
 /**
  * tcp_init() - Get initial sequence, hash secret, initialise per-socket data
  * @c:		Execution context
@@ -2776,6 +2813,8 @@ int tcp_init(struct ctx *c)
 {
 	ASSERT(!c->no_tcp);

+	tcp_syn_params_init(c);
+
 	tcp_sock_iov_init(c);

 	memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4));
diff --git a/tcp.h b/tcp.h
index 234a803..4369b52 100644
--- a/tcp.h
+++ b/tcp.h
@@ -59,12 +59,17 @@ union tcp_listen_epoll_ref {
  * @fwd_out:		Port forwarding configuration for outbound packets
  * @timer_run:		Timestamp of most recent timer run
  * @pipe_size:		Size of pipes for spliced connections
+ * @tcp_syn_retries:	Number of SYN retries during handshake
+ * @syn_linear_timeouts: Number of SYN retries using linear backoff timeout
+ *			before switching to exponential backoff timeout
Maybe more compact:
* @syn_linear_timeouts: SYN retries before using exponential timeout
So we remove the "Number of" as well? I guess we should make it consistent for both tcp_syn_retries and syn_linear_timeouts.
Ah, yes, we should. I mean, I guess it's clear that we're talking about the count / number of those and not their colour or taste...
  */
 struct tcp_ctx {
 	struct fwd_ports fwd_in;
 	struct fwd_ports fwd_out;
 	struct timespec timer_run;
 	size_t pipe_size;
+	uint8_t tcp_syn_retries;
+	uint8_t syn_linear_timeouts;
 };
-- Stefano