On Mon, Nov 3, 2025 at 9:10 AM David Gibson
On Fri, Oct 31, 2025 at 01:42:40PM +0800, Yumei Huang wrote:
If a client connects while guest is not connected or ready yet, resend SYN instead of just resetting connection after 10 seconds.
Use the same backoff calculation for the timeout as Linux kernel.
Link: https://bugs.passt.top/show_bug.cgi?id=153 Signed-off-by: Yumei Huang
Reviewed-by: David Gibson
There are a couple of tiny style nits and a mostly theoretical bug noted below.
--- tcp.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++-------- tcp.h | 4 ++++ 2 files changed, 53 insertions(+), 8 deletions(-)
diff --git a/tcp.c b/tcp.c index 2ec4b0c..bada88a 100644 --- a/tcp.c +++ b/tcp.c @@ -179,9 +179,11 @@ * * Timeouts are implemented by means of timerfd timers, set based on flags: * - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the - * connection + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake
Nit: I'd maybe say SYN,ACK here to distinguish it from other ACKs.
+ * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend + * SYN. It's the starting timeout for the first SYN retry. Retry for + * TCP_MAX_RETRIES or (tcp_syn_retries + tcp_syn_linear_timeouts) times, + * reset the connection * * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the @@ -340,7 +342,7 @@ enum { #define WINDOW_DEFAULT 14600 /* RFC 6928 */
#define ACK_INTERVAL 10 /* ms */ -#define SYN_TIMEOUT 10 /* s */ +#define SYN_TIMEOUT_INIT 1 /* s, RFC 6928 */ #define ACK_TIMEOUT 2 #define FIN_TIMEOUT 60 #define ACT_TIMEOUT 7200 @@ -365,6 +367,12 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
#define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */
+#define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries" +#define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" + +#define TCP_SYN_RETRIES_DEFAULT 6 +#define TCP_SYN_LINEAR_TIMEOUTS_DEFAULT 4 + /* "Extended" data (not stored in the flow table) for TCP flow migration */ static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
@@ -581,8 +589,10 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) if (conn->flags & ACK_TO_TAP_DUE) { it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000; } else if (conn->flags & ACK_FROM_TAP_DUE) { - if (!(conn->events & ESTABLISHED)) - it.it_value.tv_sec = SYN_TIMEOUT; + if (!(conn->events & ESTABLISHED)) { + int exp = conn->retries - c->tcp.syn_linear_timeouts; + it.it_value.tv_sec = SYN_TIMEOUT_INIT << MAX(exp, 0); + } else
A non-obvious detail of the style we use is that if one branch of an if uses { }, then the other branch should as well, even if it's one line. So above should be "} else {" and the bit below changed to match.
Oh, I didn't know about this style. Will update it later.
it.it_value.tv_sec = ACK_TIMEOUT; } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { @@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) tcp_timer_ctl(c, conn); } else if (conn->flags & ACK_FROM_TAP_DUE) { if (!(conn->events & ESTABLISHED)) { - flow_dbg(conn, "handshake timeout"); - tcp_rst(c, conn); + if (conn->retries >= TCP_MAX_RETRIES || + conn->retries >= (c->tcp.tcp_syn_retries + + c->tcp.syn_linear_timeouts)) {
Here's the theoretical bug. We only clamp tcp_syn_retries and syn_linear_timeouts to a uint8_t, so if the host has these set to strangely large values, this addition could overflow.
Just checked net/ipv4/sysctl_net_ipv4.c, static int tcp_syn_retries_min = 1; static int tcp_syn_retries_max = MAX_TCP_SYNCNT; static int tcp_syn_linear_timeouts_max = MAX_TCP_SYNCNT; And MAX_TCP_SYNCNT is defined as 127: #define MAX_TCP_SYNCNT 127 they have a max value as 127, so there won't be overflows.
+ flow_dbg(conn, "handshake timeout"); + tcp_rst(c, conn); + } else { + flow_trace(conn, "SYN timeout, retry"); + tcp_send_flag(c, conn, SYN); + conn->retries++; + tcp_timer_ctl(c, conn); + } } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { flow_dbg(conn, "FIN timeout"); tcp_rst(c, conn); @@ -2766,6 +2785,26 @@ static socklen_t tcp_probe_tcp_info(void) return sl; }
+/** + * tcp_get_rto_params() - Get host kernel RTO parameters + * @c: Execution context +*/ +void tcp_get_rto_params(struct ctx *c) +{ + intmax_t tcp_syn_retries, syn_linear_timeouts; + + tcp_syn_retries = read_file_integer( + TCP_SYN_RETRIES, TCP_SYN_RETRIES_DEFAULT); + syn_linear_timeouts = read_file_integer( + TCP_SYN_LINEAR_TIMEOUTS, TCP_SYN_LINEAR_TIMEOUTS_DEFAULT); + + c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX); + c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX); + + debug("Read sysctl values tcp_syn_retries: %"PRIu8", linear_timeouts: %"PRIu8, + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts); +} + /** * tcp_init() - Get initial sequence, hash secret, initialise per-socket data * @c: Execution context @@ -2776,6 +2815,8 @@ int tcp_init(struct ctx *c) { ASSERT(!c->no_tcp);
+ tcp_get_rto_params(c); + tcp_sock_iov_init(c);
memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4)); diff --git a/tcp.h b/tcp.h index 234a803..befedde 100644 --- a/tcp.h +++ b/tcp.h @@ -59,12 +59,16 @@ union tcp_listen_epoll_ref { * @fwd_out: Port forwarding configuration for outbound packets * @timer_run: Timestamp of most recent timer run * @pipe_size: Size of pipes for spliced connections + * @tcp_syn_retries: SYN retries using exponential backoff timeout + * @syn_linear_timeouts: SYN retries before using exponential backoff timeout */ struct tcp_ctx { struct fwd_ports fwd_in; struct fwd_ports fwd_out; struct timespec timer_run; size_t pipe_size; + uint8_t tcp_syn_retries; + uint8_t syn_linear_timeouts; };
#endif /* TCP_H */ -- 2.49.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
-- Thanks, Yumei Huang