On Mon, 10 Nov 2025 21:46:55 +1100
David Gibson
On Mon, Nov 10, 2025 at 05:31:35PM +0800, Yumei Huang wrote:
If a client connects while guest is not connected or ready yet, resend SYN instead of just resetting connection after 10 seconds.
Use the same backoff calculation for the timeout as Linux kernel.
Link: https://bugs.passt.top/show_bug.cgi?id=153 Signed-off-by: Yumei Huang
Reviewed-by: David Gibson
Though I have one small concern remaining
--- tcp.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++--------- tcp.h | 4 ++++ 2 files changed, 57 insertions(+), 9 deletions(-)
diff --git a/tcp.c b/tcp.c index 2f49327..da40a99 100644 --- a/tcp.c +++ b/tcp.c @@ -179,9 +179,11 @@ * * Timeouts are implemented by means of timerfd timers, set based on flags: * - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the - * connection + * - SYN_TIMEOUT_INIT: if no SYN,ACK is received from tap/guest during + * handshake (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within + * this time, resend SYN. It's the starting timeout for the first SYN + * retry. Retry for TCP_MAX_RETRIES or (syn_retries + syn_linear_timeouts) + * times, reset the connection * * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the @@ -340,7 +342,7 @@ enum { #define WINDOW_DEFAULT 14600 /* RFC 6928 */
#define ACK_INTERVAL 10 /* ms */ -#define SYN_TIMEOUT 10 /* s */ +#define SYN_TIMEOUT_INIT 1 /* s, RFC 6928 */ #define ACK_TIMEOUT 2 #define FIN_TIMEOUT 60 #define ACT_TIMEOUT 7200 @@ -365,6 +367,13 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
#define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */
+#define SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries" +#define SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" + +#define SYN_RETRIES_DEFAULT 6 +#define SYN_LINEAR_TIMEOUTS_DEFAULT 4 +#define MAX_SYNCNT 127 /* derived from kernel's limit */ + /* "Extended" data (not stored in the flow table) for TCP flow migration */ static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
@@ -585,10 +594,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) if (conn->flags & ACK_TO_TAP_DUE) { it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000; } else if (conn->flags & ACK_FROM_TAP_DUE) { - if (!(conn->events & ESTABLISHED)) - it.it_value.tv_sec = SYN_TIMEOUT; - else + if (!(conn->events & ESTABLISHED)) { + int exp = conn->retries - c->tcp.syn_linear_timeouts;
As discussed in the thread on the last version, the subtraction will be done unsigned, so the final result depends on behaviour of casting a "negative" (that is, close to max value) unsigned value into a signed variable. I'm pretty sure that will do the right thing in practice, but I don't believe that behaviour is guaranteed by the C standard.
Surprisingly to me, C99 6.2.6.2 "Integer types" admits three interpretations of the sign bit for signed types, quoting: — the corresponding value with sign bit 0 is negated (sign and magnitude); — the sign bit has the value −(2N ) (two’s complement); — the sign bit has the value −(2N − 1) (ones’ complement). and this, together with the conversion rule from 6.3.1.3 "Signed and unsigned integers": Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised. seems to confirm your interpretation, even though I'm really having a hard time convincing myself that we should actually do stuff like: int exp; exp = (int)conn->retries - c->tcp.syn_linear_timeouts; and I still have the suspicion we're missing something. Is this syntax what you're suggesting, by the way?
This probably isn't worth a respin, though.
Maybe not but:
+ it.it_value.tv_sec = SYN_TIMEOUT_INIT << MAX(exp, 0); + } + else {
this is another bit I would need to change. The coding style dictates: if (x) { ... } else { ... } *not*: if (x) { ... } else { ... } ...on the other hand, it goes away in 5/6, so I don't care particularly. It still matters a very tiny bit because if we need to revert the commit resulting from 5/6 for any reason we'll end up with broken coding style. But that's not a realistic scenario and it's very minor anyway.
it.it_value.tv_sec = ACK_TIMEOUT; + } } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { it.it_value.tv_sec = FIN_TIMEOUT; } else { @@ -2425,8 +2437,18 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) tcp_timer_ctl(c, conn); } else if (conn->flags & ACK_FROM_TAP_DUE) { if (!(conn->events & ESTABLISHED)) { - flow_dbg(conn, "handshake timeout"); - tcp_rst(c, conn); + int max; + max = c->tcp.syn_retries + c->tcp.syn_linear_timeouts; + max = MIN(TCP_MAX_RETRIES, max); + if (conn->retries >= max) { + flow_dbg(conn, "handshake timeout"); + tcp_rst(c, conn); + } else { + flow_trace(conn, "SYN timeout, retry"); + tcp_send_flag(c, conn, SYN); + conn->retries++; + tcp_timer_ctl(c, conn); + } } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { flow_dbg(conn, "FIN timeout"); tcp_rst(c, conn); @@ -2782,6 +2804,26 @@ static socklen_t tcp_probe_tcp_info(void) return sl; }
+/** + * tcp_get_rto_params() - Get host kernel RTO parameters + * @c: Execution context + */ +void tcp_get_rto_params(struct ctx *c) +{ + intmax_t v; + + v = read_file_integer(SYN_RETRIES, SYN_RETRIES_DEFAULT); + c->tcp.syn_retries = MIN(v, MAX_SYNCNT); + + v = read_file_integer(SYN_LINEAR_TIMEOUTS, SYN_LINEAR_TIMEOUTS_DEFAULT); + c->tcp.syn_linear_timeouts = MIN(v, MAX_SYNCNT); + + debug("Read sysctl values syn_retries: %"PRIu8 + ", syn_linear_timeouts: %"PRIu8, + c->tcp.syn_retries, + c->tcp.syn_linear_timeouts); +} + /** * tcp_init() - Get initial sequence, hash secret, initialise per-socket data * @c: Execution context @@ -2792,6 +2834,8 @@ int tcp_init(struct ctx *c) { ASSERT(!c->no_tcp);
+ tcp_get_rto_params(c); + tcp_sock_iov_init(c);
memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4)); diff --git a/tcp.h b/tcp.h index 0082386..37d7758 100644 --- a/tcp.h +++ b/tcp.h @@ -60,12 +60,16 @@ union tcp_listen_epoll_ref { * @fwd_out: Port forwarding configuration for outbound packets * @timer_run: Timestamp of most recent timer run * @pipe_size: Size of pipes for spliced connections + * @syn_retries: SYN retries using exponential backoff timeout + * @syn_linear_timeouts: SYN retries before using exponential backoff timeout */ struct tcp_ctx { struct fwd_ports fwd_in; struct fwd_ports fwd_out; struct timespec timer_run; size_t pipe_size; + uint8_t syn_retries; + uint8_t syn_linear_timeouts; };
#endif /* TCP_H */ -- 2.51.0
-- Stefano