[PATCH v5 3/4] tcp: Resend SYN for inbound connections
If a client connects while guest is not connected or ready yet,
resend SYN instead of just resetting connection after 10 seconds.
Use the same backoff calculation for the timeout as linux kernel.
Link: https://bugs.passt.top/show_bug.cgi?id=153
Signed-off-by: Yumei Huang
On Fri, Oct 17, 2025 at 12:27:42PM +0800, Yumei Huang wrote:
If a client connects while guest is not connected or ready yet, resend SYN instead of just resetting connection after 10 seconds.
Use the same backoff calculation for the timeout as linux kernel.
Link: https://bugs.passt.top/show_bug.cgi?id=153 Signed-off-by: Yumei Huang
--- tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++-------- tcp.h | 5 +++++ 2 files changed, 52 insertions(+), 8 deletions(-) diff --git a/tcp.c b/tcp.c index 2ec4b0c..71e2da0 100644 --- a/tcp.c +++ b/tcp.c @@ -179,9 +179,11 @@ * * Timeouts are implemented by means of timerfd timers, set based on flags: * - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the - * connection + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake + * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend + * SYN. It's the starting timeout for the first SYN retry. If this persists + * for more than TCP_MAX_RETRIES or (tcp_syn_retries + + * tcp_syn_linear_timeouts) times in a row, reset the connection * * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the @@ -340,7 +342,7 @@ enum { #define WINDOW_DEFAULT 14600 /* RFC 6928 */
#define ACK_INTERVAL 10 /* ms */ -#define SYN_TIMEOUT 10 /* s */ +#define SYN_TIMEOUT_INIT 1 /* s */ #define ACK_TIMEOUT 2 #define FIN_TIMEOUT 60 #define ACT_TIMEOUT 7200 @@ -365,6 +367,9 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
#define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */
+#define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries" +#define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" \ + /* "Extended" data (not stored in the flow table) for TCP flow migration */ static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
@@ -581,8 +586,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) if (conn->flags & ACK_TO_TAP_DUE) { it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000; } else if (conn->flags & ACK_FROM_TAP_DUE) { - if (!(conn->events & ESTABLISHED)) - it.it_value.tv_sec = SYN_TIMEOUT; + if (!(conn->events & ESTABLISHED)) { + if (conn->retries < c->tcp.syn_linear_timeouts) + it.it_value.tv_sec = SYN_TIMEOUT_INIT; + else + it.it_value.tv_sec = SYN_TIMEOUT_INIT << + (conn->retries - c->tcp.syn_linear_timeouts); + } else it.it_value.tv_sec = ACK_TIMEOUT; } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { @@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) tcp_timer_ctl(c, conn); } else if (conn->flags & ACK_FROM_TAP_DUE) { if (!(conn->events & ESTABLISHED)) { - flow_dbg(conn, "handshake timeout"); - tcp_rst(c, conn); + if (conn->retries >= TCP_MAX_RETRIES || + conn->retries >= (c->tcp.tcp_syn_retries + + c->tcp.syn_linear_timeouts)) { + flow_dbg(conn, "handshake timeout"); + tcp_rst(c, conn); + } else { + flow_trace(conn, "SYN timeout, retry"); + tcp_send_flag(c, conn, SYN); + conn->retries++; + tcp_timer_ctl(c, conn); + } } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { flow_dbg(conn, "FIN timeout"); tcp_rst(c, conn); @@ -2766,6 +2785,24 @@ static socklen_t tcp_probe_tcp_info(void) return sl; }
+/** + * tcp_syn_params_init() - Get initial syn params for inbound connection
Stefano's comment wasn't 100% clear, but I believe he was suggesting s/syn params/SYN parameters/ in the comment here for clarity.
+ * @c: Execution context +*/ +void tcp_syn_params_init(struct ctx *c) +{ + intmax_t tcp_syn_retries, syn_linear_timeouts; + + tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES, 8); + syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS, 1); + + c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX); + c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX); + + debug("TCP SYN parameters: retries=%"PRIu8", linear_timeouts=%"PRIu8, + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts); +} + /** * tcp_init() - Get initial sequence, hash secret, initialise per-socket data * @c: Execution context @@ -2776,6 +2813,8 @@ int tcp_init(struct ctx *c) { ASSERT(!c->no_tcp);
+ tcp_syn_params_init(c); + tcp_sock_iov_init(c);
memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4)); diff --git a/tcp.h b/tcp.h index 234a803..bb58324 100644 --- a/tcp.h +++ b/tcp.h @@ -59,12 +59,17 @@ union tcp_listen_epoll_ref { * @fwd_out: Port forwarding configuration for outbound packets * @timer_run: Timestamp of most recent timer run * @pipe_size: Size of pipes for spliced connections + * @tcp_syn_retries: Number of times initial SYNs during handshake
Perhaps, "Number of SYN retries using exponential backoff"
+ * @syn_linear_timeouts: Number of SYN retries using linear backoff timeout + * before switching to exponential backoff timeout */ struct tcp_ctx { struct fwd_ports fwd_in; struct fwd_ports fwd_out; struct timespec timer_run; size_t pipe_size; + uint8_t tcp_syn_retries; + uint8_t syn_linear_timeouts; };
#endif /* TCP_H */ -- 2.47.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Fri, Oct 17, 2025 at 12:44 PM David Gibson
On Fri, Oct 17, 2025 at 12:27:42PM +0800, Yumei Huang wrote:
If a client connects while guest is not connected or ready yet, resend SYN instead of just resetting connection after 10 seconds.
Use the same backoff calculation for the timeout as linux kernel.
Link: https://bugs.passt.top/show_bug.cgi?id=153 Signed-off-by: Yumei Huang
--- tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++-------- tcp.h | 5 +++++ 2 files changed, 52 insertions(+), 8 deletions(-) diff --git a/tcp.c b/tcp.c index 2ec4b0c..71e2da0 100644 --- a/tcp.c +++ b/tcp.c @@ -179,9 +179,11 @@ * * Timeouts are implemented by means of timerfd timers, set based on flags: * - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the - * connection + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake + * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend + * SYN. It's the starting timeout for the first SYN retry. If this persists + * for more than TCP_MAX_RETRIES or (tcp_syn_retries + + * tcp_syn_linear_timeouts) times in a row, reset the connection * * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the @@ -340,7 +342,7 @@ enum { #define WINDOW_DEFAULT 14600 /* RFC 6928 */
#define ACK_INTERVAL 10 /* ms */ -#define SYN_TIMEOUT 10 /* s */ +#define SYN_TIMEOUT_INIT 1 /* s */ #define ACK_TIMEOUT 2 #define FIN_TIMEOUT 60 #define ACT_TIMEOUT 7200 @@ -365,6 +367,9 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
#define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */
+#define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries" +#define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" \ + /* "Extended" data (not stored in the flow table) for TCP flow migration */ static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
@@ -581,8 +586,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) if (conn->flags & ACK_TO_TAP_DUE) { it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000; } else if (conn->flags & ACK_FROM_TAP_DUE) { - if (!(conn->events & ESTABLISHED)) - it.it_value.tv_sec = SYN_TIMEOUT; + if (!(conn->events & ESTABLISHED)) { + if (conn->retries < c->tcp.syn_linear_timeouts) + it.it_value.tv_sec = SYN_TIMEOUT_INIT; + else + it.it_value.tv_sec = SYN_TIMEOUT_INIT << + (conn->retries - c->tcp.syn_linear_timeouts); + } else it.it_value.tv_sec = ACK_TIMEOUT; } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { @@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) tcp_timer_ctl(c, conn); } else if (conn->flags & ACK_FROM_TAP_DUE) { if (!(conn->events & ESTABLISHED)) { - flow_dbg(conn, "handshake timeout"); - tcp_rst(c, conn); + if (conn->retries >= TCP_MAX_RETRIES || + conn->retries >= (c->tcp.tcp_syn_retries + + c->tcp.syn_linear_timeouts)) { + flow_dbg(conn, "handshake timeout"); + tcp_rst(c, conn); + } else { + flow_trace(conn, "SYN timeout, retry"); + tcp_send_flag(c, conn, SYN); + conn->retries++; + tcp_timer_ctl(c, conn); + } } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { flow_dbg(conn, "FIN timeout"); tcp_rst(c, conn); @@ -2766,6 +2785,24 @@ static socklen_t tcp_probe_tcp_info(void) return sl; }
+/** + * tcp_syn_params_init() - Get initial syn params for inbound connection
Stefano's comment wasn't 100% clear, but I believe he was suggesting s/syn params/SYN parameters/ in the comment here for clarity.
Oops, I missed this one.
+ * @c: Execution context +*/ +void tcp_syn_params_init(struct ctx *c) +{ + intmax_t tcp_syn_retries, syn_linear_timeouts; + + tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES, 8); + syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS, 1); + + c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX); + c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX); + + debug("TCP SYN parameters: retries=%"PRIu8", linear_timeouts=%"PRIu8, + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts); +} + /** * tcp_init() - Get initial sequence, hash secret, initialise per-socket data * @c: Execution context @@ -2776,6 +2813,8 @@ int tcp_init(struct ctx *c) { ASSERT(!c->no_tcp);
+ tcp_syn_params_init(c); + tcp_sock_iov_init(c);
memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4)); diff --git a/tcp.h b/tcp.h index 234a803..bb58324 100644 --- a/tcp.h +++ b/tcp.h @@ -59,12 +59,17 @@ union tcp_listen_epoll_ref { * @fwd_out: Port forwarding configuration for outbound packets * @timer_run: Timestamp of most recent timer run * @pipe_size: Size of pipes for spliced connections + * @tcp_syn_retries: Number of times initial SYNs during handshake
Perhaps,
"Number of SYN retries using exponential backoff"
It's not actually retries using exponential backoff. It's the total number of retries, including linear backoff and exponential backoff. I got the "Number of times initial SYNs" from the kernel doc https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst. Maybe "Number of SYN retries during handshake" ? What do you think?
+ * @syn_linear_timeouts: Number of SYN retries using linear backoff timeout + * before switching to exponential backoff timeout */ struct tcp_ctx { struct fwd_ports fwd_in; struct fwd_ports fwd_out; struct timespec timer_run; size_t pipe_size; + uint8_t tcp_syn_retries; + uint8_t syn_linear_timeouts; };
#endif /* TCP_H */ -- 2.47.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
-- Thanks, Yumei Huang
On Fri, Oct 17, 2025 at 01:46:15PM +0800, Yumei Huang wrote:
On Fri, Oct 17, 2025 at 12:44 PM David Gibson
wrote: On Fri, Oct 17, 2025 at 12:27:42PM +0800, Yumei Huang wrote:
If a client connects while guest is not connected or ready yet, resend SYN instead of just resetting connection after 10 seconds.
Use the same backoff calculation for the timeout as linux kernel.
Link: https://bugs.passt.top/show_bug.cgi?id=153 Signed-off-by: Yumei Huang
--- tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++-------- tcp.h | 5 +++++ 2 files changed, 52 insertions(+), 8 deletions(-) diff --git a/tcp.c b/tcp.c index 2ec4b0c..71e2da0 100644 --- a/tcp.c +++ b/tcp.c @@ -179,9 +179,11 @@ * * Timeouts are implemented by means of timerfd timers, set based on flags: * - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the - * connection + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake + * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend + * SYN. It's the starting timeout for the first SYN retry. If this persists + * for more than TCP_MAX_RETRIES or (tcp_syn_retries + + * tcp_syn_linear_timeouts) times in a row, reset the connection * * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the @@ -340,7 +342,7 @@ enum { #define WINDOW_DEFAULT 14600 /* RFC 6928 */
#define ACK_INTERVAL 10 /* ms */ -#define SYN_TIMEOUT 10 /* s */ +#define SYN_TIMEOUT_INIT 1 /* s */ #define ACK_TIMEOUT 2 #define FIN_TIMEOUT 60 #define ACT_TIMEOUT 7200 @@ -365,6 +367,9 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
#define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */
+#define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries" +#define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" \ + /* "Extended" data (not stored in the flow table) for TCP flow migration */ static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
@@ -581,8 +586,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) if (conn->flags & ACK_TO_TAP_DUE) { it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000; } else if (conn->flags & ACK_FROM_TAP_DUE) { - if (!(conn->events & ESTABLISHED)) - it.it_value.tv_sec = SYN_TIMEOUT; + if (!(conn->events & ESTABLISHED)) { + if (conn->retries < c->tcp.syn_linear_timeouts) + it.it_value.tv_sec = SYN_TIMEOUT_INIT; + else + it.it_value.tv_sec = SYN_TIMEOUT_INIT << + (conn->retries - c->tcp.syn_linear_timeouts); + } else it.it_value.tv_sec = ACK_TIMEOUT; } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { @@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) tcp_timer_ctl(c, conn); } else if (conn->flags & ACK_FROM_TAP_DUE) { if (!(conn->events & ESTABLISHED)) { - flow_dbg(conn, "handshake timeout"); - tcp_rst(c, conn); + if (conn->retries >= TCP_MAX_RETRIES || + conn->retries >= (c->tcp.tcp_syn_retries + + c->tcp.syn_linear_timeouts)) { + flow_dbg(conn, "handshake timeout"); + tcp_rst(c, conn); + } else { + flow_trace(conn, "SYN timeout, retry"); + tcp_send_flag(c, conn, SYN); + conn->retries++; + tcp_timer_ctl(c, conn); + } } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { flow_dbg(conn, "FIN timeout"); tcp_rst(c, conn); @@ -2766,6 +2785,24 @@ static socklen_t tcp_probe_tcp_info(void) return sl; }
+/** + * tcp_syn_params_init() - Get initial syn params for inbound connection
Stefano's comment wasn't 100% clear, but I believe he was suggesting s/syn params/SYN parameters/ in the comment here for clarity.
Oops, I missed this one.
+ * @c: Execution context +*/ +void tcp_syn_params_init(struct ctx *c) +{ + intmax_t tcp_syn_retries, syn_linear_timeouts; + + tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES, 8); + syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS, 1); + + c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX); + c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX); + + debug("TCP SYN parameters: retries=%"PRIu8", linear_timeouts=%"PRIu8, + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts); +} + /** * tcp_init() - Get initial sequence, hash secret, initialise per-socket data * @c: Execution context @@ -2776,6 +2813,8 @@ int tcp_init(struct ctx *c) { ASSERT(!c->no_tcp);
+ tcp_syn_params_init(c); + tcp_sock_iov_init(c);
memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4)); diff --git a/tcp.h b/tcp.h index 234a803..bb58324 100644 --- a/tcp.h +++ b/tcp.h @@ -59,12 +59,17 @@ union tcp_listen_epoll_ref { * @fwd_out: Port forwarding configuration for outbound packets * @timer_run: Timestamp of most recent timer run * @pipe_size: Size of pipes for spliced connections + * @tcp_syn_retries: Number of times initial SYNs during handshake
Perhaps,
"Number of SYN retries using exponential backoff"
It's not actually retries using exponential backoff. It's the total number of retries, including linear backoff and exponential backoff.
That's what ip-sysctl.rst says, but as we discussed the other day, the kernel code and behaviour seem to treat it as just the retries after exhausting linear_timeouts.
I got the "Number of times initial SYNs" from the kernel doc https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst.
Right, but by cutting off the "will be retransmitted", it's no longer grammatical (admittedly it's a pretty long confusing sentence).
Maybe "Number of SYN retries during handshake" ? What do you think?
That would be good if the description in ip-sysctl.rst was accurate, but it seems not to be.
+ * @syn_linear_timeouts: Number of SYN retries using linear backoff timeout + * before switching to exponential backoff timeout */ struct tcp_ctx { struct fwd_ports fwd_in; struct fwd_ports fwd_out; struct timespec timer_run; size_t pipe_size; + uint8_t tcp_syn_retries; + uint8_t syn_linear_timeouts; };
#endif /* TCP_H */ -- 2.47.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
-- Thanks,
Yumei Huang
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
On Fri, Oct 17, 2025 at 3:48 PM David Gibson
On Fri, Oct 17, 2025 at 01:46:15PM +0800, Yumei Huang wrote:
On Fri, Oct 17, 2025 at 12:44 PM David Gibson
wrote: On Fri, Oct 17, 2025 at 12:27:42PM +0800, Yumei Huang wrote:
If a client connects while guest is not connected or ready yet, resend SYN instead of just resetting connection after 10 seconds.
Use the same backoff calculation for the timeout as linux kernel.
Link: https://bugs.passt.top/show_bug.cgi?id=153 Signed-off-by: Yumei Huang
--- tcp.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++-------- tcp.h | 5 +++++ 2 files changed, 52 insertions(+), 8 deletions(-) diff --git a/tcp.c b/tcp.c index 2ec4b0c..71e2da0 100644 --- a/tcp.c +++ b/tcp.c @@ -179,9 +179,11 @@ * * Timeouts are implemented by means of timerfd timers, set based on flags: * - * - SYN_TIMEOUT: if no ACK is received from tap/guest during handshake (flag - * ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, reset the - * connection + * - SYN_TIMEOUT_INIT: if no ACK is received from tap/guest during handshake + * (flag ACK_FROM_TAP_DUE without ESTABLISHED event) within this time, resend + * SYN. It's the starting timeout for the first SYN retry. If this persists + * for more than TCP_MAX_RETRIES or (tcp_syn_retries + + * tcp_syn_linear_timeouts) times in a row, reset the connection * * - ACK_TIMEOUT: if no ACK segment was received from tap/guest, after sending * data (flag ACK_FROM_TAP_DUE with ESTABLISHED event), re-send data from the @@ -340,7 +342,7 @@ enum { #define WINDOW_DEFAULT 14600 /* RFC 6928 */
#define ACK_INTERVAL 10 /* ms */ -#define SYN_TIMEOUT 10 /* s */ +#define SYN_TIMEOUT_INIT 1 /* s */ #define ACK_TIMEOUT 2 #define FIN_TIMEOUT 60 #define ACT_TIMEOUT 7200 @@ -365,6 +367,9 @@ uint8_t tcp_migrate_rcv_queue [TCP_MIGRATE_RCV_QUEUE_MAX];
#define TCP_MIGRATE_RESTORE_CHUNK_MIN 1024 /* Try smaller when above this */
+#define TCP_SYN_RETRIES "/proc/sys/net/ipv4/tcp_syn_retries" +#define TCP_SYN_LINEAR_TIMEOUTS "/proc/sys/net/ipv4/tcp_syn_linear_timeouts" \ + /* "Extended" data (not stored in the flow table) for TCP flow migration */ static struct tcp_tap_transfer_ext migrate_ext[FLOW_MAX];
@@ -581,8 +586,13 @@ static void tcp_timer_ctl(const struct ctx *c, struct tcp_tap_conn *conn) if (conn->flags & ACK_TO_TAP_DUE) { it.it_value.tv_nsec = (long)ACK_INTERVAL * 1000 * 1000; } else if (conn->flags & ACK_FROM_TAP_DUE) { - if (!(conn->events & ESTABLISHED)) - it.it_value.tv_sec = SYN_TIMEOUT; + if (!(conn->events & ESTABLISHED)) { + if (conn->retries < c->tcp.syn_linear_timeouts) + it.it_value.tv_sec = SYN_TIMEOUT_INIT; + else + it.it_value.tv_sec = SYN_TIMEOUT_INIT << + (conn->retries - c->tcp.syn_linear_timeouts); + } else it.it_value.tv_sec = ACK_TIMEOUT; } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { @@ -2409,8 +2419,17 @@ void tcp_timer_handler(const struct ctx *c, union epoll_ref ref) tcp_timer_ctl(c, conn); } else if (conn->flags & ACK_FROM_TAP_DUE) { if (!(conn->events & ESTABLISHED)) { - flow_dbg(conn, "handshake timeout"); - tcp_rst(c, conn); + if (conn->retries >= TCP_MAX_RETRIES || + conn->retries >= (c->tcp.tcp_syn_retries + + c->tcp.syn_linear_timeouts)) { + flow_dbg(conn, "handshake timeout"); + tcp_rst(c, conn); + } else { + flow_trace(conn, "SYN timeout, retry"); + tcp_send_flag(c, conn, SYN); + conn->retries++; + tcp_timer_ctl(c, conn); + } } else if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) { flow_dbg(conn, "FIN timeout"); tcp_rst(c, conn); @@ -2766,6 +2785,24 @@ static socklen_t tcp_probe_tcp_info(void) return sl; }
+/** + * tcp_syn_params_init() - Get initial syn params for inbound connection
Stefano's comment wasn't 100% clear, but I believe he was suggesting s/syn params/SYN parameters/ in the comment here for clarity.
Oops, I missed this one.
+ * @c: Execution context +*/ +void tcp_syn_params_init(struct ctx *c) +{ + intmax_t tcp_syn_retries, syn_linear_timeouts; + + tcp_syn_retries = read_file_integer(TCP_SYN_RETRIES, 8); + syn_linear_timeouts = read_file_integer(TCP_SYN_LINEAR_TIMEOUTS, 1); + + c->tcp.tcp_syn_retries = MIN(tcp_syn_retries, UINT8_MAX); + c->tcp.syn_linear_timeouts = MIN(syn_linear_timeouts, UINT8_MAX); + + debug("TCP SYN parameters: retries=%"PRIu8", linear_timeouts=%"PRIu8, + c->tcp.tcp_syn_retries, c->tcp.syn_linear_timeouts); +} + /** * tcp_init() - Get initial sequence, hash secret, initialise per-socket data * @c: Execution context @@ -2776,6 +2813,8 @@ int tcp_init(struct ctx *c) { ASSERT(!c->no_tcp);
+ tcp_syn_params_init(c); + tcp_sock_iov_init(c);
memset(init_sock_pool4, 0xff, sizeof(init_sock_pool4)); diff --git a/tcp.h b/tcp.h index 234a803..bb58324 100644 --- a/tcp.h +++ b/tcp.h @@ -59,12 +59,17 @@ union tcp_listen_epoll_ref { * @fwd_out: Port forwarding configuration for outbound packets * @timer_run: Timestamp of most recent timer run * @pipe_size: Size of pipes for spliced connections + * @tcp_syn_retries: Number of times initial SYNs during handshake
Perhaps,
"Number of SYN retries using exponential backoff"
It's not actually retries using exponential backoff. It's the total number of retries, including linear backoff and exponential backoff.
That's what ip-sysctl.rst says, but as we discussed the other day, the kernel code and behaviour seem to treat it as just the retries after exhausting linear_timeouts.
You are right. I forgot about that.
I got the "Number of times initial SYNs" from the kernel doc https://www.kernel.org/doc/Documentation/networking/ip-sysctl.rst.
Right, but by cutting off the "will be retransmitted", it's no longer grammatical (admittedly it's a pretty long confusing sentence).
Maybe "Number of SYN retries during handshake" ? What do you think?
That would be good if the description in ip-sysctl.rst was accurate, but it seems not to be.
I probably should wait a bit before sending the v6. Anyway, I will wait and send the v7 next week to see if more comments from Stefano. Thanks for all the reviews and comments.
+ * @syn_linear_timeouts: Number of SYN retries using linear backoff timeout + * before switching to exponential backoff timeout */ struct tcp_ctx { struct fwd_ports fwd_in; struct fwd_ports fwd_out; struct timespec timer_run; size_t pipe_size; + uint8_t tcp_syn_retries; + uint8_t syn_linear_timeouts; };
#endif /* TCP_H */ -- 2.47.0
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
-- Thanks,
Yumei Huang
-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson
-- Thanks, Yumei Huang
participants (2)
-
David Gibson
-
Yumei Huang