On Wed, Nov 20, 2024 at 07:43:44AM +0100, Stefano Brivio wrote:
On Wed, 20 Nov 2024 12:02:00 +1100 David Gibson
wrote: On Tue, Nov 19, 2024 at 08:53:44PM +0100, Stefano Brivio wrote:
RFC 9293, 3.8.4 says:
Implementers MAY include "keep-alives" in their TCP implementations (MAY-5), although this practice is not universally accepted. Some TCP implementations, however, have included a keep-alive mechanism. To confirm that an idle connection is still active, these implementations send a probe segment designed to elicit a response from the TCP peer. Such a segment generally contains SEG.SEQ = SND.NXT-1 and may or may not contain one garbage octet of data. If keep-alives are included, the application MUST be able to turn them on or off for each TCP connection (MUST-24), and they MUST default to off (MUST-25).
but currently, tcp_data_from_tap() is not aware of this and will schedule a fast re-transmit on the second keep-alive (because it's also a duplicate ACK), ignoring the fact that the sequence number was rewinded to SND.NXT-1.
ACK these keep-alive segments, reset the activity timeout, and ignore them for the rest.
At some point, we could think of implementing an approximation of keep-alive segments on outbound sockets, for example by setting TCP_KEEPIDLE to 1, and a large TCP_KEEPINTVL, so that we send a single keep-alive segment at approximately the same time, and never reset the connection. That's beyond the scope of this fix, though.
Reported-by: Tim Besard
Link: https://github.com/containers/podman/discussions/24572 Signed-off-by: Stefano Brivio --- tcp.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/tcp.c b/tcp.c index f357920..1eb85bb 100644 --- a/tcp.c +++ b/tcp.c @@ -1763,6 +1763,20 @@ static int tcp_data_from_tap(const struct ctx *c, struct tcp_tap_conn *conn, continue;
seq = ntohl(th->seq); + if (SEQ_LT(seq, conn->seq_from_tap) && len <= 1) { + flow_trace(conn, + "keep-alive sequence: %u, previous: %u", + seq, conn->seq_from_tap); + + tcp_send_flag(c, conn, ACK); + tcp_timer_ctl(c, conn); + + if (p->count == 1) + return 1;
I'm not sure what this test is for. Shouldn't the continue be sufficient?
I don't think we want to go through tcp_update_seqack_from_tap(), tcp_tap_window_update() and the like on a keep-alive segment.
Ah, I see. But that is an optimisation, right? It shouldn't be necessary for correctness.
But if we receive something else in this batch, that's going to be a data segment that happened to arrive just after the keep-alive, so, in that case, we have to do the normal processing, by ignoring just this segment and hitting 'continue'.
Strictly speaking, the 'continue' is enough and correct, but I think that returning early in the obviously common case is simpler and more robust.
Hrm. Doesn't seem simpler to me, but I can see the point of the
change so,
Reviewed-by: David Gibson