On Fri, May 10, 2024 at 03:40:41PM -0400, Jon Maloy wrote:On 2024-05-10 12:40, Stefano Brivio wrote: > On Wed, 8 May 2024 23:00:23 -0400 > Jon Maloy <jmaloy(a)redhat.com> wrote:[snip]We could, but I think that's ok. If we hit this on retransmit frames, it means they actually haven't been retramsitted yet because of the failure, so we need to try again, just like any other frame we failed to retransmit. For example in a single epoll cycle: 1. We queue frames 2, 3 & 4 [queue is (2, 3, 4)] 2. We get a dup ack for frame 1, and start retransmit 3. We queue frames 1 & 2 for retransmit [queue is (2, 3, 4, 1, 2)], seq_to_tap is 3 4. We flush the queued frames, but there's a failure after the first two. (2, 3) where transmitted, (4, 1, 2) failed. 5. We step through the failed frames 5.1. We see frame 4 failed, but seq_to_tap == 3 <= 4, so we ignore it 5.2. We see frame 1 failed, and seq_to_tap == 3 > 1 so we rewind to 1. This is correct, because our retransmit failed and we need to do it again. 5.3. We see frame 2 failed, but seq_to_tap == 1 <= 2 so ignore The steps are slightly different, but I'm pretty sure this also does the right thing if - In the retransmit we get further than we got with the initial transmits - We start a retransmit several times (in a single queue batch (not sure if that's possible). This could involve multiple rewinds during a single revert scan, but while that's arguably non-optimal it should be both rare and not really that expensive. - The retransmit starts from a point after the earliest initial frame in the queue (how would the peer request a retransmit for something we never transmitted? but maybe possible if the first transmit in this queue batch is itself a retransmit from an earlier cycle. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibsonNot so sure about that. We can be in the middle of retransmit.+ */ +static void tcp_revert_seq(struct tcp_buf_seq_update *seq_update, int s, int e) +{ + struct tcp_tap_conn *conn; + uint32_t lowest_seq; + int i, ii; + + for (i = s; i < e; i++) { + conn = seq_update[i].conn; + lowest_seq = seq_update[i].seq; + + for (ii = i + 1; ii < e; ii++) { + if (seq_update[ii].conn != conn) + continue; + if (SEQ_GT(lowest_seq, seq_update[ii].seq)) + lowest_seq = seq_update[ii].seq; + }If I recall correctly, David suggested a simpler approach that avoids this O(n^2) scan, based on the observation that 1. the first entry you find in the table also has the lowest sequence number (we don't send frames out-of-order),