Hi David,
Found it (the bug, not your missing comment), and fixing it solved the problem.
I'll post this patch separately shortly.
///jon
On 2024-06-04 13:36, Jon Maloy wrote:
Hi David,
This is the last comment I received from you regarding this patch.
See below for further comment.
On 2024-05-16 00:16, David Gibson wrote:
On Wed, May 15, 2024 at 10:57:06PM -0400, Jon Maloy wrote:
On 2024-05-15 22:24, David Gibson wrote:
On Wed, May 15, 2024 at 11:34:27AM -0400, Jon Maloy wrote:
> commit a469fc393fa1 ("tcp, tap: Don't increase tap-side sequence
> counter for dropped frames")
> delayed update of conn->seq_to_tap until the moment the corresponding
> frame has been successfully pushed out. This has the advantage that we
> immediately can make a new attempt to transmit a frame after a failed
> transmit, rather than waiting for the peer to later discover a gap and
> trigger the fast retransmit mechanism to solve the problem.
>
> This approach has turned out to cause a problem with spurious sequence
> number updates during peer-initiated retransmits, and we have realized
> it may not be the best way to solve the above issue.
>
> We now restore the previous method, by updating the said field at the
> moment a frame is added to the outqueue. To retain the advantage of
> having a quick re-attempt based on local failure detection, we now scan
> through the part of the outqueue that had to be dropped, and restore the
> sequence counter for each affected connection to the most appropriate
> value.
>
> Signed-off-by: Jon Maloy <jmaloy(a)redhat.com>
>
> ---
> v2: - Re-spun loop in tcp_revert_seq() and some other changes based on
>       feedback from Stefano Brivio.
>     - Added paranoid test to avoid that seq_to_tap becomes lower than
>       seq_ack_from_tap.
>
> v3: - Identical to v2. Called v3 because it was embedded in a series
>       with that version.
>
> v4: - In tcp_revert_seq(), we read the sequence number from the TCP
>       header instead of keeping a copy in struct tcp_buf_seq_update.
>     - Since the only remaining field in struct tcp_buf_seq_update is
>       a pointer to struct tcp_tap_conn, we eliminate the struct
>       altogether, and make the tcp6/tcp4_buf_seq_update arrays into
>       arrays of said pointer.
>     - Removed 'paranoid' test in tcp_revert_seq. If it happens, it
>       is not fatal, and will be caught by other code anyway.
>     - Separated from the series again.
> ---
> tcp.c | 59 +++++++++++++++++++++++++++++++++++++----------------------
> 1 file changed, 37 insertions(+), 22 deletions(-)
>
> diff --git a/tcp.c b/tcp.c
> index 21d0af0..976dba8 100644
> --- a/tcp.c
> +++ b/tcp.c
> @@ -410,16 +410,6 @@ static int tcp_sock_ns [NUM_PORTS][IP_VERSIONS];
> */
> static union inany_addr low_rtt_dst[LOW_RTT_TABLE_SIZE];
> -/**
> - * tcp_buf_seq_update - Sequences to update with length of frames once sent
> - * @seq: Pointer to sequence number sent to tap-side, to be updated
> - * @len: TCP payload length
> - */
> -struct tcp_buf_seq_update {
> - uint32_t *seq;
> - uint16_t len;
> -};
> -
> /* Static buffers */
> /**
> * struct tcp_payload_t - TCP header and data to send segments with payload
> @@ -461,7 +451,8 @@ static struct tcp_payload_t tcp4_payload[TCP_FRAMES_MEM];
> static_assert(MSS4 <= sizeof(tcp4_payload[0].data), "MSS4 is greater than 65516");
> -static struct tcp_buf_seq_update tcp4_seq_update[TCP_FRAMES_MEM];
> +/* References tracking the owner connection of frames in the tap outqueue */
> +static struct tcp_tap_conn *tcp4_frame_conns[TCP_FRAMES_MEM];
> static unsigned int tcp4_payload_used;
> static struct tap_hdr tcp4_flags_tap_hdr[TCP_FRAMES_MEM];
> @@ -483,7 +474,8 @@ static struct tcp_payload_t tcp6_payload[TCP_FRAMES_MEM];
> static_assert(MSS6 <= sizeof(tcp6_payload[0].data), "MSS6 is greater than 65516");
> -static struct tcp_buf_seq_update tcp6_seq_update[TCP_FRAMES_MEM];
> +/* References tracking the owner connection of frames in the tap outqueue */
> +static struct tcp_tap_conn *tcp6_frame_conns[TCP_FRAMES_MEM];
> static unsigned int tcp6_payload_used;
> static struct tap_hdr tcp6_flags_tap_hdr[TCP_FRAMES_MEM];
> @@ -1261,25 +1253,49 @@ static void tcp_flags_flush(const struct ctx *c)
> tcp4_flags_used = 0;
> }
> +/**
> + * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission
> + * @conns: Array of connection pointers corresponding to queued frames
> + * @frames: Two-dimensional array containing queued frames with sub-iovs
You can make the 2d array explicit in the type as:
struct iovec (*frames)[TCP_NUM_IOVS];
See, for example the 'tap_iov' local in udp_tap_send(). (I recommend
the command line tool 'cdecl', also available online at cdecl.org for
working out confusing pointer-to-array types).
Nice. I wasn't quite happy with this.
> + * @num_frames: Number of entries in the two arrays to be compared
> + */
> +static void tcp_revert_seq(struct tcp_tap_conn **conns, struct iovec *frames,
> + int num_frames)
> +{
> + int c, f;
> +
> + for (c = 0, f = 0; c < num_frames; c++, f += TCP_NUM_IOVS) {
Nit: I find having the two parallel counters kind of confusing. It
naturally goes away with the type change suggested above, but even
without that I'd prefer an explicit multiply in the body. I strongly
suspect the compiler will be better at working out if the strength
reduction is worth it.
> + struct tcp_tap_conn *conn = conns[c];
> + struct tcphdr *th = frames[f + TCP_IOV_PAYLOAD].iov_base;
> + uint32_t seq = ntohl(th->seq);
> +
> + if (SEQ_LE(conn->seq_to_tap, seq))
Isn't this test inverted? We want to rewind seq_to_tap if seq is less
than it, rather than the other way around.
No. We do 'continue', i.e., nothing, if this condition is fulfilled.
This may look a little non-intuitive here, but makes sense when I add the
next patch.
Oh, of course, my mistake.
The code now (v7) looks as follows:
/**
 * tcp_revert_seq() - Revert affected conn->seq_to_tap after failed transmission
 * @conns:	Array of connection pointers corresponding to queued frames
 * @frames:	Two-dimensional array containing queued frames with sub-iovs
 * @num_frames:	Number of entries in the two arrays to be compared
 */
static void tcp_revert_seq(struct tcp_tap_conn **conns,
			   struct iovec (*frames)[TCP_NUM_IOVS],
			   int num_frames)
{
	int i;

	for (i = 0; i < num_frames; i++) {
		struct tcp_tap_conn *conn = conns[i];
		struct tcphdr *th = frames[i][TCP_IOV_PAYLOAD].iov_base;
		uint32_t seq = ntohl(th->seq);

		if (SEQ_LE(conn->seq_to_tap, seq))
			continue;

		conn->seq_to_tap = seq;
		tcp_set_peek_offset(conn->sock, seq - conn->seq_ack_from_tap);
	}
}

/**
 * tcp_payload_flush() - Send out buffers for segments with data
 * @c:	Execution context
 */
static void tcp_payload_flush(const struct ctx *c)
{
	size_t m;

	m = tap_send_frames(c, &tcp6_l2_iov[0][0], TCP_NUM_IOVS,
			    tcp6_payload_used);
	if (m != tcp6_payload_used) {
		tcp_revert_seq(tcp6_frame_conns, &tcp6_l2_iov[m],
			       tcp6_payload_used - m);
	}
	tcp6_payload_used = 0;

	m = tap_send_frames(c, &tcp4_l2_iov[0][0], TCP_NUM_IOVS,
			    tcp4_payload_used);
	if (m != tcp4_payload_used) {
		tcp_revert_seq(tcp4_frame_conns, &tcp4_l2_iov[m],
			       tcp4_payload_used - m);
	}
	tcp4_payload_used = 0;
}
Was this the version you were talking about on Monday morning?
Did you spot some bug here which I am missing?
Thanks
///jon
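For readers following the thread, the rewind logic discussed above can be
sketched in isolation. This is a minimal sketch under assumptions, not the
passt code: struct conn_sketch, seq_le(), revert_seq_sketch() and
flush_sketch() are hypothetical names, seq_le() assumes SEQ_LE() is a
wrap-safe comparison (the macro is not shown in this thread), and the
sequence numbers are passed in a plain array instead of being read from TCP
headers. The sketch also advances both parallel arrays by the number of
frames already sent, which is an assumption about the intended indexing
that the thread itself does not confirm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-connection state discussed above */
struct conn_sketch {
	uint32_t seq_to_tap;	/* next sequence number to send to the tap */
};

/* Wrap-safe "a <= b" for 32-bit sequence numbers.
 * Assumption: this is what SEQ_LE() is meant to express. The cast to
 * int32_t relies on two's complement conversion, as on mainstream compilers.
 */
static int seq_le(uint32_t a, uint32_t b)
{
	return (int32_t)(uint32_t)(a - b) <= 0;
}

/* Rewind each affected connection to the lowest sequence number among its
 * unsent frames. conns[i] and seqs[i] must describe the same frame.
 */
static void revert_seq_sketch(struct conn_sketch **conns,
			      const uint32_t *seqs, size_t n)
{
	size_t i;

	assert(conns && seqs);

	for (i = 0; i < n; i++) {
		/* Counter already at or before this frame: nothing to do */
		if (seq_le(conns[i]->seq_to_tap, seqs[i]))
			continue;

		conns[i]->seq_to_tap = seqs[i];
	}
}

/* 'sent' frames out of 'queued' went out; the rest were dropped.
 * Both arrays are offset by 'sent' so they stay aligned frame by frame.
 */
static void flush_sketch(struct conn_sketch **conns, const uint32_t *seqs,
			 size_t queued, size_t sent)
{
	if (sent != queued)
		revert_seq_sketch(conns + sent, seqs + sent, queued - sent);
}
```

With four queued frames of which two were sent, only the connections owning
the last two frames are considered, and a connection owning several dropped
frames ends up at the sequence number of the first one.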