On Sat, Apr 6, 2024 at 8:21 PM <jmaloy(a)redhat.com> wrote:
> From: Jon Maloy <jmaloy(a)redhat.com>
>
> Testing of the previous commit ("tcp: add support for SO_PEEK_OFF")
> in this series along with the pasta protocol splicer revealed a bug in
> the way tcp handles window advertising during extreme memory squeeze
> situations.
>
> The excerpt of the logging session below shows what is happening:
>
> [5201<->54494]: ==== Activating log @ tcp_select_window()/268 ====
> [5201<->54494]: (inet_csk(sk)->icsk_ack.pending & ICSK_ACK_NOMEM) -->
TRUE
> [5201<->54494]: tcp_select_window(<-) tp->rcv_wup: 2812454294,
tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354, returning 0
> [5201<->54494]: ADVERTISING WINDOW SIZE 0
> [5201<->54494]: __tcp_transmit_skb(<-) tp->rcv_wup: 2812454294,
tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
>
> [5201<->54494]: tcp_recvmsg_locked(->)
> [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294,
tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: (win_now: 250164, new_win: 262144 >= (2 * win_now):
500328))? --> time_to_ack: 0
> [5201<->54494]: NOT calling tcp_send_ack()
> [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294,
tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: tcp_recvmsg_locked(<-) returning 131072 bytes, window now:
250164, qlen: 83
> [...]
I would prefer a packetdrill test; it is not clear what is happening.
In particular, have you used SO_RCVBUF?
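For context on why SO_RCVBUF matters here: on Linux, setting it explicitly pins the receive buffer size and takes the socket out of receive-buffer autotuning, which changes how the advertised window can grow. Below is a minimal sketch of how an application would set it and read back the effective value. The helper name set_and_read_rcvbuf() is made up for illustration and is not taken from the test setup in this thread.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: request a receive buffer size on a fresh TCP socket
 * and return the size the kernel reports back, or -1 on error. */
int set_and_read_rcvbuf(int requested)
{
    int effective = -1;
    socklen_t len = sizeof(effective);
    int fd;

    assert(requested > 0);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* An explicit SO_RCVBUF locks the buffer size for this socket. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) < 0 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) < 0)
        effective = -1;

    close(fd);
    return effective;
}
```

On Linux the value read back is normally larger than the request, because the kernel doubles it to leave room for bookkeeping overhead, and the request itself is capped by net.core.rmem_max.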
>
> [5201<->54494]: tcp_recvmsg_locked(->)
> [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294,
tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: (win_now: 250164, new_win: 262144 >= (2 * win_now):
500328))? --> time_to_ack: 0
> [5201<->54494]: NOT calling tcp_send_ack()
> [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294,
tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: tcp_recvmsg_locked(<-) returning 131072 bytes, window now:
250164, qlen: 1
>
> [5201<->54494]: tcp_recvmsg_locked(->)
> [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294,
tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: (win_now: 250164, new_win: 262144 >= (2 * win_now):
500328))? --> time_to_ack: 0
> [5201<->54494]: NOT calling tcp_send_ack()
> [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294,
tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: tcp_recvmsg_locked(<-) returning 57036 bytes, window now:
250164, qlen: 0
>
> [5201<->54494]: tcp_recvmsg_locked(->)
> [5201<->54494]: __tcp_cleanup_rbuf(->) tp->rcv_wup: 2812454294,
tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: NOT calling tcp_send_ack()
> [5201<->54494]: __tcp_cleanup_rbuf(<-) tp->rcv_wup: 2812454294,
tp->rcv_wnd: 5812224, tp->rcv_nxt 2818016354
> [5201<->54494]: tcp_recvmsg_locked(<-) returning -11 bytes, window now:
250164, qlen: 0
>
> We can see that although we are advertising a window size of zero,
> tp->rcv_wnd is not updated accordingly. This leads to a discrepancy
> between this side's and the peer's view of the current window size.
> - The peer thinks the window is zero, and stops sending.
> - This side ends up in a cycle where it repeatedly calculates a new
> window size it finds too small to advertise.
>
> Hence no messages are received and no acknowledgments are sent, and
> the situation remains locked even after the last queued receive buffer
> has been consumed.
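The numbers in the log can be reproduced with a small user-space model. This is only a sketch of the arithmetic, modelled on the kernel's receive-window calculation and the "at least twice" check traced above. The function names are illustrative and this is not the kernel code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Window this side believes is still open to the peer:
 * rcv_wup + rcv_wnd - rcv_nxt in 32-bit sequence arithmetic,
 * clamped at zero when the result is negative. */
uint32_t receive_window(uint32_t rcv_wup, uint32_t rcv_wnd, uint32_t rcv_nxt)
{
    int32_t win = (int32_t)(rcv_wup + rcv_wnd - rcv_nxt);

    return win < 0 ? 0 : (uint32_t)win;
}

/* Model of the check in the log: send a window update only if the new
 * window is non-zero and at least twice the window believed advertised. */
bool time_to_ack(uint32_t win_now, uint32_t new_win)
{
    return new_win != 0 && new_win >= 2 * win_now;
}
```

With the logged values, receive_window(2812454294, 5812224, 2818016354) gives 250164, and time_to_ack(250164, 262144) is false because 262144 is less than 500328. The peer, however, was told the window is 0, so neither side makes the next move.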
>
> We fix this by setting tp->rcv_wnd to 0 before we return from the
> function tcp_select_window() in this particular case.
> Further testing shows that the connection recovers neatly from the
> squeeze situation, and traffic can continue indefinitely.
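To illustrate the effect of the proposed change, here is a hedged user-space sketch. struct rx_state, select_window_nomem() and ack_after_read() are illustrative names, and the apply_fix flag exists only to compare the two behaviours side by side; the real patch touches tcp_select_window() in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative subset of the receiver state shown in the log. */
struct rx_state {
    uint32_t rcv_wup;
    uint32_t rcv_wnd;
    uint32_t rcv_nxt;
};

/* Model of the out-of-memory path: a zero window is advertised.
 * With apply_fix, rcv_wnd is zeroed as well, as the patch proposes. */
uint32_t select_window_nomem(struct rx_state *s, bool apply_fix)
{
    if (apply_fix)
        s->rcv_wnd = 0;
    return 0;
}

/* Would a read that frees up new_win bytes of window trigger an ACK? */
bool ack_after_read(const struct rx_state *s, uint32_t new_win)
{
    int32_t win = (int32_t)(s->rcv_wup + s->rcv_wnd - s->rcv_nxt);
    uint32_t win_now = win < 0 ? 0 : (uint32_t)win;

    return new_win != 0 && new_win >= 2 * win_now;
}
```

Starting from the logged state, the unpatched path leaves the locally computed window at 250164, so a new window of 262144 is judged too small to announce. With rcv_wnd zeroed, the locally computed window is 0, matching what the peer was told, and any non-zero new window passes the check.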
>
> Reviewed-by: Stefano Brivio <sbrivio(a)redhat.com>
> Signed-off-by: Jon Maloy <jmaloy(a)redhat.com>
I do not think this patch is good. If we reach zero window, it is a
sign something is wrong.
TCP has heuristics to slow down the sender if the receiver does not
drain the receive queue fast enough.
MSG_PEEK is an obvious reason for that, and so is SO_RCVLOWAT.
I suggest you take a look at tcp_set_rcvlowat() and see what is needed
for SO_PEEK_OFF (ab)use.
In short, when SO_PEEK_OFF is in action:
- TCP needs to not delay ACK when receive queue starts to fill
- TCP needs to make sure sk_rcvbuf and tp->window_clamp grow (if
autotuning is enabled)