On Tue, 22 Oct 2024 15:19:53 +0200 Stefano Brivio <sbrivio(a)redhat.com> wrote:On Tue, 22 Oct 2024 14:59:19 +0200 Laurent Vivier <lvivier(a)redhat.com> wrote:...I was able to reproduce that only once, but I didn't have any debugging option enabled. I kept retrying with your follow-up patch but I'm back to the same behaviour as I had without it (client completing transfer, server hanging). That single time I reproduced it, both client and server exited, but I couldn't restart the server right away on the same port. Anyway, looking further into this: I'm using a recent net.git kernel, with this series on top of current passt's HEAD and this follow-up patch. With --trace, I see that the transfer completes, then, once the client is done, I get: --- 37.6288: Flow 0 (TCP connection): flag at tcp_vu_data_from_sock:417 37.6288: vhost-user: virtqueue can skip notify... 37.6288: Flow 0 (TCP connection): flag at tcp_vu_data_from_sock:477 37.6289: Flow 0 (TCP connection): timer expires in 2.000s 37.6289: passt: epoll event on connected TCP socket 75 (events: 0x00002001) --- EPOLLRDHUP | EPOLLIN on the socket --- 37.6289: Flow 0 (TCP connection): event at tcp_sock_handler:2339 37.6289: Flow 0 (TCP connection): SOCK_FIN_RCVD: ESTABLISHED -> CLOSE_WAIT --- FIN segment (inferred) from the socket --- 37.6289: Flow 0 (TCP connection): event at tcp_vu_sock_recv:262 37.6289: Flow 0 (TCP connection): TAP_FIN_SENT: CLOSE_WAIT -> LAST_ACK --- FIN to the guest --- 37.6645: passt: epoll event on vhost-user kick socket 73 (events: 0x00000001) 37.6646: vhost-user: ot kick_data: 0000000000000001 idx:1 37.6646: tap: protocol 6, 88.198.0.164:6001 -> 88.198.0.161:53136 (1 packet) 37.6646: Flow 0 (TCP connection): packet length 20 from tap 37.6646: Flow 0 (TCP connection): flag at tcp_update_seqack_from_tap:1215 37.6646: Flow 0 (TCP connection): timer expires in 2.000s 37.6647: Flow 0 (TCP connection): flag at tcp_tap_handler:2095 37.6670: passt: epoll event on vhost-user kick socket 73 (events: 0x00000001) 37.6671: vhost-user: ot kick_data: 0000000000000001 idx:1 37.6671: tap: protocol 6, 88.198.0.164:6001 -> 88.198.0.161:53136 (2 packets) 37.6671: Flow 0 (TCP connection): packet length 20 from tap 37.6671: Flow 0 (TCP connection): flag at tcp_update_seqack_from_tap:1215 37.6671: Flow 0 (TCP connection): timer expires in 2.000s 37.6671: Flow 0 (TCP connection): flag at tcp_tap_handler:2095 37.6671: passt: epoll event on vhost-user kick socket 73 (events: 0x00000001) 37.6671: vhost-user: ot kick_data: 0000000000000002 idx:1 37.6672: tap: protocol 6, 88.198.0.164:6001 -> 88.198.0.161:53136 (1 packet) 37.6672: Flow 0 (TCP connection): packet length 20 from tap 37.6672: Flow 0 (TCP connection): flag at tcp_update_seqack_from_tap:1210 37.6672: Flow 0 (TCP connection): ACK_FROM_TAP_DUE dropped 37.6672: Flow 0 (TCP connection): event at tcp_data_from_tap:1922 37.6672: Flow 0 (TCP connection): TAP_FIN_ACKED: LAST_ACK -> LAST_ACK --- everything ACKed from the guest, including the FIN --- 37.6672: Flow 0 (TCP connection): flag at tcp_tap_handler:2095 --- STALLED flag unset ...and now we're waiting for the guest to send us a FIN, so that we can "send" it to the socket (shutdown(x, SHUT_WR)) and close the connection in tap_sock_handler(): if (conn->events & ESTABLISHED) { if (CONN_HAS(conn, SOCK_FIN_SENT | TAP_FIN_ACKED)) conn_event(c, conn, CLOSED); ...but that never comes. I also took a capture (attached), in a separate attempt matching this one (timestamps might look different), which also shows that we're getting all the packets acknowledged by the guest, but we're never sending a FIN to it. This seems to be confirmed by stracing socat in the guest. The transfer ends like this: read(6, "M6\263%\245\257\205\24\341\316\377\270\306\301\244\17\333\241/E\211/\243g\367\23\216-\346\306\22\356"..., 8192) = 3056 recvfrom(3, 0x7ffeb3145360, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) recvfrom(3, 0x7ffeb3145860, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) write(5, "M6\263%\245\257\205\24\341\316\377\270\306\301\244\17\333\241/E\211/\243g\367\23\216-\346\306\22\356"..., 3056) = 3056 recvfrom(3, 0x7ffeb3145860, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) recvfrom(3, 0x7ffeb31458d0, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) pselect6(7, [6], [5], [], NULL, NULL) = 1 (out [5]) recvfrom(3, 0x7ffeb31458d0, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) recvfrom(3, 0x7ffeb31458d0, 519, MSG_DONTWAIT, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable) pselect6(7, [6], [], [], NULL, NULL ...we should see an event on the socket and a zero-sized recvfrom() if we sent a FIN. If we see TAP_FIN_SENT in the logs, it means that tcp_vu_send_flag() is called, but it doesn't look like it's sending the FIN | ACK flags as we request in tcp_vu_sock_recv(): int retf = tcp_vu_send_flag(c, conn, FIN | ACK); So I added prints in tcp_vu_send_flag(): --- diff --git a/tcp_vu.c b/tcp_vu.c index fc077c7..234d909 100644 --- a/tcp_vu.c +++ b/tcp_vu.c @@ -99,6 +99,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) int nb_ack; int ret; + err("%s:%i", __func__, __LINE__); + hdrlen = tcp_vu_l2_hdrlen(CONN_V6(conn)); vu_init_elem(elem, iov_vu, 2); @@ -109,6 +111,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) if (elem_cnt < 1) return 0; + err("%s:%i", __func__, __LINE__); + vu_set_vnethdr(vdev, &iov_vu[0], 1, 0); eh = vu_eth(iov_vu[0].iov_base); @@ -139,6 +143,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) return ret; } + err("%s:%i", __func__, __LINE__); + l4len = tcp_fill_headers4(conn, NULL, iph, payload, optlen, NULL, seq, true); l2len = sizeof(*iph); @@ -165,6 +171,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) return ret; } + err("%s:%i", __func__, __LINE__); + l4len = tcp_fill_headers6(conn, NULL, ip6h, payload, optlen, seq, true); l2len = sizeof(*ip6h); @@ -196,6 +204,8 @@ int tcp_vu_send_flag(const struct ctx *c, struct tcp_tap_conn *conn, int flags) vu_flush(vdev, vq, elem, nb_ack); + err("%s:%i", __func__, __LINE__); + return 0; } --- and it looks like vu_collect_one_frame() returns 0, so tcp_vu_send_flag() returns 0, we think we sent the FIN segment to the guest, but we didn't: --- 21.9120: Flow 0 (TCP connection): event at tcp_sock_handler:2339 21.9120: tcp_vu_send_flag:102 21.9120: Flow 0 (TCP connection): event at tcp_vu_sock_recv:272 21.9120: Flow 0 (TCP connection): TAP_FIN_SENT: CLOSE_WAIT -> LAST_ACK --- ...if vu_collect_one_frame() returns >= 1, as it usually does, tcp_vu_send_flag() would reach at least line 114. Side note: I thought I commented on this in a previous revision but I can't find my comment anymore: if vu_collect_one_frame() returns 0, I guess that should be an error. I don't know why vu_collect_one_frame() returns 0 here... I hope you do. :) -- StefanoOn 17/10/2024 02:10, Stefano Brivio wrote:I guess the connection from the first test is not closed properly, so it's still in TIME_WAIT or similar. Given the topic, I can imagine that something goes wrong as we check: if (ack && conn->events & TAP_FIN_SENT && conn->seq_ack_from_tap == conn->seq_to_tap) conn_event(c, conn, TAP_FIN_ACKED); in tcp_data_from_tap(). Or something around FIN segments anyway. You can give socat the 'reuseaddr' option, say: socat -u TCP4-LISTEN:10001,reuseaddr OPEN:test_big.bin,create,trunc to see if that's actually the case.On Wed, 16 Oct 2024 11:41:34 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:I have a fix for that but there is an error I don't understand: when I run twice the test, the second time I have: guest: # socat -u TCP4-LISTEN:10001 OPEN:test_big.bin,create,trunc # socat -u TCP4-LISTEN:10001 OPEN:test_big.bin,create,trunc 2024/10/22 08:51:58 socat[1485] E bind(5, {AF=2 0.0.0.0:10001}, 16): Address already in use host: $ socat -u OPEN:test/big.bin TCP4:127.0.0.1:10001 If I wait a little it can work again several times and fails again. Any idea?On Tue, Oct 15, 2024 at 09:54:38PM +0200, Stefano Brivio wrote: > [Still partial review] [snip] >> + if (peek_offset_cap) >> + already_sent = 0; >> + >> + iov_vu[0].iov_base = tcp_buf_discard; >> + iov_vu[0].iov_len = already_sent; > > I think I had a similar comment to a previous revision. Now, I haven't > tested this (yet) on a kernel with support for SO_PEEK_OFF on TCP, but > I think this should eventually follow the same logic as the (updated) > tcp_buf_data_from_sock(): we should use tcp_buf_discard only if > (!peek_offset_cap). > > It's fine to always initialise VIRTQUEUE_MAX_SIZE iov_vu items, > starting from 1, for simplicity. But I'm not sure if it's safe to pass a > zero iov_len if (peek_offset_cap). > I'll test that (unless you already did) -- if it works, we can fix this > up later as well. I believe I tested it at some point, and I think we're already using it somewhere.I tested it again just to be sure on a recent net.git kernel: sometimes the first test in passt_vu_in_ns/tcp, "TCP/IPv4: host to guest: big transfer" hangs on my setup, sometimes it's the "TCP/IPv4: ns to guest (using loopback address): big transfer" test instead. I can reproduce at least one of the two issues consistently (tests stopped 5 times out of 5). The socat client completes the transfer, the server is still waiting for something. I haven't taken captures yet or tried to re-send from the client. It all works (consistently) with an older kernel without support for SO_PEEK_OFF on TCP, but also on this kernel if I force peek_offset_cap to false in tcp_init().The patch is: [...]I can try that in a bit.