On Thu, Dec 04, 2025 at 08:45:34AM +0100, Stefano Brivio wrote:
For non-local connections, we advertise the same window size that the peer in turn advertises to us, limited to the buffer size reported via SO_SNDBUF.
That's not quite correct: in order to later avoid failures while queueing data to the socket, we need to limit the window to the available buffer size, not the total one.
Use the SIOCOUTQ ioctl and subtract the number of outbound queued bytes from the total buffer size, then clamp to this value.
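The clamping described above can be sketched in isolation. This is a minimal standalone example under stated assumptions, not code from passt: wnd_limit() and sndbuf_free() are hypothetical names, and the raw SO_SNDBUF value stands in for SNDBUF_GET(conn), which passt may adjust after reading it.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h>	/* SIOCOUTQ */

/* Hypothetical helper: window to advertise to the tap side, given the
 * peer's window, the send buffer size and the bytes already queued.
 * Both sndbuf and sendq are expected to be non-negative.
 */
static int wnd_limit(uint32_t peer_wnd, int sndbuf, int sendq)
{
	int limit;

	assert(sndbuf >= 0 && sendq >= 0);

	/* More queued than the buffer size, e.g. due to memory pressure */
	if (sendq > sndbuf)
		limit = 0;
	else
		limit = sndbuf - sendq;

	/* Compare as unsigned so a huge peer window can't wrap negative */
	return peer_wnd < (uint32_t)limit ? (int)peer_wnd : limit;
}

/* Hypothetical helper: free send buffer space on socket s, or -1 if
 * SO_SNDBUF can't be read. A failing SIOCOUTQ is treated as an empty
 * queue, as the patch does.
 */
static int sndbuf_free(int s)
{
	int sndbuf, sendq;
	socklen_t sl = sizeof(sndbuf);

	if (getsockopt(s, SOL_SOCKET, SO_SNDBUF, &sndbuf, &sl))
		return -1;

	if (ioctl(s, SIOCOUTQ, &sendq))
		sendq = 0;

	return sendq > sndbuf ? 0 : sndbuf - sendq;
}
```

For example, with a 300000-byte peer window, a 212992-byte buffer and 12992 bytes still queued, the advertised window becomes 200000 rather than 212992. SIOCOUTQ writes an int, which is why sendq is an int in this sketch.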
Signed-off-by: Stefano Brivio
Reviewed-by: David Gibson
---
 README.md |  2 +-
 tcp.c     | 18 ++++++++++++++++--
 2 files changed, 17 insertions(+), 3 deletions(-)
diff --git a/README.md b/README.md
index 897ae8b..8fdc0a3 100644
--- a/README.md
+++ b/README.md
@@ -291,7 +291,7 @@ speeding up local connections, and usually requiring NAT. _pasta_:
 * ✅ all capabilities dropped, other than `CAP_NET_BIND_SERVICE` (if granted)
 * ✅ with default options, user, mount, IPC, UTS, PID namespaces are detached
 * ✅ no external dependencies (other than a standard C library)
-* ✅ restrictive seccomp profiles (33 syscalls allowed for _passt_, 43 for
+* ✅ restrictive seccomp profiles (34 syscalls allowed for _passt_, 43 for
   _pasta_ on x86_64)
 * ✅ examples of [AppArmor](/passt/tree/contrib/apparmor) and
   [SELinux](/passt/tree/contrib/selinux) profiles available
diff --git a/tcp.c b/tcp.c
index fa95f6b..863ccdb 100644
--- a/tcp.c
+++ b/tcp.c
@@ -1031,6 +1031,8 @@ void tcp_fill_headers(const struct ctx *c, struct tcp_tap_conn *conn,
  * @tinfo:	tcp_info from kernel, can be NULL if not pre-fetched
  *
  * Return: 1 if sequence or window were updated, 0 otherwise
+ *
+ * #syscalls ioctl
  */
 int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
 			  bool force_seq, struct tcp_info_linux *tinfo)
@@ -1113,9 +1115,21 @@ int tcp_update_seqack_wnd(const struct ctx *c, struct tcp_tap_conn *conn,
 	if ((conn->flags & LOCAL) || tcp_rtt_dst_low(conn)) {
 		new_wnd_to_tap = tinfo->tcpi_snd_wnd;
 	} else {
+		uint32_t sendq;
+		int limit;
+
+		if (ioctl(s, SIOCOUTQ, &sendq)) {
+			debug_perror("SIOCOUTQ on socket %i, assuming 0", s);
+			sendq = 0;
+		}
 		tcp_get_sndbuf(conn);
-		new_wnd_to_tap = MIN((int)tinfo->tcpi_snd_wnd,
-				     SNDBUF_GET(conn));
+
+		if ((int)sendq > SNDBUF_GET(conn)) /* Due to memory pressure? */
+			limit = 0;
+		else
+			limit = SNDBUF_GET(conn) - (int)sendq;
+
+		new_wnd_to_tap = MIN((int)tinfo->tcpi_snd_wnd, limit);
 	}
 
 	new_wnd_to_tap = MIN(new_wnd_to_tap, MAX_WINDOW);
-- 
2.43.0
-- 
David Gibson (he or they)	| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you, not the other way
				| around.
http://www.ozlabs.org/~dgibson