On Fri, 5 Dec 2025 13:50:15 +1100, David Gibson wrote:
On Fri, Dec 05, 2025 at 02:20:12AM +0100, Stefano Brivio wrote:
On Fri, 5 Dec 2025 11:08:06 +1100, David Gibson wrote:
On Thu, Dec 04, 2025 at 08:45:37AM +0100, Stefano Brivio wrote:
...instead of checking if it's less than SNDBUF_SMALL, because this isn't simply an optimisation to coalesce ACK segments: we rely on having enough data at once from the sender to make the buffer grow by means of TCP buffer size tuning implemented in the Linux kernel.
Use SNDBUF_BIG: above that, we don't need auto-tuning (even though it might happen). SNDBUF_SMALL is too... small.
Do you have an idea of how often sndbuf exceeds SNDBUF_BIG? I'm wondering if by making this change we might have largely eliminated the first branch in practice.
Before this series, or after 6/8 in this series, it happens quite often. It depends on the bandwidth * delay product of course, but at 1 Gbps and 20 ms RTT we get there in a couple of seconds.
Maybe 1 MiB would make more sense for typical conditions, but I'd defer this to a more adaptive implementation of the whole thing. I think it should also depend on the RTT, ideally.
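
For reference, the bandwidth * delay product mentioned above can be worked out with a few lines of Python. This is only a back-of-the-envelope sketch, not code from passt: the helper name bdp_bytes() is made up for illustration, and it uses integer arithmetic on bits per second and milliseconds to avoid rounding surprises.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth * delay product in bytes (hypothetical helper).

    bandwidth_bps: link rate in bits per second
    rtt_ms:        round-trip time in milliseconds
    """
    # bits/s * ms = millibits; divide by 1000 for bits, by 8 for bytes
    return bandwidth_bps * rtt_ms // 8000


if __name__ == "__main__":
    # The case discussed in the thread: 1 Gbps and 20 ms RTT
    print(bdp_bytes(1_000_000_000, 20))  # 2500000 bytes, about 2.4 MiB
```

At 1 Gbps and 20 ms RTT, this gives 2.5 MB of data in flight, which is the amount a sending buffer needs to hold to keep such a link busy.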
Ok. Adding that context to the commit message might be useful.
While trying to add that context I came to two realisations:

1. in the past I used the 'netem' qdisc for convoluted things only, so
   much that I forgot how simple it can be for the task at hand:

     $ ./pasta --config-net -I moon0 -- sh -c '/sbin/tc q a dev moon0 root netem delay 1282ms; ping -c1 2600::'
     PING 2600:: (2600::) 56 data bytes
     64 bytes from 2600::: icmp_seq=1 ttl=255 time=1298 ms

2. while there don't seem to be public iperf3 instances on the moon,
   there are indeed servers in Auckland and Tashkent

...hence v2, which implements the adaptive implementation I was referring to. It's rather simplistic and can/should be improved further but it's a big improvement on the existing situation.

-- 
Stefano