On Wed, Feb 19, 2025 at 06:37:28AM +0100, Stefano Brivio wrote:On Wed, 19 Feb 2025 14:14:29 +1100 David Gibson <david(a)gibson.dropbear.id.au> wrote:Ah, thanks, I hadn't realised that newer kernels had better named constants. When I respin I'll use matching names.Currently we reject the -m option if given a value less than ETH_MAX_MTUETH_MIN_MTU(68). That define is derived from the kernel, but its name is misleading: it doesn't really have anything to do with Ethernet per se, but is rather the minimum payload any L2 link must be able to handle in order to carry IPv4.Yes, that should be IPV4_MIN_MTU instead, but it was only added as recently as 4.14 kernels, so I opted for ETH_MIN_MTU. A misnomer as you pointed out, but safe.Ah... yes. I was thinking that that requirement implied that a link which can't fragment was useless if it couldn't carry 576-byte datagrams, but thinking over your examples here I realise I was mistaken.For IPv6, it's not sufficient: that requires an MTU of at least 1280. Furthermore, the value of 68 is the minimum IP *fragment* size the link must be able to carry. Since we don't support IP fragmentation, it's not sufficient for us. Instead we should clamp the MTU to 576 for IPv4 - the minimum IP datagram size that all hosts must be able to accept.First off, the only assumption in RFC 791 terms we can _perhaps_ make is that we are some kind of "module" (also called "node", could be host or router), not a (full) host. Maybe not even a module. So, with that regard, we don't need to be prepared to _accept_ (for ourselves as destination) any particular datagram size. Second, even if all hosts need to be able to accept 576-byte datagrams, that doesn't mean that all links need to be able to carry them. The MTU refers _to the link_, not to what a host is able to accept.And that's the reason why you can set 68 bytes as MTU on most network interfaces on Linux. We set sub-576 values ourselves in tests: $ grep -rn "mtu 256" * passt_tcp:95:guest ip link set dev __IFNAME__ mtu 256 passt_vu_tcp:95:guest ip link set dev __IFNAME__ mtu 256 That is, indeed, all hosts (not "modules") need to be able to accept (not "forward") datagram sizes of at least 576 bytes... but that's only assuming you can deliver those datagrams to them. This is not just a theoretical matter. As late as 2018, I was made aware of a setup with several (local!) nodes with links between them having ~380 bytes as MTU. Sure enough, the reason why I know about this was an issue coming from the same flawed assumption made in kernel commit c9fefa08190f ("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit"), and fixed by 82a40777de12 ("ip6_tunnel: use the right value for ipv4 min mtu check in ip6_tnl_xmit"). See also commit b4331a681822 ("vti6: Change minimum MTU to IPV4_MIN_MTU, vti6 can carry IPv4 too") on the subject of what links can carry vs. what endpoints should be able to forward.Sorry, I'm not following what change you're suggesting (or discussing?).Move the verification of the MTU's lower bound to logic specific to the IP versions and correct those errors. Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au> --- conf.c | 20 +++++++++++++++----- ip.h | 7 +++++++ util.h | 3 --- 3 files changed, 22 insertions(+), 8 deletions(-) diff --git a/conf.c b/conf.c index c5ee07b0..e127acc1 100644 --- a/conf.c +++ b/conf.c @@ -1663,9 +1663,9 @@ void conf(struct ctx *c, int argc, char **argv) if (errno || *e) die("Invalid MTU: %s", optarg); - if (mtu && (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)) { - die("MTU %lu out of range (%u..%u)", mtu, - ETH_MIN_MTU, ETH_MAX_MTU); + if (mtu > ETH_MAX_MTU) { + die("MTU %lu too large (max %u)", + mtu, ETH_MAX_MTU); } c->mtu = mtu; @@ -1838,10 +1838,20 @@ void conf(struct ctx *c, int argc, char **argv) log_conf_parsed = true; /* Stop printing everything */ nl_sock_init(c, false); - if (!v6_only) + if (!v6_only) { + if (c->mtu < IPV4_MINMAX_DATAGRAM) {Now, if you want to make this symmetric with the IPv6 case, we could also move this here... it just unnecessarily adds lines of code, and this function is already (necessarily) rather long.warn() and disable the relevant protocol. That makes sense, I'll make that change.+ die("MTU %"PRIu16" is too small for IPv4 (minimum %u)", + c->mtu, IPV4_MINMAX_DATAGRAM); + } c->ifi4 = conf_ip4(ifi4, &c->ip4); - if (!v4_only) + } + if (!v4_only) { + if (c->mtu < IPV6_MIN_MTU) { + die("MTU %"PRIu16" is too small for IPv6 (minimum %u)", + c->mtu, IPV6_MIN_MTU);Does the fact that we don't disable IPv6 imply that IPv6 must be working at all times? In my opinion not. It's also rather convenient to be able to specify '-m 200' (for whatever test) without having to give '-4' explicitly.From a functionality perspective, I think warn() would be a betterchoice.-- David Gibson (he or they) | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you, not the other way | around. http://www.ozlabs.org/~dgibson+ } c->ifi6 = conf_ip6(ifi6, &c->ip6); + } if ((*c->ip4.ifname_out && !c->ifi4) || (*c->ip6.ifname_out && !c->ifi6)) die("External interface not usable"); diff --git a/ip.h b/ip.h index 1544dbf4..8f5262fa 100644 --- a/ip.h +++ b/ip.h @@ -104,4 +104,11 @@ static const struct in6_addr in6addr_ll_all_nodes = { /* IPv4 Limited Broadcast (RFC 919, Section 7), 255.255.255.255 */ static const struct in_addr in4addr_broadcast = { 0xffffffff }; +/* Minimum IP datagram size all hosts must be prepared to accept (RFC 791) */ +#define IPV4_MINMAX_DATAGRAM 576 + +#ifndef IPV6_MIN_MTU +#define IPV6_MIN_MTU 1280 +#endif + #endif /* IP_H */ diff --git a/util.h b/util.h index 50e96d32..bdca5ee6 100644 --- a/util.h +++ b/util.h @@ -34,9 +34,6 @@ #ifndef ETH_MAX_MTU #define ETH_MAX_MTU USHRT_MAX #endif -#ifndef ETH_MIN_MTU -#define ETH_MIN_MTU 68 -#endifI would be fine with using IPV4_MIN_MTU by the way and adding a fallback for it (there's no such thing as IP_MIN_MTU).#ifndef IP_MAX_MTU #define IP_MAX_MTU USHRT_MAX #endif