On Thu, 20 Feb 2025 14:55:30 +1100
David Gibson <david(a)gibson.dropbear.id.au> wrote:
On Wed, Feb 19, 2025 at 06:37:28AM +0100, Stefano
Brivio wrote:
On Wed, 19 Feb 2025 14:14:29 +1100
David Gibson <david(a)gibson.dropbear.id.au> wrote:
> Currently we reject the -m option if given a value less than ETH_MAX_MTU
ETH_MIN_MTU
> (68). That define is derived from the kernel, but its name is misleading:
> it doesn't really have anything to do with Ethernet per se, but is rather
> the minimum payload any L2 link must be able to handle in order to carry
> IPv4.
Yes, that should be IPV4_MIN_MTU instead, but it was only added as
recently as 4.14 kernels, so I opted for ETH_MIN_MTU. A misnomer as you
pointed out, but safe.
Ah, thanks, I hadn't realised that newer kernels had better named
constants. When I respin I'll use matching names.
> For IPv6, it's not sufficient: that
requires an MTU of at least
> 1280.
>
> Furthermore, the value of 68 is the minimum IP *fragment* size the link
> must be able to carry. Since we don't support IP fragmentation, it's not
> sufficient for us. Instead we should clamp the MTU to 576 for IPv4 - the
> minimum IP datagram size that all hosts must be able to accept.
First off, the only assumption in RFC 791 terms we can _perhaps_ make is
that we are some kind of "module" (also called "node", could be host
or
router), not a (full) host. Maybe not even a module. So, with that
regard, we don't need to be prepared to _accept_ (for ourselves as
destination) any particular datagram size.
Second, even if all hosts need to be able to accept 576-byte datagrams,
that doesn't mean that all links need to be able to carry them. The MTU
refers _to the link_, not to what a host is able to accept.
Ah... yes. I was thinking that that requirement implied that a link
which can't fragment was useless if it couldn't carry 576-byte
datagrams, but thinking over your examples here I realise I was
mistaken.
And that's the reason why you can set 68
bytes as MTU on most network
interfaces on Linux. We set sub-576 values ourselves in tests:
$ grep -rn "mtu 256" *
passt_tcp:95:guest ip link set dev __IFNAME__ mtu 256
passt_vu_tcp:95:guest ip link set dev __IFNAME__ mtu 256
That is, indeed, all hosts (not "modules") need to be able to accept
(not "forward") datagram sizes of at least 576 bytes... but that's only
assuming you can deliver those datagrams to them.
This is not just a theoretical matter. As late as 2018, I was made
aware of a setup with several (local!) nodes with links between them
having ~380 bytes as MTU.
Sure enough, the reason why I know about this was an issue coming from
the same flawed assumption made in kernel commit c9fefa08190f
("ip6_tunnel: get the min mtu properly in ip6_tnl_xmit"), and fixed by
82a40777de12 ("ip6_tunnel: use the right value for ipv4 min mtu check
in ip6_tnl_xmit").
See also commit b4331a681822 ("vti6: Change minimum MTU to IPV4_MIN_MTU,
vti6 can carry IPv4 too") on the subject of what links can carry vs.
what endpoints should be able to forward.
> Move the verification of the MTU's lower bound to logic specific to the IP
> versions and correct those errors.
>
> Signed-off-by: David Gibson <david(a)gibson.dropbear.id.au>
> ---
> conf.c | 20 +++++++++++++++-----
> ip.h | 7 +++++++
> util.h | 3 ---
> 3 files changed, 22 insertions(+), 8 deletions(-)
>
> diff --git a/conf.c b/conf.c
> index c5ee07b0..e127acc1 100644
> --- a/conf.c
> +++ b/conf.c
> @@ -1663,9 +1663,9 @@ void conf(struct ctx *c, int argc, char **argv)
> if (errno || *e)
> die("Invalid MTU: %s", optarg);
>
> - if (mtu && (mtu < ETH_MIN_MTU || mtu > ETH_MAX_MTU)) {
> - die("MTU %lu out of range (%u..%u)", mtu,
> - ETH_MIN_MTU, ETH_MAX_MTU);
> + if (mtu > ETH_MAX_MTU) {
> + die("MTU %lu too large (max %u)",
> + mtu, ETH_MAX_MTU);
> }
>
> c->mtu = mtu;
> @@ -1838,10 +1838,20 @@ void conf(struct ctx *c, int argc, char **argv)
> log_conf_parsed = true; /* Stop printing everything */
>
> nl_sock_init(c, false);
> - if (!v6_only)
> + if (!v6_only) {
> + if (c->mtu < IPV4_MINMAX_DATAGRAM) {
Now, if you want to make this symmetric with the IPv6 case, we could
also move this here... it just unnecessarily adds lines of code, and
this function is already (necessarily) rather long.
Sorry, I'm not following what change you're suggesting (or discussing?).
The exact change I quoted: moving the check on the minimum MTU to here:
if (c->mtu < IPV4_MINMAX_DATAGRAM) {
compared to doing it earlier in conf().