
Allow Interface MTU over 9000
Closed, Resolved | Public | FEATURE REQUEST

Description

I would like to open a discussion on this, as it is likely to be a fairly complicated feature to implement in what I believe to be the correct way.

I need MTUs over 9000 on some interfaces; at the very least I would need 9050. Right now I am manually overriding the limit with ip link, but I believe an actual solution is warranted. I have switches that perform EBGP / hardware VXLAN VTEP, and they expect to be able to communicate with the host at its native MTU. VXLAN adds a 50-byte header to every packet, so to have a 9000 MTU on a VXLAN interface you need an MTU of at least 9050 on its parent interface.
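As a quick check of the arithmetic, here is a minimal sketch. The 50-byte overhead is the figure quoted above; the function name is hypothetical and only for illustration.

```python
# Overhead in bytes that VXLAN adds to each packet, as quoted in the description.
VXLAN_OVERHEAD = 50


def required_parent_mtu(vxlan_mtu: int, overhead: int = VXLAN_OVERHEAD) -> int:
    """Return the smallest parent-interface MTU that can carry vxlan_mtu."""
    return vxlan_mtu + overhead


print(required_parent_mtu(9000))  # 9050
```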

I'm sure there are other use cases for MTUs higher than 9000, but beyond that size things get weird interface-wise. There is no standard maximum MTU above 9000: some interfaces support 9.6K, 9.7K, 10K, 16K, etc. It all comes down to driver / hardware support.

Right now, interface MTU ranges are specified in XML files with fixed values. One solution is to just raise the cap to 9050 for this specific use case, but I believe the better solution is to read the max / min MTU from the interface and use those values as caps.

An example of extracting the max MTU from an interface:

ip -d link show dev <interface name> | sed -n -e 's/^.*maxmtu //p' | awk '{print $1}'
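The same extraction could be done in Python by parsing the text that `ip -d link show` prints. This is only a sketch: the function name is hypothetical, the sample string is made up for illustration, and it assumes the driver reports the `minmtu` / `maxmtu` fields at all.

```python
import re
from typing import Optional, Tuple


def parse_mtu_limits(ip_output: str) -> Tuple[Optional[int], Optional[int]]:
    """Extract (minmtu, maxmtu) from `ip -d link show dev <ifname>` text.

    Either value is None when the field is absent from the output.
    """
    def field(name: str) -> Optional[int]:
        match = re.search(rf"\b{name} (\d+)", ip_output)
        return int(match.group(1)) if match else None

    return field("minmtu"), field("maxmtu")


# Made-up sample text, shaped like the detailed link output.
sample = (
    "2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP\n"
    "    link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff "
    "promiscuity 0 minmtu 68 maxmtu 9710\n"
)
print(parse_mtu_limits(sample))
```

On a live system the text would come from running the command, for example with `subprocess.run(["ip", "-d", "link", "show", "dev", ifname], capture_output=True, text=True).stdout`.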

Considering these values are currently statically defined, adding this functionality would not be a simple fix. My initial idea is to scan the physical interfaces at boot and store those values somewhere, perhaps in /tmp, where the configuration tool can read them. Each reboot would then refresh the values for the physical interfaces currently in the machine.

For non-physical interfaces such as dummy / vxlan, the static assignment can be kept.
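A rough sketch of that idea follows: a boot-time scan writes the detected limits to a small JSON file, and the configuration side falls back to a static range for interfaces that are not in it. The file layout, the function names and the static default of 68-9000 are assumptions made for illustration, not existing VyOS code.

```python
import json
from pathlib import Path

# Assumed static range for interfaces without detected limits.
STATIC_DEFAULT = (68, 9000)


def save_limits(cache_file, limits):
    """Write {ifname: [min_mtu, max_mtu]} to the cache file (boot-time scan)."""
    Path(cache_file).write_text(json.dumps(limits))


def load_limits(cache_file, ifname, static_default=STATIC_DEFAULT):
    """Return (min_mtu, max_mtu) for ifname, or the static range if unknown."""
    try:
        cached = json.loads(Path(cache_file).read_text())
    except (OSError, ValueError):
        # Missing or unreadable cache: behave as if nothing was detected.
        cached = {}
    if ifname in cached:
        low, high = cached[ifname]
        return (low, high)
    return static_default
```

With this layout, a dummy or vxlan interface simply never appears in the file and keeps the static range.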

I would be interested in other people's thoughts on how to address this.

Details

Difficulty level
Normal (likely a few hours)
Version
-
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Stricter validation

Event Timeline

SIN3R6Y updated the task description.

I see no issue with the proposed solution.

We need to add the max MTU to operational mode, create a new validator that uses it, and apply it to the XML. The only question is whether the information is always available.

This was already discussed in T2404. The problem is that NICs that expose their min/max MTU are rare. None of the NICs I have expose it, neither through sysfs nor through 'ip -d link show'. To recap the discussion from T2404, there are two main ways to solve this:
a) Have no limitations on the MTU at all and detect an error when trying to apply the new MTU. This means there is no way to verify beforehand whether the new MTU is correct, so it doesn't comply with the verify/apply separation prescribed in the developer docs. I described a possible workaround using revert code in T2404.
b) Have an MTU detection script, run by udev on every new NIC detection (to support hotplugging NICs), that determines the min/max MTU with a brute-force binary search (try to set an MTU and see if it errors), then records the results in a temporary file that the config script reads. The idea was proposed by @thomas-mangin.
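To make option b) concrete, here is a minimal sketch of the binary search. It assumes a caller-supplied `try_set` callable (hypothetical) that attempts to set the MTU and returns whether the driver accepted it, and it assumes acceptance is monotonic, i.e. every value up to the real maximum succeeds. The default upper bound of 65536 is just the figure floated in this thread.

```python
from typing import Callable


def find_max_mtu(try_set: Callable[[int], bool],
                 low: int = 68, high: int = 65536) -> int:
    """Return the largest MTU in [low, high] that try_set accepts."""
    if not try_set(low):
        raise ValueError(f"even the lower bound {low} was rejected")
    while low < high:
        # Round up so the loop always makes progress when low + 1 == high.
        mid = (low + high + 1) // 2
        if try_set(mid):
            low = mid       # mid works, so the maximum is at least mid
        else:
            high = mid - 1  # mid fails, so the maximum is below mid
    return low


# Stand-in for a real NIC that accepts anything up to 9710.
print(find_max_mtu(lambda mtu: mtu <= 9710))  # 9710
```

A real probe would have to restore the original MTU afterwards, and each attempt may briefly disturb the link, which is part of why this approach is debated in the thread.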

a) Have no limitations on the MTU at all and detect an error when trying to apply the new MTU. This means there is no way to verify beforehand whether the new MTU is correct, so it doesn't comply with the verify/apply separation prescribed in the developer docs. I described a possible workaround using revert code in T2404.

My point of view on this is that if a NIC does not support this feature, then actually verifying the configuration before applying it is impossible. Right now we are just obeying arbitrary limits that are likely to be safe in most cases; I don't know if I would call that verification.

b) Have an MTU detection script, run by udev on every new NIC detection (to support hotplugging NICs), that determines the min/max MTU with a brute-force binary search (try to set an MTU and see if it errors), then records the results in a temporary file that the config script reads. The idea was proposed by @thomas-mangin.

Personally, if the driver does not return the max/min MTU, I think a warning message upon setting the MTU along the lines of...

"could not verify supported mtu range for interface <interface name>, make sure the set mtu is supported before commit"

would suffice. If a driver isn't going to expose that information, it's a bit unreasonable to brute-force it, in my opinion. Also, contrary to the experience above, all of the NICs I have expose this information.
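A sketch of how that could look in a verify step: reject the value when the driver's limits are known, and only warn when they are not. The function name and the `limits` argument are hypothetical, and this is not the code from any actual PR.

```python
import warnings
from typing import Optional, Tuple


def verify_mtu(ifname: str, mtu: int,
               limits: Optional[Tuple[int, int]]) -> None:
    """Raise ValueError if mtu is outside known limits; warn if limits are unknown."""
    if limits is None:
        # The driver did not expose min/max MTU: warn, but do not block.
        warnings.warn(
            f"could not verify supported mtu range for interface {ifname}, "
            "make sure the set mtu is supported before commit"
        )
        return
    low, high = limits
    if mtu < low:
        raise ValueError(
            f"Interface MTU too low, minimum supported MTU is {low}!")
    if mtu > high:
        raise ValueError(
            f"Interface MTU too high, maximum supported MTU is {high}!")
```

For example, `verify_mtu("eth2", 16000, (68, 9000))` raises, while `verify_mtu("eth2", 16000, None)` only emits the warning.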

The solution proposed by @SIN3R6Y is worth considering.

I have a PR for this (not changing the XML limiting range) up for review at the moment.

https://github.com/vyos/vyos-1x/pull/473 was merged, so now we need to agree on sane limits for the XML.

We could have the range 68-65536, but it may be a bit on the extreme side.

It might seem extreme now, but perhaps not later on? As of today I know some Intel NICs support up to 16K; those are some of the highest I'm aware of.

Also, potentially a separate issue (if it hasn't been addressed): dummy interfaces (on which you currently can't set an MTU, but I plan to send a PR) have a max / min MTU of 0 (they have no limits). To me that pull request looks like it will break in that case. It probably needs a case for handling no limits that just uses 68-65K or similar.
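One way to handle that case, sketched under the assumption that a reported value of 0 means "no limit": substitute a fallback range before validating. The fallback of 68-65535 is an assumed stand-in for the "68-65K or similar" suggested above, and the function name is hypothetical.

```python
# Assumed fallback range for interfaces that report no limits.
FALLBACK_MIN = 68
FALLBACK_MAX = 65535


def effective_limits(min_mtu: int, max_mtu: int) -> tuple:
    """Replace a reported limit of 0 (no limit) with the fallback value."""
    low = min_mtu if min_mtu > 0 else FALLBACK_MIN
    high = max_mtu if max_mtu > 0 else FALLBACK_MAX
    return (low, high)


print(effective_limits(0, 0))      # (68, 65535)
print(effective_limits(68, 9000))  # (68, 9000)
```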

There is a weird area here, as 1G interfaces are generally capped at 9K more or less (whether limits include the overheads or not is always unclear, e.g. switches that are said to be 9K but also accept 9120). For VM NICs, you're never completely sure what the host, or the switches directly connected to the host, will allow either.

Maybe warn on over 9000 but not block it? Also, what are NVMeoF/RoCE NICs saying these days? Still, since path MTU discovery isn't reliable, directly testing the interface seems like a good fallback. Running it from udev might be okay, but again for VM NICs, the host changing the underlying hardware could cause changes at runtime, so rescan on every commit?

VM NICs don't usually report the host's supported MTU, so if the host's MTU support changed while the VM was running, you would have no idea, and a rescan would tell you the same thing. In the case of virtio, it is always 68-65K.

c-po changed Difficulty level from Unknown (require assessment) to Normal (likely a few hours).
vyos@vyos# set interfaces ethernet eth2 mtu 16000
[edit]
vyos@vyos# commit
[ interfaces ethernet eth2 ]
Interface MTU too high, maximum supported MTU is 9000!