Page MenuHomeVyOS Platform

Commit fails if ethernet interface doesn't support flow control (pause)
Open, Unbreak Now!PublicBUG

Description

I set up VyOS rolling from 2020-02-24 on a xcp-ng 8 host. When trying to assign an IP to an ethernet interface, it errors out when ethtool tries to set flow control settings (pause/etc). Looking at the source, there are a list of blacklisted drivers for pause, but xen_netfront is not on the list. After adding it, the interface comes up without issue.

Pull request:
https://github.com/vyos/vyos-1x/pull/266

Details

Difficulty level
Easy (less than an hour)
Version
VyOS 1.3-rolling-202003241825
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Perfectly compatible

Event Timeline

pasik added a subscriber: pasik.Tue, Mar 24, 8:35 PM
pasik added a comment.Tue, Mar 24, 8:37 PM

does this also affect vyos 1.2, or only 1.3/rolling?

c-po added a subscriber: c-po.Tue, Mar 24, 8:38 PM

This fix is for 1.3 rolling only and should not be a problem on 1.2 as long as users do not explicitly set speed/duplex.

pasik added a comment.Tue, Mar 24, 8:43 PM

ok, thanks! I can test on xen aswell, when the fix is in 1.3/rolling.

jjakob added a subscriber: jjakob.EditedWed, Mar 25, 5:04 PM

Please add r8169 as well. The config failed to load at boot after upgrading to latest rolling because of this error. The script should check if the interface supports pause and silently continue if it doesn't, otherwise maintaining a list of all pause-unsupported interfaces is going to be next to impossible. I suspect a lot more of them don't.

In T2158#56542, @jjakob wrote:

Please add r8169 as well. The config failed to load at boot after upgrading to latest rolling because of this error. The script should check if the interface supports pause and silently continue if it doesn't, otherwise maintaining a list of all pause-unsupported interfaces is going to be next to impossible. I suspect a lot more of them don't.

Yeah - actually checking the results of ethtool (catching an error if any) would be ideal for sure, the driver blacklist method is kind of a kludge.

If we could get the driver blacklist committed quickly so that the released version will actually work for us hitting problems, and then handle the output for real as time allows, that would probably be ideal?

I suspect the driver blacklist won't be enough for a lot of users. A lot of very common ethernet cards don't support setting pause frames.

Oh, I see my changes were already pulled in, missed that!

I just did a quick validation; spun up a new vyos instance on xcp-ng with 1.3-rolling-202003241825, and validated that it failed. Re-installed with 1.3-rolling-202003250217, and it works.

So, as far as Xen is concerned, this is confirmed fixed. However, @jjakob is correct that this should be fixed to catch errors and handle them versus having to list each driver individually. Not sure if that should be a new bug or continue here?

I'll open a new task for it.

c-po added a subscriber: thomas-mangin.EditedWed, Mar 25, 6:59 PM

Please add r8169 as well. The config failed to load at boot after upgrading to latest rolling because of this error. The script should check if the interface supports pause and silently continue if it doesn't, otherwise maintaining a list of all pause-unsupported interfaces is going to be next to impossible. I suspect a lot more of them don't.

@thomas-mangin can resolve the issue as he introduced it :)

Unbootable systems are super bad

c-po triaged this task as Unbreak Now! priority.
thomas-mangin added a comment.EditedWed, Mar 25, 7:03 PM

we can fix it in two ways: undo the commit which check the return code of the program, hiding the issue and add a first BIG PRINT warning stage or find the root cause and fix it (harder).

It seems that some interface may not have pause feature and we have no correct detection for the feature when ethtool -a could be used ?

let me try to put a quick patch for you.

Maybe check the physical interface support via ethtool in the ethernet validate() function and raise a configerror if it doesn't? Or should the default be disabled and should a config command be enable-flow-control? The script that actually sets the flow control should definitely just print a warning to the syslog and not fail.

c-po added a comment.Wed, Mar 25, 7:19 PM

@jjakob I had that discussion with @thomas-mangin already - the best solution would be a dynamic probe via ethtool in verify()

I already hotfixed the issue on mine by adding r8169 into the unsupported list - but as said, that's not the real solution.

jjakob renamed this task from Need to add xen_netfront to interfaces that don't support pause to Commit fails if ethernet interface doesn't support flow control (pause).Thu, Mar 26, 8:49 AM

Hi jjakob, AFAICS the patch above should have fixed any problem with the ethtool. If not I will need to understand why.

@thomas-mangin Which commit do you mean, https://github.com/vyos/vyos-1x/commit/60d35d1d4d3a5acec6e39cccb166fd33490b6c27 ?
I can definitely say that did not fix the issue for r8169, the router failed boot after upgrading to 1.3-rolling-202003250217. If there were any patches after that, I can't see them.

I think this throws a exception that isn't caught: https://github.com/vyos/vyos-1x/blob/583e9d907236a4a98fe40e97a378c1fb655f8a95/python/vyos/ifconfig/ethernet.py#L114

root@vyos:~# /sbin/ethtool --show-pause eth0
Pause parameters for eth0:
Cannot get device pause settings: Operation not supported
root@vyos:~# echo $?
76

also I would remove L107-L109 and move the debug message to the exception handler of L114

Thanks - understood.