
Unconfigured Ethernet interface discovery partial failure on boot
Needs testing, Normal, Public, BUG

Description

After a fresh install of VyOS 1.2 (current nightly build) on either physical hardware or a VM with multiple interfaces, the system does not detect all of the interfaces and add them to the config. Manually adding the interfaces and their hw-id resolves the issue. In the output of show interfaces, interfaces that are missing their hw-id show up as administratively down.

Still need to figure out the cause.

Details

Difficulty level
Normal (likely a few hours)
Version
1.2
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Perfectly compatible
Issue type
Bug (incorrect behavior)

Event Timeline

syncer triaged this task as Normal priority. Mar 7 2018, 10:32 PM

A bit more information on this.

It looks like this might be similar or related to T290. At the very least, that ticket gave me some hints as to what might be happening.

If you start with an empty config.boot, each boot finds exactly one new Ethernet interface and adds it to config.boot. In other words, if you have 4 interfaces, you need to reboot 3 times for all of them to be discovered and added to the config.

Digging into it a bit more, it looks like the /lib/udev/vyatta_net_name script is called in parallel for each new interface, and the logic that obtains an exclusive lock on /run/udev/.vyatta-lock is no longer viable (looking it over, I'm not actually sure why it was there to begin with).

As a quick test, commenting out the exclusive lock in sub lock_file seems to have fixed the issue.

If there is a good reason for the lock file, an easy fix might be to make the script generate an interface-specific lock file such as .vyatta-lock-eth0. We would likely also want to update file_unlock to unlink the file when done, to clean up.
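The per-interface lock idea above can be sketched roughly as follows. This is a Python illustration only, not the Perl code in vyatta_net_name: the helper name `interface_lock` and the `lock_dir` parameter are hypothetical, and only the `/run/udev` directory and the `.vyatta-lock-<ifname>` naming come from the comment above.

```python
import fcntl
import os
from contextlib import contextmanager

# Directory taken from the ticket; overridable so the sketch can be tried elsewhere.
LOCK_DIR = "/run/udev"


@contextmanager
def interface_lock(ifname, lock_dir=LOCK_DIR):
    """Hold an exclusive lock on a lock file that is specific to one interface.

    Hypothetical helper: parallel invocations for different interfaces
    use different files, so they no longer contend with each other.
    """
    path = os.path.join(lock_dir, f".vyatta-lock-{ifname}")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is free
        yield path
    finally:
        # Unlink while still holding the lock, then release it.
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

Usage would look like `with interface_lock("eth0"): ...` around the rename logic. One caveat with unlinking lock files: a second process that opened the old file just before the unlink would end up holding a lock on a deleted inode, so whether cleanup is safe depends on whether two invocations for the same interface can ever overlap.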

syncer changed the task status from Open to On hold. Oct 13 2018, 6:28 PM
syncer added a subscriber: syncer.

Please retest.

syncer changed the task status from On hold to Needs testing. Feb 7 2019, 11:58 PM
syncer reassigned this task from rps to zsdc.
syncer lowered the priority of this task from Normal to Low.
zsdc changed the task status from Needs testing to Confirmed. Feb 25 2019, 5:31 PM
zsdc reassigned this task from zsdc to dmbaturin.
zsdc added a subscriber: zsdc.

Confirmed in 1.2.0-rolling+201902250337. A hw-id is assigned to only one of the interfaces per boot.
If it does not introduce a race condition, maybe we could try implementing the solution proposed by @rps in T577#12637?

rps raised the priority of this task from Low to Requires assessment. Dec 11 2019, 12:45 AM

There is certainly a race condition in interface renaming in VyOS 1.2. This is possibly also an issue in 1.3, but more testing is required before I can be sure of that.

What makes it frustrating is that, being a race condition, it produces different results on each boot. Many times the system boots correctly and seems fine, but a reboot with no change in configuration then results in a missing interface or an administratively down interface.

Also note that I have been unable to re-create the issue in VirtualBox. One of the tools used to determine interfaces is biosdevname, which always returns an empty response in a VM for me, so troubleshooting this likely requires a bare metal install.

Summary:

Problem 1: Interface MAC addresses get rewritten and assigned to incorrect interfaces.
Problem 2: Interfaces get stuck in a partially renamed state and do not appear in the interface list.
Problem 3: Not all unconfigured interfaces are discovered on boot.

The following logic in sub update_mac in vyatta-interfaces.pl seems to be the most likely culprit for the MAC changes:

First it attempts:
sudo ip link set $name address $mac

If that fails, it retries with the interface taken down first:

sudo ip link set $name down
sudo ip link set $name address $mac
sudo ip link set $name up

This explains the behavior I'm seeing: the MAC of one interface incorrectly assigned to another, interfaces that go missing while stuck in a renameX transitional state, or interfaces that are correctly renamed but administratively down due to a duplicate MAC.
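For reference, the two-step logic described above can be paraphrased as the following Python sketch. It is not the actual Perl from vyatta-interfaces.pl: the `run` parameter and the `run_ip` helper are hypothetical, added so the command sequence can be traced without touching real interfaces, and the return-value handling is an assumption.

```python
import subprocess


def run_ip(args):
    """Run `sudo ip <args>` and report whether it exited with status 0."""
    return subprocess.run(["sudo", "ip", *args]).returncode == 0


def update_mac(name, mac, run=run_ip):
    """Paraphrase of the retry behavior described in the ticket.

    First try to set the MAC directly; if that fails, take the link
    down, set the MAC again, and bring the link back up.
    """
    if run(["link", "set", name, "address", mac]):
        return True
    run(["link", "set", name, "down"])
    ok = run(["link", "set", name, "address", mac])
    run(["link", "set", name, "up"])
    return ok
```

Passing a recording function as `run` would also give a simple way to count how often the fallback path is taken during boot, assuming equivalent logging were added to the real script.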

Looking at the systemd logs, I'm seeing interface rename events intermixed with FRR routing services making use of the interfaces.

I believe there are a few operations that might be in contention (but this is speculation):

  1. Rename of network interfaces by systemd-udev
  2. Rename of network interfaces upon loading config.boot via vyatta-interfaces.pl
  3. Services (FRR) attempting to enable interfaces on startup at the same time as 1 or 2

Questions:

Is there a way to ensure that interface rename and configuration operations complete before FRR is started?

How and where is the logic to change a MAC address (e.g. sub update_mac) used? Would an exception for physical interfaces be appropriate (e.g. never try to change the MAC of a physical interface)? It seems that the current model requires the hw-id of a physical interface to match the name of a configured interface, so while changing the MAC address of an interface is possible, I'm not sure it's a valid operation in the context of how VyOS currently manages interfaces.

Is there an easy way to override vyatta-interfaces.pl with a modified version for testing, or does this require building a new disk image? Along with disabling some of the update_mac functionality, I would be especially interested in adding some logging that shows how often it is called during the boot process.

Are there any other parts of the configuration management system that modify the administrative state or MAC address of an interface and that we should be looking at?

Test Case:

The problem is observed on a Supermicro A1SRi-2758F build (Atom 8-core C2758) with 4 x on-board interfaces using the igb driver and a PCIe card with 2 x 10GbE interfaces using the ixgbe driver.

The latest round of testing used VyOS 1.2.4-EPA.

I can provide additional debugging or make a unit available remotely for testing if that would be helpful.

Until this is resolved, the hardware platform remains unusable with VyOS 1.2 and is stuck on 1.1.8.

This issue is possibly fixed in current by ticket T1970. Could you retry with the newest current rolling release?

I ran into this today after upgrading to the latest 1.3 rolling image. All interfaces were added and appeared to have the correct MACs (the output of ip link matched what was in the config), but the physical interfaces to which they corresponded weren't right. I found this by looking at the link state of each interface and saw that two of them were swapped. The interface that should be eth2 was physically eth4 and vice versa, and the MACs shown in ip link were wrong for those physical cards, as if each had been erroneously set to the other interface's MAC.
I got the cards to detect properly after 2 reboots.
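One way to spot this kind of MAC rewrite without checking link state by hand would be to compare each interface's current MAC with its permanent (burned-in) address. The sketch below is only an illustration under stated assumptions: it assumes `ethtool -P <ifname>` is available and prints a `Permanent address:` line, that the current MAC can be read from `/sys/class/net/<ifname>/address`, and all function names are hypothetical.

```python
import subprocess


def parse_permanent_address(ethtool_output):
    """Extract the MAC from the output of `ethtool -P <ifname>`."""
    prefix = "Permanent address:"
    for line in ethtool_output.splitlines():
        if line.startswith(prefix):
            return line[len(prefix):].strip().lower()
    return None


def current_address(ifname):
    """Read the MAC the kernel currently reports for the interface."""
    with open(f"/sys/class/net/{ifname}/address") as f:
        return f.read().strip().lower()


def permanent_address(ifname):
    """Ask ethtool for the permanent hardware address of the interface."""
    result = subprocess.run(
        ["ethtool", "-P", ifname], capture_output=True, text=True, check=True
    )
    return parse_permanent_address(result.stdout)


def rewritten_macs(current, permanent):
    """Return {ifname: (current_mac, permanent_mac)} where the two differ."""
    return {
        name: (mac, permanent[name])
        for name, mac in current.items()
        if permanent.get(name) and mac != permanent[name]
    }
```

For example, building `current` and `permanent` dictionaries for eth0 through eth5 with the helpers above and passing them to `rewritten_macs` would list any interface whose MAC was overwritten after boot. Virtual interfaces may not report a meaningful permanent address, so the check is mainly useful for physical NICs.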

I can take this on in conjunction with T1499, since there has been no activity on it for a long time and the issue still very much exists.

dmbaturin set Is it a breaking change? to Unspecified (possibly destroys the router).
dmbaturin changed Is it a breaking change? from Unspecified (possibly destroys the router) to Perfectly compatible. Sep 3 2021, 7:34 AM
dmbaturin set Issue type to Bug (incorrect behavior).

@dmbaturin @c-po can you take a look at this and tell me whether it's still current?

Viacheslav changed the task status from Confirmed to Needs testing. Jan 20 2024, 10:21 AM
Viacheslav triaged this task as Normal priority.