Bond member description change leads to network outage
Closed, Resolved · Public · BUG

Description

When a new bond interface is configured in LACP mode, it inherits its MAC address from the first active member interface. When more interfaces are added, VyOS automatically overrides their MAC addresses with the MAC address already set on the bond interface. In other words, the bond interface and all of its members end up with the same MAC address. This is required for bonding to work.
Unfortunately, these automatic MAC address changes are not reflected in "hw-id" in the config file. As a result, any change to a member interface (e.g. its description) followed by a commit can change that interface's MAC address and cause loss of connectivity. Only the interface that was added to the bond first is unaffected by this misbehaviour, because its MAC address is the same as the bond's MAC address.
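
The failure mode described above can be illustrated with a small model. This is only a sketch of the reported behaviour, not VyOS code: the `Nic` class, `enslave()`, `commit_unfixed()` and the MAC addresses are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Nic:
    name: str
    hw_id: str  # MAC recorded in the config file as "hw-id"
    mac: str    # MAC currently set on the running interface


def enslave(bond_mac, nic):
    """Model of bonding: the first member donates its MAC to the bond,
    every later member is overridden with the bond's MAC."""
    if bond_mac is None:
        bond_mac = nic.mac
    nic.mac = bond_mac
    return bond_mac


def commit_unfixed(nic):
    """Model of the reported bug: a commit re-applies hw-id as the MAC.
    Returns True if the running MAC changed (i.e. the link is disrupted)."""
    changed = nic.mac != nic.hw_id
    nic.mac = nic.hw_id
    return changed


# Hypothetical example addresses.
eth0 = Nic("eth0", "00:50:56:00:00:01", "00:50:56:00:00:01")
eth1 = Nic("eth1", "00:50:56:00:00:02", "00:50:56:00:00:02")

bond_mac = None
for nic in (eth0, eth1):
    bond_mac = enslave(bond_mac, nic)

# A commit that touches each member (e.g. a description change).
results = {nic.name: commit_unfixed(nic) for nic in (eth0, eth1)}
print(results)  # {'eth0': False, 'eth1': True}
```

In this model only eth1, the member added second, loses the bond's MAC address on commit, which matches the observation that the first member is unaffected.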

Proposed solution:
MAC address changes for bond member interfaces should be skipped/disabled.

Details

Difficulty level
Normal (likely a few hours)
Version
1.2 and 1.3
Why the issue appeared?
Implementation mistake
Is it a breaking change?
Perfectly compatible

Event Timeline

g.skupien added a project: VyOS 1.3 Equuleus.
runar added a subscriber: runar. May 18 2020, 9:30 PM

To clarify the hw-id tag: this is the only way VyOS scripts know which interface to give which name at boot. Since the boot order of NICs could (potentially) be different on every reboot, VyOS needs a way to identify the "correct" order of the NICs when it boots. If you remove the hw-id tag from an interface, the configuration script doesn't know which interface to apply the configuration to, so you could potentially get NIC reordering on every single reboot.
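
As a rough illustration of why the hw-id matters for naming, the sketch below maps detected NICs to configured names by MAC address, independent of probe order. The function `assign_names()` and the addresses are hypothetical and are not taken from the VyOS scripts.

```python
def assign_names(config_hw_ids, detected_macs):
    """Map each detected MAC to the interface name whose hw-id matches.

    config_hw_ids: {interface name: hw-id from the config}
    detected_macs: MACs in whatever order the kernel probed the NICs
    NICs without a matching hw-id map to None.
    """
    by_mac = {mac.lower(): name for name, mac in config_hw_ids.items()}
    return {mac: by_mac.get(mac.lower()) for mac in detected_macs}


# Hypothetical config and two boots with different probe order.
config = {"eth0": "00:50:56:00:00:01", "eth1": "00:50:56:00:00:02"}
boot_a = assign_names(config, ["00:50:56:00:00:01", "00:50:56:00:00:02"])
boot_b = assign_names(config, ["00:50:56:00:00:02", "00:50:56:00:00:01"])

# The same NIC gets the same name regardless of probe order.
print(boot_a == boot_b)  # True
```

Without the hw-id entries there is nothing to key on, so names would follow probe order.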

The other use of this variable is to "reset" the MAC address on an interface that has had a custom MAC address. The reason for this is that the kernel does not keep track of the burned-in MAC addresses of the NICs, and the functionality for identifying burned-in MACs is optional for a NIC to include.

So the hw-id needs to be there; it should not be removed, and this is not the fault of hw-id.
To fix this, the configuration script for the interface needs to be inspected to find the fault, and changed so that it does not change the MAC on the interface when it's in a bond.

I hope this clarifies the use of the hw-id parameter :)

pasik added a subscriber: pasik. May 19 2020, 12:04 PM

@runar, thanks for the clarification! I will change the initial description accordingly.

g.skupien changed the task status from Open to Confirmed. May 19 2020, 12:19 PM
g.skupien updated the task description.
cpedro added a subscriber: cpedro. May 19 2020, 12:26 PM
jjakob added a subscriber: jjakob. Edited May 21 2020, 8:40 AM

I think the way to do this is in src/conf-mode/interfaces-ethernet.py, in apply(): don't change the interface's MAC if eth['is_bond_member'] is set.
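
A minimal sketch of that idea, assuming a config dict shaped like `eth`: only the `is_bond_member` key comes from the comment above. The function `apply_mac()`, the `set_mac` callback, and the `intf`, `mac` and `hw_id` keys are hypothetical stand-ins, not the actual code in interfaces-ethernet.py or the merged PR.

```python
def apply_mac(eth, set_mac):
    """Apply the configured MAC to an ethernet interface, unless the
    interface is a bond member (the bond driver owns its MAC then).

    eth:     dict with the interface config (keys other than
             'is_bond_member' are hypothetical)
    set_mac: callable(interface_name, mac) that performs the change
    Returns True if a MAC was applied, False if it was skipped.
    """
    if eth.get('is_bond_member'):
        # Leave the MAC alone so the member keeps the bond's address.
        return False

    # Prefer an explicitly configured MAC, fall back to the hw-id.
    mac = eth.get('mac') or eth.get('hw_id')
    if not mac:
        return False

    set_mac(eth['intf'], mac)
    return True


# Usage: record the calls instead of touching real interfaces.
calls = []
record = lambda intf, mac: calls.append((intf, mac))

apply_mac({'intf': 'eth1', 'hw_id': '00:50:56:00:00:02',
           'is_bond_member': True}, record)
apply_mac({'intf': 'eth2', 'hw_id': '00:50:56:00:00:03',
           'is_bond_member': False}, record)

print(calls)  # [('eth2', '00:50:56:00:00:03')]
```

The bond member eth1 is skipped, so a description change and commit would no longer reset its MAC address.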

g.skupien changed the task status from Confirmed to Backport candidate. Edited May 25 2020, 9:48 AM

@jjakob, yes, thank you.
PR for 1.3: https://github.com/vyos/vyos-1x/pull/434
I will submit a PR for 1.2 once 1.3 is released and tested.

c-po moved this task from Need Triage to VyOS 1.2.6 on the VyOS 1.2 Crux board. Jun 4 2020, 5:53 PM
c-po edited projects, added VyOS 1.2 Crux (VyOS 1.2.6); removed VyOS 1.2 Crux.
c-po moved this task from Need Triage to Finished on the VyOS 1.3 Equuleus board.
dmbaturin closed this task as Resolved. Sun, Jul 26, 1:21 PM
dmbaturin moved this task from Needs Triage to Finished on the VyOS 1.2 Crux (VyOS 1.2.6) board.
dmbaturin changed Difficulty level from Unknown (require assessment) to Normal (likely a few hours).
dmbaturin changed Why the issue appeared? from Will be filled on close to Implementation mistake.
dmbaturin changed Is it a breaking change? from Unspecified (possibly destroys the router) to Perfectly compatible.