Page MenuHomeVyOS Platform

interfaces bridge, bonding: revert back to per-interface membership syntax
Open, Requires assessmentPublic

Description

Ever since bridge and bond member syntax was migrated from being under each interface to being under the bridge and bond interfaces themselves, bridge and bond scripts have been in various states of brokenness with a lot of bugs cropping up, some are still not fixed as the required workarounds are ugly, complex, hacky or all of the above.

The root of the issue is that the operation of adding or removing a bridge or bond member is not in essence a bridge or bond operation, it is a operation on the interface that's getting added or removed. There are many checks and operations that need to be done on the interface itself before it's added and after it's removed.

This is how you add or remove a interface from a bridge or bond:

ip link set dev {if} master {brif|bondif}
ip link set dev {if} nomaster

whereas our config syntax places it under the bridge/bond interface, not under the enslaved interface.

With the membership information being stored under the bridge/bond config node, only the bridge/bond config scripts gets executed. But the enslaving or unenslaving is actually an operation on the enslaved interface, not on the bond, so now the bridge/bond code must operate on interfaces which aren't under its control.

It's not as simple as just running the above 2 commands to add or remove a bridge member. For example, each interface gets assigned a IPv6 link-local address that is added by default by the interface config script. But to be a bridge/bond member, the interface must not have any configured addresses. To do this, the bridge/bond script must cross-check the interface it wants to enslave for any configured addresses. These IPv6 link-local addresses are added by default and do not have a config node we could check that is equal on all interfaces, but some do have a 'ipv6 no-default-link-local' node that disables it. It wouldn't make sense to require the user to set 'ipv6 no-default-link-local' on all interfaces it wants to add to a bridge/bond.

  • there's no simple way to detect if a enslaved interface has a configured IPv6 link-local other than by manually parsing the configured addresses from the kernel
  • when adding members, all addresses must be purged
  • when removing members, their addresses are now purged so to re-add them requires re-running the removed interface's config script
  • bridge port STP settings are applied on the interface, not on the bridge

It would be simple, however, if the bridge/bond membership were moved back as it was under interface bridge-group/bond-group:

  • not adding a IPv6 link-local would be as simple as a if- check in the script on data it already has
  • when adding interfaces to bridges/bonds the script can do all validation in one place (make addresses and bridge/bond membership mutually exclusive)
  • as enslaving an interface changes its mac address, it can be simply set back to the configured one as we already have it under the config node

Script priorities would need to be rearranged so that the bridge/bond gets executed (created) before all other interfaces.

It would mean easy fixes for these issues: T2241 T2242 T2177 T2515
Plus other issues that were work-arounded due to this root issue could be de-workarounded.

Root tasks that caused the problems: T1614 T1556
5b836709a
f9654c1c9
a6fde4aa0

Details

Difficulty level
Unknown (require assessment)
Version
1.3-rolling-202006241940
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)

Event Timeline

jjakob created this task.Jun 25 2020, 1:40 PM
jjakob created this object in space S1 VyOS Public.
pasik added a subscriber: pasik.Jul 9 2020, 6:25 AM
c-po added a subscriber: c-po.Thu, Jul 23, 8:59 PM

Well I think its not necessary to change this back - we rather should extend the smoketests to detect those errors!