24 hours later: See last comment. The MACs of bonded interfaces are not being set correctly when the machine is booted. Making any change to the bond applies the MACs correctly, and everything starts working.
--- Original Post ---
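A rough way to check for this after boot is to compare each member's current MAC with the bond's. The sketch below assumes the standard Linux sysfs layout (`/sys/class/net/<iface>/address` and `/sys/class/net/<bond>/bonding/slaves`) and assumes that in 802.3ad mode every member is expected to carry the bond's MAC. The function name and the `sysfs_root` parameter are made up for illustration.

```python
from pathlib import Path


def bond_mac_mismatches(bond="bond0", sysfs_root="/sys/class/net"):
    """Return {member: mac} for bond members whose MAC differs from the bond's.

    Assumption: in 802.3ad mode all members should share the bond's MAC.
    """
    root = Path(sysfs_root)
    bond_mac = (root / bond / "address").read_text().strip().lower()
    # bonding/slaves holds a space-separated list of member interface names
    members = (root / bond / "bonding" / "slaves").read_text().split()
    mismatches = {}
    for member in members:
        mac = (root / member / "address").read_text().strip().lower()
        if mac != bond_mac:
            mismatches[member] = mac
    return mismatches
```

Calling `bond_mac_mismatches("bond0")` right after boot, and again after making a change to the bond, should show whether the member MACs only line up after the change.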
I realise this is an awfully broad swathe of changes, but as I couldn't find any older versions, I'm unable to bisect it any further.
The symptoms are that nothing is received by the device when two interfaces are up. Shutting down one of the interfaces restores traffic flow.
I've done a massive amount of debugging and I cannot figure out what the problem is. The only change in the config is that in 201910 the bonding membership moved from the ethernet interfaces to the bond interface:
vyos@mke-fw1# compare 6
[edit interfaces bonding bond0]
+member {
+    interface eth0
+    interface eth1
+}
[edit interfaces ethernet eth0]
-bond-group bond0 {
-}
[edit interfaces ethernet eth1]
-bond-group bond0 {
-}
[edit]
vyos@mke-fw1#
I've checked /proc/net/bonding/bond0 and there's no difference between a working system and a non-working system, except for the ordering of the interfaces (eth0 is first on 1.2.0, eth1 is first on 1.2)
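Since the member ordering differs between the two systems, a plain diff of the two files is noisy. A minimal sketch of an order-insensitive comparison follows, assuming the usual `Key: value` layout of /proc/net/bonding/bond0 with each member section introduced by a `Slave Interface:` line. The exact fields vary by kernel version, and `parse_bonding` and `diff_bonding` are hypothetical helper names.

```python
def parse_bonding(text):
    """Split /proc/net/bonding/<bond> text into (bond_fields, {member: fields})."""
    bond, members = {}, {}
    current, prefix = bond, ""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and headings without a colon
        indented = line[:1].isspace()
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current, prefix = members.setdefault(value, {}), ""
        elif value == "":
            prefix = key + " / "  # sub-section header, e.g. LACP PDU details
        else:
            current[(prefix if indented else "") + key] = value
    return bond, members


def diff_bonding(text_a, text_b):
    """List (section, key, value_a, value_b) for fields that differ, ignoring member order."""
    bond_a, members_a = parse_bonding(text_a)
    bond_b, members_b = parse_bonding(text_b)
    diffs = []
    for key in sorted(set(bond_a) | set(bond_b)):
        if bond_a.get(key) != bond_b.get(key):
            diffs.append(("bond", key, bond_a.get(key), bond_b.get(key)))
    for name in sorted(set(members_a) | set(members_b)):
        a, b = members_a.get(name, {}), members_b.get(name, {})
        for key in sorted(set(a) | set(b)):
            if a.get(key) != b.get(key):
                diffs.append((name, key, a.get(key), b.get(key)))
    return diffs
```

Feeding it the file contents captured on the working and non-working boots returns an empty list when only the member ordering differs.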
I can shut down eth1 on either the switch or the router and everything starts working.
It may be related to rp filtering having been accidentally enabled, as I get a bunch of dmesg messages about martians, so it appears that traffic IS being received but is being dropped somewhere.
The visible effect is that VRRP IP addresses become MASTER, and OSPF instances get stuck in Establishing. As soon as eth1 is turned off, everything starts working perfectly.
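To rule rp filtering in or out, the effective per-interface settings can be read from /proc. This is a sketch only: it relies on the documented kernel behaviour that the effective `rp_filter` is the maximum of the `conf/all` and `conf/<iface>` values, and that martian logging is on if either `log_martians` value is set. The function name and the `conf_root` parameter are illustrative.

```python
from pathlib import Path


def effective_rp_settings(iface, conf_root="/proc/sys/net/ipv4/conf"):
    """Return the effective rp_filter mode and martian logging state for iface."""
    root = Path(conf_root)

    def read(name, key):
        return int((root / name / key).read_text().strip())

    return {
        # 0 = off, 1 = strict, 2 = loose; the kernel uses max(all, iface)
        "rp_filter": max(read("all", "rp_filter"), read(iface, "rp_filter")),
        # logging is active if either the global or the per-interface flag is set
        "log_martians": bool(read("all", "log_martians") or read(iface, "log_martians")),
    }
```

Running `effective_rp_settings("bond0")` on both the working and the non-working release would show whether the filtering mode actually changed between them.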
Here's the Cisco (NXOS) Config:
interface port-channel3
  description fw1
  switchport mode trunk
  no lacp graceful-convergence
  spanning-tree port type edge trunk
  speed 1000
  vpc 3

interface Ethernet1/2
  description fw1
  switchport mode trunk
  spanning-tree port type edge trunk
  speed 1000
  channel-group 3 mode passive
The switch sees LACPDUs from both interfaces, and brings up both interfaces, but as soon as it does, data flow goes crazy.
set interfaces bonding bond0 description 'po3'
set interfaces bonding bond0 hash-policy 'layer2'
set interfaces bonding bond0 member interface 'eth0'
set interfaces bonding bond0 mode '802.3ad'
I have removed eth1 from there, for the moment.
Booting back to 1.2.0, everything comes up and works perfectly.
I'm stumped.