
epa2 BGP peers initiate before config is fully loaded, routes leak.
Closed, Resolved | Public | BUG

Description

BGP peers trip their max-prefix limits when I reboot the router. This happens both when the peers are active and when they are shut down in the configuration. My theory is that FRR establishes the connections before the whole configuration has been loaded.

One example (the peer is shut down):

neighbor 91.xx.yy.69 {
     address-family {
         ipv4-unicast {
             prefix-list {
                 export ASLOCAL
             }
             soft-reconfiguration {
                 inbound
             }
         }
     }
     description ispx
     remote-as 65500
     shutdown
 }

After rebooting the router, notice that the peer has nonzero MsgRcvd and MsgSent counters even though it was shut down in the configuration during the reboot:

Neighbor        V    AS MsgRcvd MsgSent TblVer InQ OutQ  Up/Down State/PfxRcd
91.xx.yy.69     4 65500    4695    2032      0   0    0 00:06:02 Idle (Admin)

Excerpt from frr.log

Jan  2 17:50:07 vyos bgpd[1074]: %ADJCHANGE: neighbor 91.xx.yy.69(Unknown) in vrf default Up
Jan  2 17:50:09 vyos bgpd[1074]: %ADJCHANGE: neighbor 91.xx.yy.69(Unknown) in vrf default Down Peer closed the session

Details

Difficulty level
Unknown (require assessment)
Version
1.2.0-epa2
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Bug (incorrect behavior)

Event Timeline

IPv6 seems to have the same issue. The peer was shut down in the configuration; after a reboot the result is:

Neighbor                V    AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
2001:xx:xx:xx::300:1    4 65500       1       4      0   0    0   never Idle (Admin)

This behavior was already present in the old Quagga implementation in VyOS 1.1.7.
As a workaround, we always shut down the peers before a planned reboot.

syncer triaged this task as High priority.
syncer edited projects, added VyOS 1.2 Crux (VyOS 1.2.0-EPA3); removed VyOS 1.2 Crux.

The config validation issue also seems to affect route-maps: the running config shows them as applied to peers, but vtysh shows that they are not configured inside FRR.

syncer changed the task status from Open to Needs testing. Jan 27 2019, 4:18 AM
syncer reassigned this task from dmbaturin to zsdc.
syncer added a subscriber: dmbaturin.

Still seems to be present in VyOS 1.2.0-GA...

Hello @danhusan!
How big is your configuration? Can you provide a depersonalized config? Which hardware or virtual machine are you running VyOS on? Can you provide a full boot log?
We can't reliably reproduce this bug. It looks like the configuration can't load quickly enough, or something similar.

@bbabich can you open a separate issue for your validation problem?

Hi, @zsdc.

Small config, just 4 interfaces with IPv4 and IPv6 + some BGP config. I am running the VyOS instance in ESXi with some fairly modern hardware.
Unfortunately I cannot just reboot this device at will. If you provide your email I can send over the config.

Either way, I'd say this is not worth exploring at the moment if a rewrite of vyatta-bgp.pl is planned so that it stops interfacing directly with vtysh.

@Merijn can you add something?

Hi @danhusan,

We are seeing this issue mostly on BGP routers with Internet Exchange connections, because on reboot we hit the max-prefix limits with a lot of peers.
At the moment we cannot upgrade to the latest 1.2.0; we are still running 1.1.8.

It should be possible to reproduce this. I can try it in a test setup.

@danhusan, you can send the configuration to [email protected] with the subject "Phabricator T1148". Also, please check whether the remote side of the BGP peering runs in active or passive mode.

@Merijn, we were testing with 1.2.0-epa2, according to the information in the ticket. Unfortunately, we can't support versions prior to 1.2.0 now, so there is no point in testing with 1.1.8.

@danhusan IPv6 should not be affected. Workaround for IPv4:

  1. Make sure all your peers that are supposed to advertise IPv4 routes have "address-family ipv4-unicast" set
  2. Do "set protocols bgp $yourASN parameters default no-ipv4-unicast" and commit.
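As a rough illustration, the two steps could look like this in configuration mode. The ASN 64517 and the neighbor 192.0.2.100 are borrowed from the debug output in this comment and are placeholders only; substitute your own values, and repeat the first command for every peer that should carry IPv4 routes.

```text
# Step 1: explicitly enable the IPv4 unicast address family on the peer
set protocols bgp 64517 neighbor 192.0.2.100 address-family ipv4-unicast

# Step 2: stop FRR from activating IPv4 unicast on every new peer by default
set protocols bgp 64517 parameters default no-ipv4-unicast

commit
save
```

Applying step 1 to all peers before step 2 matters: a peer with no address family set stops advertising anything once the default is turned off.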

Now the longer version. This is the sequence of commands that is run on creating a peer (produced with debug prints from the script):

/usr/bin/vtysh -c configure terminal -c router bgp 64517 -c neighbor 192.0.2.100 remote-as 64513
/usr/bin/vtysh -c configure terminal -c router bgp 64517 -c address-family ipv4 unicast -c neighbor 192.0.2.100 route-map Foo out
/usr/bin/vtysh -c configure terminal -c router bgp 64517 -c address-family ipv4 unicast -c neighbor 192.0.2.100 activate
/usr/bin/vtysh -c configure terminal -c router bgp 64517

This ordering is fine when peers are not activated in the address-family by default. For IPv6 this is already the case. For IPv4 it is not, but "default no-ipv4-unicast" corrects that.
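To confirm that the option actually reached FRR, one could inspect the FRR running config through vtysh. The sketch below assumes that FRR renders the setting as the line "no bgp default ipv4-unicast" under "router bgp"; the helper name has_no_default_ipv4_unicast is made up for this example and is not part of VyOS or FRR.

```shell
# Hypothetical helper: reads FRR running-config text on stdin and succeeds
# (exit status 0) if it contains the line "no bgp default ipv4-unicast".
# Assumption: FRR prints the option in exactly this form.
has_no_default_ipv4_unicast() {
    grep -Eq '^[[:space:]]*no bgp default ipv4-unicast[[:space:]]*$'
}

# Example use on a router (left commented out because it needs vtysh):
# /usr/bin/vtysh -c 'show running-config' | has_no_default_ipv4_unicast \
#     && echo "IPv4 unicast is not activated by default"
```

The function only reads text from standard input, so it can also be run against a saved copy of the FRR configuration.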

I remember being tempted to make it the default in 1.2.0, but the problem is that peers with no address-family set would stop advertising anything at all.

Since FRR lacks a command for creating peers in down state, no other workaround is possible.

I suggest that in 1.3.0 we should make it the default. The rewritten script can either require that at least one AF is set for every peer, or automatically inject the activate command for peers with IPv4 addresses if a certain opposite option is set.

> @danhusan IPv6 should not be affected. Workaround for IPv4:

Spot on as always @dmbaturin. I can confirm the workaround works in my env.

> I suggest that in 1.3.0 we should make it the default. The rewritten script can either require that at least one AF is set for every peer, or automatically inject the activate command for peers with IPv4 addresses if a certain opposite option is set.

Sounds like a good idea.

The solution was tested and works as expected.

erkin set Is it a breaking change? to Unspecified (possibly destroys the router). Aug 31 2021, 6:59 PM
erkin set Issue type to Bug (incorrect behavior).