VyOS Platform

BGP Peer Group Scaling issues
Needs reporter action, High, Public, BUG

Description

Hi team, great work with VyOS, by the way.

Yesterday I came across this issue, which might not be a bug but simply a limitation. I could not find any related articles, so I thought I would bring it up.

I configured 1000 BGP peers via a peer-group (passive mode). I also had 10 standalone BGP peers (not in the peer-group), which were working fine. Of the 1000 peers in the peer-group, only 3 were active, and one of those was sending 5 routes. The BGP daemon accepts the configuration for all 1000 peers without problems, but when I restart the BGP process, reboot the box, or trigger a VRRP failover, the daemon crashes: it gets stuck at 100% CPU. It also doesn't seem to spread the load across cores. One CPU sits at 100% while all the others stay at 0%.
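To reproduce a setup like this in a lab, it can help to generate the configuration rather than type it. Below is a minimal sketch that emits an FRR (vtysh) config fragment for N passive neighbors in a single peer-group. The ASNs, the peer-group name "PG-PASSIVE", and the sequential 10.0.0.x addressing are hypothetical placeholders, not taken from the report, and the exact syntax should be checked against the FRR version in use.

```python
"""Generate an FRR (vtysh) config fragment approximating the reported setup.

Hypothetical values (not from the report): ASNs, the peer-group name,
and sequential IPv4 peer addresses starting at first_peer.
"""
import ipaddress
from typing import List


def peer_group_config(local_as: int, remote_as: int, group: str,
                      first_peer: str, count: int) -> List[str]:
    """Return config lines for `count` passive neighbors in one peer-group."""
    start = ipaddress.IPv4Address(first_peer)
    lines = [
        f"router bgp {local_as}",
        f" neighbor {group} peer-group",
        f" neighbor {group} remote-as {remote_as}",
        # Passive: never initiate the TCP session, wait for the peer to connect.
        f" neighbor {group} passive",
    ]
    for i in range(count):
        # Each member inherits remote-as and passive from the peer-group.
        lines.append(f" neighbor {start + i} peer-group {group}")
    return lines
```

For example, "\n".join(peer_group_config(65000, 65001, "PG-PASSIVE", "10.0.0.1", 1000)) yields the router bgp header, three peer-group lines, and 1000 member lines ending at 10.0.3.232.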

I thought I should bring it to your attention and see if this is fixable.

Thanks
Prem

Details

Difficulty level
Normal (likely a few hours)
Version
VyOS 1.3.0-20220120155620
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Infrastructure issue or change

Event Timeline

pjeevarathinam updated the task description.

I tried the suggested command, with no luck:

sudo vtysh -c 'conf' -c 'router bgp YOUR_ASN_HERE' -c 'bgp listen limit 5000'

It was also suggested that I try this:

You can also try editing /etc/frr/daemons and appending --limit-fds 500 to the BGP daemon options.

No luck: it crashed the BGP process.
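When changing descriptor limits, it can be useful to confirm which limit the running bgpd actually ended up with, for example to attach alongside the requested logs. Below is a minimal Linux-only sketch that reads the "Max open files" row from /proc/<pid>/limits. It assumes the PID file is at /var/run/frr/bgpd.pid, a common FRR location that may differ on a given build.

```python
"""Read the open-file limit of a running bgpd from /proc (Linux only).

Assumption: the bgpd PID file is at /var/run/frr/bgpd.pid; adjust if needed.
"""
from pathlib import Path
from typing import Optional, Tuple


def parse_max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return (soft, hard) from /proc/<pid>/limits text; None means unlimited."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: "Max open files <soft> <hard> files"
            soft, hard = line.split()[3:5]
            return (None if soft == "unlimited" else int(soft),
                    None if hard == "unlimited" else int(hard))
    raise ValueError("no 'Max open files' line found")


def bgpd_fd_limit(pid_file: str = "/var/run/frr/bgpd.pid") -> Tuple[Optional[int], Optional[int]]:
    """Look up bgpd's PID and return its (soft, hard) open-file limits."""
    pid = Path(pid_file).read_text().strip()
    return parse_max_open_files(Path(f"/proc/{pid}/limits").read_text())
```

Running bgpd_fd_limit() as root on the router returns a tuple such as (1024, 524288), which can then be compared against the number of configured peers.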

Please provide some logs and configuration examples.
Do you use SNMP?

dmbaturin added a subscriber: dmbaturin.

We need to check whether this is still relevant and decide whether to declare it WONTFIX.

@pjeevarathinam Could you re-check with 1.4-rc3 or the latest rolling release?
You can also experiment with the number of file descriptors:

vyos@r4# set system frr descriptors 
Possible completions:
   <1024-8192>          Number of file descriptors
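As a rough starting point for picking a value inside the accepted 1024-8192 range, here is a small sizing sketch. The range comes from the CLI completion above. The per-peer cost of 2 descriptors and the fixed overhead of 256 are hypothetical assumptions for illustration, not figures documented by FRR or VyOS.

```python
"""Suggest a value for `set system frr descriptors` from a peer count.

The 1024-8192 bounds come from the VyOS CLI completion. per_peer and
overhead are hypothetical sizing assumptions, not FRR/VyOS figures.
"""
DESCRIPTORS_MIN = 1024
DESCRIPTORS_MAX = 8192


def suggest_descriptors(peers: int, per_peer: int = 2, overhead: int = 256) -> int:
    """Return peers * per_peer + overhead, clamped to the accepted CLI range."""
    if peers < 0:
        raise ValueError("peers must be non-negative")
    wanted = peers * per_peer + overhead
    return max(DESCRIPTORS_MIN, min(DESCRIPTORS_MAX, wanted))
```

For the reported setup of 1010 peers (1000 in the peer-group plus 10 standalone), suggest_descriptors(1010) gives 2276 under these assumptions. The clamp keeps the result inside what the CLI accepts, so very large peer counts simply saturate at 8192.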
Viacheslav changed the task status from Open to Needs reporter action. (Sun, Apr 7, 5:03 PM)