High CPU usage by bgpd when snmp is active
Closed, ResolvedPublic

Description

In some scenarios, when the routing table holds a large number of routes, the snmpd process consumes 100% CPU. As a result, bgpd stops and loses its settings, and the host has to be restarted.

top - 19:06:24 up 23:50,  1 user,  load average: 0.23, 0.12, 0.04
Tasks: 142 total,   2 running,  99 sleeping,   0 stopped,   0 zombie
%Cpu0  :  3.7 us,  0.3 sy,  0.0 ni, 96.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  1.4 us,  0.7 sy,  0.0 ni, 96.6 id,  0.3 wa,  0.0 hi,  1.0 si,  0.0 st
%Cpu2  :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 98.7 us,  1.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem:   4040552 total,  2620536 used,  1420016 free,   111856 buffers
KiB Swap:        0 total,        0 used,        0 free.   244692 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3963 snmp      20   0  173256 119956   4504 R 100.0  3.0   2:05.66 snmpd
 4008 root      20   0  104248  20036  14980 S   4.0  0.5  50:36.23 uacctd
 4003 root      20   0  119708  33596  23456 S   1.3  0.8  26:15.33 uacctd
   17 root      20   0       0      0      0 S   0.3  0.0   1:14.61 ksoftirqd/1

After testing an snmpd configuration with the ipCidrRouteTable and inetCidrRouteTable modules removed, CPU usage returned to normal.

SNMPDOPTS='-LSed -u snmp -g snmp -I -ipCidrRouteTable,inetCidrRouteTable -p /run/snmpd.pid'
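
The snmpd -I option takes a single comma-separated module list, and a leading '-' means "initialise everything except these". A minimal sketch of assembling such an option string follows; the helper name build_snmpd_opts and the EXCLUDED_MODULES list are hypothetical and are not taken from VyOS's snmp.py.

```python
# Hypothetical helper for illustration; not the code used in VyOS's snmp.py.
EXCLUDED_MODULES = ["ipCidrRouteTable", "inetCidrRouteTable"]


def build_snmpd_opts(excluded, pidfile="/run/snmpd.pid"):
    """Return an snmpd option string that skips initialising `excluded` modules."""
    # Join with a bare comma: the whole list must stay one shell word,
    # so any stray whitespace around the names is stripped first.
    module_list = ",".join(name.strip() for name in excluded)
    # The leading '-' in front of the list tells snmpd to exclude these modules.
    return f"-LSed -u snmp -g snmp -I -{module_list} -p {pidfile}"


opts = build_snmpd_opts(EXCLUDED_MODULES)
print(f"SNMPDOPTS='{opts}'")
```

Joining the names programmatically avoids hand-typed separators, which matters because the list has to reach snmpd as one argument.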

Details

Difficulty level
Unknown (require assessment)
Version
-
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Behavior change

Event Timeline

Unknown Object (User) added a subscriber: Unknown Object (User). Oct 8 2019, 2:49 PM

Any way of reproducing this with a simple snmpwalk? snmpget?
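
One way to try reproducing this might be to walk only the two route tables and time each walk. The sketch below assumes SNMP v2c, a community string of "public", and an agent on localhost; all three are placeholders, and the helper names are hypothetical. The OIDs are the IP-FORWARD-MIB subtrees for the two tables.

```python
import subprocess
import time

# IP-FORWARD-MIB subtrees for the two tables named in the description.
ROUTE_TABLE_OIDS = {
    "ipCidrRouteTable": "1.3.6.1.2.1.4.24.4",
    "inetCidrRouteTable": "1.3.6.1.2.1.4.24.7",
}


def build_walk_cmd(table, host="localhost", community="public"):
    """Build an snmpwalk command line for one table (host/community are assumptions)."""
    return ["snmpwalk", "-v2c", "-c", community, "-On", host, ROUTE_TABLE_OIDS[table]]


def time_walk(table, **kwargs):
    """Run the walk and return (number of output lines, elapsed seconds)."""
    start = time.monotonic()
    result = subprocess.run(
        build_walk_cmd(table, **kwargs),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return len(result.stdout.splitlines()), time.monotonic() - start


# Example usage on a router with net-snmp tools installed:
#   for name in ROUTE_TABLE_OIDS:
#       print(name, time_walk(name))
```

Watching snmpd in top while the walk runs would show whether these tables alone are enough to trigger the load.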

hagbard changed the task status from Open to Needs testing. Oct 10 2019, 3:07 PM
hagbard triaged this task as Normal priority.
hagbard added a project: VyOS 1.2 Crux.
hagbard moved this task from Need Triage to In Progress on the VyOS 1.2 Crux board.

I do not see this problem on a full-table v4/v6 router with 2 cores and 4 GB RAM. The question is why? Is removing the table a good idea? What was the state with 1.1.8?

There were multiple complaints in the forum about bgpd crashes and memory issues. Users there successfully applied the workaround of removing the tables from snmpd.

me@vyos:/tmp$ dpkg -l | grep vyos-1x
ii vyos-1x 1.3.0-16 all VyOS configuration scripts and data

me@vyos:/tmp$ sudo dpkg -i vyos-1x_1.3.0-16_all.deb
(Reading database ... 56555 files and directories currently installed.)
Preparing to unpack vyos-1x_1.3.0-16_all.deb ...
Unpacking vyos-1x (1.3.0-16) over (1.3.0-16) ...
dpkg: dependency problems prevent configuration of vyos-1x:
vyos-1x depends on accel-ppp; however:
Package accel-ppp is not installed.

dpkg: error processing package vyos-1x (--install):
dependency problems - leaving unconfigured
Processing triggers for systemd (215-17+deb8u13) ...
Errors were encountered while processing:
vyos-1x

me@vyos:/tmp$ sh version
Version: VyOS 1.2-rolling-201909242149

@fvbrasileiro Yeah, we found that out too today, we are working on a solution already. Please be patient.

No problem, I had already made the change manually in the snmp.py file. Since then, the problem has not occurred.

With this change, SNMP no longer starts when I do not use BGP:

Oct 13 18:18:47 LR4 snmpd[5404]: getaddrinfo: inetCidrRouteTable Name or service not known
Oct 13 18:18:47 LR4 snmpd[5404]: getaddrinfo("inetCidrRouteTable", NULL, ...): Name or service not known
Oct 13 18:18:47 LR4 snmpd[5404]: Error opening specified endpoint "inetCidrRouteTable"
Oct 13 18:18:47 LR4 snmpd[5404]: Server Exiting with code 1
Oct 13 18:18:47 LR4 snmpd[5401]: Starting SNMP services::
Oct 13 18:18:47 LR4 systemd[1]: snmpd.service: control process exited, code=exited status=1
Oct 13 18:18:47 LR4 systemd[1]: Failed to start LSB: SNMP agents.
Oct 13 18:18:47 LR4 systemd[1]: Unit snmpd.service entered failed state.
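
The log shows snmpd treating "inetCidrRouteTable" as a listening endpoint rather than as part of the -I module list. One plausible cause, stated here as an assumption, is whitespace after the comma in the generated option string, which would split the list into two arguments. The sketch below uses a simplified tokenizer (not snmpd's real option parser) to show which tokens would be left over as positional arguments.

```python
import shlex


def positional_args(option_string, flags_with_value=("-u", "-g", "-I", "-p")):
    """Return tokens that are neither options nor values of the listed options.

    Simplified model for illustration only; snmpd's own parser is more involved.
    """
    leftovers = []
    skip_next = False
    for tok in shlex.split(option_string):
        if skip_next:
            # This token is the value of the preceding option.
            skip_next = False
        elif tok in flags_with_value:
            skip_next = True
        elif not tok.startswith("-"):
            # Anything left over is what snmpd would read as a listening address.
            leftovers.append(tok)
    return leftovers


# Hypothetical option strings: the first has a space after the comma.
broken = "-LSed -u snmp -g snmp -I -ipCidrRouteTable, inetCidrRouteTable -p /run/snmpd.pid"
fixed = "-LSed -u snmp -g snmp -I -ipCidrRouteTable,inetCidrRouteTable -p /run/snmpd.pid"

print(positional_args(broken))  # ['inetCidrRouteTable']
print(positional_args(fixed))   # []
```

If this is what happened, the leftover token would be passed to getaddrinfo as a hostname, which matches the "Name or service not known" message.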

Okay - just installed the latest rolling and it does not even boot up anymore. It's trapped in an snmp restart loop somehow. Going to revert this commit.

Can't create an iso right now to test it.

works with:

Version:          VyOS 1.2-rolling-201910110117
Built by:         [email protected]
Built on:         Fri 11 Oct 2019 01:17 UTC
Build UUID:       48a11fa6-8c59-4dbb-94a3-215376c09a02
Build Commit ID:  46f9b2ab60e4fa
hagbard changed the task status from Needs testing to Backport candidate. Oct 18 2019, 7:58 PM
hagbard moved this task from In Progress to Backlog on the VyOS 1.2 Crux board.
dmbaturin changed Is it a breaking change? from Unspecified (possibly destroys the router) to Behavior change.

Debian Buster changed a lot in VyOS 1.3. I tried my best to port the settings to VyOS 1.3 in T1921, can someone please verify it is working as expected?