
telegraf does not start at boot when configured in VRF
Closed, Invalid | Public | BUG

Description

I recently noticed that the telegraf monitoring service does not start at boot. The boot log shows 5 rapid start attempts within one second, with the following error:

ip[2474]: Failed to load BPF prog: 'Operation not permitted'
systemd[1]: telegraf.service: Main process exited, code=exited, status=255/EXCEPTION
vyos-lns-1 systemd[1]: telegraf.service: Failed with result 'exit-code'.

After that, the service remains in a failed state. Other services such as ntp and snmpd show the same error on their first few start attempts. I can't tell what the cause is, but I see that the following systemd configuration

Restart=always
RestartSec=10

fixes the problem for ntp and snmpd. So I'd suggest adding the same Restart/RestartSec settings to the file
/usr/share/vyos/templates/telegraf/override.conf.j2
Sample boot log attached.
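
For illustration, the suggested change would amount to something like the fragment below in the drop-in template. This is only a sketch: the two values are the ones quoted above for ntp and snmpd, the [Service] section is where systemd expects these directives, and whatever else the template already contains is not shown here.

```ini
# Sketch of the proposed addition to the telegraf drop-in.
# Only these two directives come from the report; the rest of the
# template's existing content is intentionally omitted.
[Service]
# Restart the service whenever it exits, including on failure.
Restart=always
# Wait 10 seconds between restart attempts.
RestartSec=10
```

With a 10 second delay between attempts, the restarts no longer happen in a quick burst, so the service would presumably stay under systemd's default start rate limit (5 starts within 10 seconds) instead of ending up in the failed state.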

Details

Difficulty level
Easy (less than an hour)
Version
1.4-rolling-202208291850
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Unspecified (please specify)

Event Timeline

I can't reproduce it

vyos@r14:~$ show conf com | match "vrf|tele"
set interfaces ethernet eth1 vrf 'mgmt'
set service monitoring telegraf influxdb authentication organization '[email protected]'
set service monitoring telegraf influxdb authentication token 'GuRJc12tIzfjnYdKRAIYbxdWd2aTpOT9PVYNddzDnFV4HkAcD7u7-kndTFXjGuXzJN6TTxmrvPODB4mnFcseDV=='
set service monitoring telegraf influxdb port '8086'
set service monitoring telegraf influxdb url 'https://foo.local'
set service monitoring telegraf prometheus-client
set service monitoring telegraf vrf 'mgmt'
set vrf name mgmt table '1010'
vyos@r14:~$

After a reboot, the telegraf service works correctly:

vyos@r14:~$ sudo systemctl status telegraf
● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
     Loaded: loaded (/lib/systemd/system/telegraf.service; disabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/telegraf.service.d
             └─10-override.conf
     Active: active (running) since Thu 2022-10-13 15:24:23 EEST; 1min 19s ago
       Docs: https://github.com/influxdata/telegraf
   Main PID: 1868 (telegraf)
      Tasks: 10 (limit: 9404)
     Memory: 54.4M
        CPU: 2.650s
     CGroup: /system.slice/telegraf.service
             └─vrf
               └─mgmt
                 └─1868 /usr/bin/telegraf --config /run/telegraf/telegraf.conf --config-directory /etc/telegraf/telegraf.d --pidfile /run/telegraf/telegraf.pid
c-po assigned this task to Viacheslav.