VyOS Platform

FRR config not loaded after daemons segfault or restart
Open, Requires assessment, Public, BUG

Description

Steps to reproduce

set interfaces loopback lo address 10.1.1.1/32
set protocols ospf area 0 network 192.168.0.0/24
set protocols ospf default-information originate always
set protocols ospf default-information originate metric 10
set protocols ospf default-information originate metric-type 2
set protocols ospf log-adjacency-changes
set protocols ospf parameters router-id 10.1.1.1
set protocols ospf redistribute connected metric-type 2
set protocols ospf redistribute connected route-map CONNECT

set policy route-map CONNECT rule 10 action permit
set policy route-map CONNECT rule 10 match interface lo

If a daemon then segfaults or is killed, watchfrr restarts the process, but the restarted daemon comes up without the active configuration:

vyos@DHCP-Relay# sudo killall ospfd
[edit]
vyos@DHCP-Relay# ps ax | grep ospf
  876 ?        Ss     0:00 /usr/lib/frr/watchfrr -d zebra bgpd ripd ripngd ospfd ospf6d staticd
  942 ?        Ss     0:00 /usr/lib/frr/ospf6d -d --daemon -A ::1 -M snmp
 2481 ?        Ss     0:00 /usr/lib/frr/ospfd -d --daemon -A 127.0.0.1 -M snmp
 2486 ttyS0    S+     0:00 grep ospf
vyos@DHCP-Relay# vtysh -d ospfd -c "show run"
Building configuration...

Current configuration:
!
frr version 7.0.1-20190820-04-g047efd6
frr defaults traditional
hostname DHCP-Relay
log syslog informational
service integrated-vtysh-config
!
line vty
!
end
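The symptom is easy to miss, because the daemon is running again (see the ps output) even though its configuration is gone. A minimal Python sketch of a check a monitoring script could run against captured `vtysh -d ospfd -c "show run"` output; the helper name `ospf_config_missing` is hypothetical, and it assumes that a configured ospfd prints a top-level `router ospf` line, which the empty output above lacks.

```python
import re


def ospf_config_missing(show_run: str) -> bool:
    """Return True if captured ospfd `show run` text has no `router ospf` stanza.

    Hypothetical helper: it only inspects text already captured from
    `vtysh -d ospfd -c "show run"`; it does not talk to FRR itself.
    """
    # Assumption: a configured ospfd prints a top-level line starting with
    # "router ospf" (optionally followed by an instance or VRF). The \b stops
    # "router ospf6" from matching. A daemon restarted into an empty config
    # prints no such line.
    return re.search(r"^router ospf\b", show_run, flags=re.MULTILINE) is None


# Output captured after `killall ospfd` in the report above.
after_restart = """Current configuration:
!
frr version 7.0.1-20190820-04-g047efd6
frr defaults traditional
hostname DHCP-Relay
log syslog informational
service integrated-vtysh-config
!
line vty
!
end
"""

print(ospf_config_missing(after_restart))  # True
```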

Details

Difficulty level
Unknown (require assessment)
Version
1.2.3
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)

Event Timeline

Dmitry created this task. Dec 20 2019, 10:09 PM
runar added a subscriber: runar. Edited Dec 20 2019, 10:39 PM

This is a known fault, and it is not easily fixable in the current implementation. The VyOS CLI configures the FRR processes manually after they have started, so when a process dies and is restarted, it reads its configuration from the saved config file instead. Because we have no way to save the configuration of the prior process, the restarted process comes up with an empty configuration.

As far as I know, this is meant to be fixed by a CLI rewrite.

pasik added a subscriber: pasik.Dec 27 2019, 9:55 AM
syncer assigned this task to zsdc.Jan 1 2020, 1:52 PM
syncer removed a project: VyOS 1.2 Crux.
maznu added a subscriber: maznu.Wed, Mar 25, 9:25 AM

We've recently seen this on a bleeding-edge build of 1.3 (yesterday's version). I'm currently investigating what tripped ospf6d, but I suspect it's going to be some Ubiquiti routers spewing their nasty OSPFv3 implementation.

Mar 24 21:23:31 coudreau ospf6d[1171]: SPF: Scheduled in 0 msec
Mar 24 21:23:31 coudreau ospf6d[1171]: SPF processing: # Areas: 1, SPF runtime: 0 sec 47 usec, Reason: R+, L+
Mar 24 21:23:31 coudreau ospf6d[1171]: SPF: Scheduled in 50 msec
Mar 24 21:23:31 coudreau ospf6d[1171]: SPF processing: # Areas: 1, SPF runtime: 0 sec 40 usec, Reason: N+
...snip...
Mar 24 21:23:34 coudreau watchfrr[1107]: [EC 268435457] ospf6d state -> down : read returned EOF
Mar 24 21:23:34 coudreau zebra[1142]: [EC 4043309121] Client 'ospf6' encountered an error and is shutting down.
Mar 24 21:23:34 coudreau zebra[1142]: client 51 disconnected. 7 ospf6 routes removed from the rib
Mar 24 21:23:39 coudreau watchfrr[1107]: [EC 100663303] Forked background command [pid 3764]: /usr/lib/frr/watchfrr.sh restart ospf6d
Mar 24 21:23:39 coudreau zebra[1142]: client 51 says hello and bids fair to announce only ospf6 routes vrf=0
Mar 24 21:23:40 coudreau watchfrr[1107]: ospf6d state -> up : connect succeeded

I am showing my naïveté about how VyOS' internals work now: would it not be possible to configure FRR's daemons to use a configuration file in tmpfs, and have VyOS issue a "write mem" each time it finishes interacting with FRR? That way, FRR would still have its configuration after a segfault or subprocess crash.

Merijn added a subscriber: Merijn.Wed, Mar 25, 9:42 AM

A router reboot last week reminded me never to "write mem" in vtysh (but looking back, it had become automatic for me :( ).
The router booted with the FRR configuration already loaded, then VyOS tried to populate FRR from the VyOS configuration, and everything broke :-)
It didn't help that the configuration I had saved in FRR was a couple of months old.

maznu added a comment.Wed, Mar 25, 3:34 PM

I'm not expecting the FRR config to persist across reboots (hence suggesting tmpfs), so when the system boots there is nothing there. Obviously something would need to create the (empty) FRR config files in tmpfs before starting FRR; otherwise I expect all the FRR daemons would fail to start.