Add ability to restart frr processes
Open, Requires assessment, Public

Description

For now, if an FRR process restarts, there is no way to restore its configuration to the previous state.

A short-term fix is possible: create a post-commit script that saves the FRR configuration to a file after every change to FRR.
When an FRR process is restarted, the saved configuration then needs to be replayed into the running state on process startup.
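
A minimal sketch of that short-term fix, assuming it gets hooked into whatever runs after a commit and after a daemon restart. The snapshot path and all function names are hypothetical; `vtysh -c` and `vtysh -f` are the only FRR interfaces used. It also assumes `show running-config` prints a short banner before the first `!` line, which is dropped so the file can be replayed.

```python
import subprocess
from pathlib import Path

# Hypothetical snapshot location; not a path that VyOS or FRR defines.
SNAPSHOT = Path("/run/frr-snapshot.conf")


def _vtysh(args):
    """Run vtysh with the given arguments and return its stdout."""
    result = subprocess.run(
        ["vtysh", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def strip_banner(text):
    """Drop everything before the first '!' line (assumed banner)."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "!":
            return "\n".join(lines[i:]) + "\n"
    return text  # no '!' separator found: keep the text unchanged


def save_frr_config(snapshot=SNAPSHOT, run=_vtysh):
    """Post-commit step: dump the running FRR config to a file."""
    config = strip_banner(run(["-c", "show running-config"]))
    tmp = snapshot.with_name(snapshot.name + ".tmp")
    tmp.write_text(config)
    tmp.replace(snapshot)  # rename, so a reader never sees a partial file
    return snapshot


def restore_frr_config(snapshot=SNAPSHOT, run=_vtysh):
    """After a daemon restart: replay the saved config, if one exists."""
    if not snapshot.exists():
        return False
    run(["-f", str(snapshot)])
    return True
```

The `run` parameter is only there so the logic can be exercised without FRR installed. The sketch does nothing about the inconsistency risk: a snapshot taken before a change made elsewhere would be replayed stale.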

All places where FRR is changed need to be identified, so that the config is also saved on, e.g., a DHCP static route change.

This is not meant as a long-term fix, because it can cause config inconsistency. A better solution is to rewrite all FRR setup scripts to allow for a vyos->frr sync check, but that will take much more time to implement.

For now, work on this has not started. Feel free to comment, and if you want a mission, just assign yourself the task.

Details

Difficulty level
Unknown (require assessment)
Version
-
Why the issue appeared?
Will be filled on close

Related Objects

Event Timeline

runar created this task. Jul 9 2019, 6:42 AM
runar created this object in space S1 VyOS Public.
runar renamed this task from Add ability to reboot frr processes to Add ability to restart frr processes. Jul 9 2019, 7:17 AM
runar updated the task description.
maznu added a subscriber: maznu. Sep 23 2019, 3:14 PM

Having had bgpd peg a core to 100% (for no discernible reason), I'd welcome the ability to give quag^WFRR a kick, rather than rebooting the entire VyOS box.

maznu added a comment. Sep 29 2019, 3:40 AM

…or, indeed, it'd be great to be able to restart FRR and have it get a new config when this happened just now:

Sep 29 02:53:00 bly bgpd[1234]: [EC 100663313] SLOW THREAD: task bgpd_sync_callback (7fdb4984ed60) ran for 50172ms (cpu time 48285ms)
Sep 29 02:53:42 bly watchfrr[1167]: [EC 268435457] bgpd state -> unresponsive : no response yet to ping sent 90 seconds ago
Sep 29 02:53:42 bly watchfrr[1167]: [EC 100663303] Forked background command [pid 1284]: /usr/lib/frr/watchfrr.sh restart bgpd
Sep 29 02:53:48 bly bgpd[1234]: [EC 100663313] SLOW THREAD: task bgpd_sync_callback (7fdb4984ed60) ran for 48218ms (cpu time 46469ms)
Sep 29 02:53:48 bly bgpd[1234]: Terminating on signal
Sep 29 02:54:02 bly zebra[1230]: [EC 4043309117] Client 'vnc' encountered an error and is shutting down.
Sep 29 02:54:02 bly zebra[1230]: [EC 4043309117] Client 'bgp' encountered an error and is shutting down.
Sep 29 02:54:02 bly watchfrr[1167]: [EC 268435457] bgpd state -> down : unexpected read error: Connection reset by peer
Sep 29 02:54:02 bly zebra[1230]: client 30 disconnected. 0 vnc routes removed from the rib
Sep 29 02:54:02 bly zebra[1230]: zebra/zebra_ptm.c:1345 failed to find process pid registration
Sep 29 02:54:02 bly watchfrr[1167]: bgpd state -> up : connect succeeded

Looks like bgpd crashed (or was killed by watchfrr) and now all I've got is:

% BGP instance not found
pasik added a subscriber: pasik. Sep 29 2019, 11:23 AM
s.lorente added subscribers: Dmitry, s.lorente. Edited Oct 3 2019, 1:34 AM

We have had ticket ID 481: How to restart OSPF?

As @Dmitry suggested, it would be nice to have a command like restart protocol ospf, and likewise for the other routing protocols.

sudo killall ospfd does not work: the process is restarted, but VyOS stops using OSPF.

vyos@vyos:~$ sh ip ospf
% OSPF instance not found
vyos@vyos:~$

sudo systemctl restart frr does not work either.

We suggested the following workaround:

  1. show protocols ospf | commands
  2. edit the output
  3. delete protocols ospf
  4. commit
  5. edit protocols ospf and insert the commands from step 2
  6. commit
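
The steps above can be sketched as a small helper that builds the command sequence from the lines saved in step 1. This is only an illustration: `rebuild_commands` is a hypothetical name, and it assumes the saved lines are relative to the subtree being edited, which is what step 5 implies.

```python
def rebuild_commands(path, saved_set_lines):
    """Return the config-mode commands for the delete/re-add workaround.

    `path` is the config subtree, e.g. "protocols ospf". `saved_set_lines`
    are the (possibly edited) lines captured in step 1, assumed to be
    relative to `path`.
    """
    lines = [line.strip() for line in saved_set_lines if line.strip()]
    if not lines:
        # Nothing to re-insert: deleting would just lose the protocol config.
        raise ValueError("refusing to delete a subtree with nothing to restore")
    return [f"delete {path}", "commit", f"edit {path}", *lines, "commit"]
```

Refusing to run on an empty capture is a deliberate choice, since steps 3 and 4 are destructive on their own.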

But the user said that, because of bug 518 (Removing full OSPF protocol is not possible), the only stable solution is to reboot the router completely.

I asked in the FRR Slack channel. First they suggested systemctl reload frr or sudo systemctl restart frr, but they could not see why I would need either: they say that in recent versions there is no need to restart after config changes. I gave them this situation: you have an active OSPF neighbor and you want to change your router-id.
Then we checked the contents of /etc/frr/daemons and ran ps -ef | grep frr, vtysh -c "show zebra client summary" and vtysh -c "show run" | grep "router ospf". They concluded it must be a VyOS thing.
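
The last of those checks can be scripted to spot a daemon that came back up without its config. A rough sketch, assuming the text passed in is the output of vtysh -c "show run"; both function names are hypothetical, and the list of expected protocols would have to come from the VyOS side.

```python
import re


def configured_protocols(running_config):
    """Return the protocol names that have a `router <name>` block."""
    return {
        m.group(1)
        for m in re.finditer(r"^router (\S+)", running_config, flags=re.MULTILINE)
    }


def lost_config(running_config, expected):
    """Protocols that are expected but missing from FRR's running config."""
    return sorted(set(expected) - configured_protocols(running_config))
```

For example, if VyOS has both BGP and OSPF configured but the running config only contains a router bgp block, `lost_config(text, ["bgp", "ospf"])` returns `["ospf"]`, which matches the "% OSPF instance not found" symptom.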

When looking for a solution, it'd be nice to take the following related tasks into account:

https://phabricator.vyos.net/T1304
https://phabricator.vyos.net/R4:0760bf300a5a425e5cc147a7b58d27375137c2e2
https://phabricator.vyos.net/R3:d55d3bfb99a41687dedd983a2e752a65526830e0