
Segfault during configuration leads to disaster on the router
Closed, Resolved | Public | BUG

Description

Hi,

I'm testing VyOS 1.2.4 and ran into a segfault during configuration. I know this is a known issue that happens occasionally, especially when two sessions try to configure or commit at the same time. This time, however, I found that the segfault can lead to consequences that could almost be called a disaster.

What did I do?
Basically, I wrote scripts that change the weight of BGP neighbors according to different conditions, so that traffic can choose better routes intelligently. Normally the commands look like this:

/opt/vyatta/sbin/vyatta-cfg-cmd-wrapper begin 
/opt/vyatta/sbin/vyatta-cfg-cmd-wrapper set protocols bgp 65532 neighbor 192.168.3.1 address-family ipv4-unicast weight 30000
/opt/vyatta/sbin/vyatta-cfg-cmd-wrapper commit

But under some complex conditions the script configures two neighbors at the same time. In that case the segfault happens, not every time but with high probability, and leads to disaster-like consequences.

What did I see?
When the segfault happens, I see the following consequences:

  • configuration failure with segfault
Jan 16 12:00:58 aer-124-test kernel: [ 6821.347866] my_commit[26468]: segfault at 8 ip 00007faa34fcce05 sp 00007ffd4ad5bf10 error 6 in libperl.so.5.20.2[7faa34eee000+1b8000]
Jan 16 12:00:58 aer-124-test kernel: [ 6821.347877] Code: 89 ef e8 ee 31 f5 ff 44 09 63 0c 41 81 e4 00 00 08 00 74 1a 48 8b 45 50 48 83 c0 01 48 3b 45 60 7d 3b 48 8b 55 48 48 89 45 50 <48> 89 1c c2 48 83 c4 10 48 89 d8 5b 5d 41 5c c3 0f 1f 00 48 89 54
Jan 16 12:00:59 aer-124-test kernel: [ 6821.491028] my_commit[26269]: segfault at 8 ip 00007f5ee43f2e05 sp 00007ffe04ffa380 error 6 in libperl.so.5.20.2[7f5ee4314000+1b8000]
Jan 16 12:00:59 aer-124-test kernel: [ 6821.491039] Code: 89 ef e8 ee 31 f5 ff 44 09 63 0c 41 81 e4 00 00 08 00 74 1a 48 8b 45 50 48 83 c0 01 48 3b 45 60 7d 3b 48 8b 55 48 48 89 45 50 <48> 89 1c c2 48 83 c4 10 48 89 d8 5b 5d 41 5c c3 0f 1f 00 48 89 54
Jan 16 12:00:59 aer-124-test kernel: [ 6821.516564] my_discard[26469]: segfault at 8 ip 00007efd850ece05 sp 00007ffc76235d70 error 6 in libperl.so.5.20.2[7efd8500e000+1b8000]
Jan 16 12:00:59 aer-124-test kernel: [ 6821.516578] Code: 89 ef e8 ee 31 f5 ff 44 09 63 0c 41 81 e4 00 00 08 00 74 1a 48 8b 45 50 48 83 c0 01 48 3b 45 60 7d 3b 48 8b 55 48 48 89 45 50 <48> 89 1c c2 48 83 c4 10 48 89 d8 5b 5d 41 5c c3 0f 1f 00 48 89 54
  • all interfaces, including ethernet, tunnel, and VPN interfaces, are shut down or deleted
[Thu Jan 16 12:03:09 2020] e1000e: eth3 NIC Link is Down
[Thu Jan 16 12:03:11 2020] e1000e: eth0 NIC Link is Down
[Thu Jan 16 12:03:12 2020] e1000e: eth1 NIC Link is Down
[Thu Jan 16 12:03:12 2020] e1000e: eth2 NIC Link is Down
  • many services are stopped, including ssh, ntp, and rsyslog; some services, such as charon, are restarted
  • iptables is empty; the configured NAT and firewall rules are gone
  • the configured hostname and DNS settings are gone
  • but show configuration still displays the full configuration

I uploaded all related log files to this post, including messages, kernel log, vyatta log, and frr log.

What do I expect?
Of course I can make my script start only one configuration session at a time to avoid the segfault as much as possible. But first, this may not be the only thing that can cause the segfault, and second, the root cause still needs to be investigated in case it leads to the worst case again.
I went through the segfault-related issues here and understand that this is an inevitable issue under the current CLI implementation. But my point is that we should at least keep the impact of a segfault as small as possible. Consequences like the ones I saw are unacceptable; this is a real disaster if you hit it in production. Can anyone help look into this problem? I'm willing to help, but I have no idea how this segfault relates to the consequences it leads to. If anyone can guide me or give me a hint, I can investigate.
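For the "one configuration session at a time" part, a minimal sketch of how the script runs could be serialized with flock from util-linux is shown here. The lock file path and the run_serialized function name are assumptions made for illustration; they are not part of VyOS. This only keeps two copies of the script from overlapping and does not address the root cause of the segfault.

```shell
#!/bin/sh
# Hypothetical lock file path; any path writable by the script user works.
LOCK_FILE="${LOCK_FILE:-/tmp/vyos-config-script.lock}"

# Run the given command while holding an exclusive lock, so that two
# invocations never configure or commit at the same time.
run_serialized() {
    (
        # Wait up to 60 seconds for an exclusive lock on file descriptor 9.
        flock -w 60 9 || { echo "could not acquire $LOCK_FILE" >&2; exit 1; }
        # Run the command; its exit status becomes the function's status.
        "$@"
    ) 9>"$LOCK_FILE"
}
```

Every configuration run would then go through the wrapper, for example run_serialized /path/to/change-weight.sh, where the script path is again only a placeholder.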

best regards.

Details

Difficulty level
Unknown (require assessment)
Version
VyOS 1.2.4
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)

Event Timeline

dmbaturin changed the task status from Duplicate to Resolved. Jun 18 2020, 11:33 PM
dmbaturin added a subscriber: dmbaturin.

Do not use vyatta-cfg-cmd-wrapper. The script-template takes care of the environment setup and exposes the set/delete/commit commands for you to run as if you were in an interactive session.

#!/bin/vbash
source /opt/vyatta/etc/functions/script-template

delete firewall group address-group FW_OUT
commit