
VyOS Can Lose Parts Of Its Config On Reboot - In Certain Situations
Open, High, Public, Bug

Description

Create a very basic firewall config like this http://pastebin.com/biT3iNes, then delete (or rename) the TrustedHosts address group (the CLI gives an error like "Error: group [TrustedHosts] still in use", but removes the group anyway), commit, save, and reboot.

After the reboot, the entire EXTERNAL-TO-SELF firewall no longer exists because a single rule failed to evaluate. The problem is amplified by other parts of the config tree that depend on it: you will lose zones because that firewall no longer exists, and the same goes for ESP/IKE groups, where you will lose peer definitions.

This particular issue could be fixed by making the "still in use" error fatal. Maybe there is a generic way to solve this for all portions of the config tree.
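A minimal sketch of the check suggested above, assuming the config is available in the `set` form printed by `show configuration commands` and that rules reference groups as `set firewall name <ruleset> rule <n> source|destination group <type> <name>`. The function names are hypothetical and this is not VyOS's own commit code; it only shows how a save could be refused while a rule still names a deleted group. The example config and rule number are made up.

```python
import shlex
from collections import defaultdict

# Group types a firewall rule can reference (assumed set, per the VyOS 1.1.x CLI).
GROUP_TYPES = ("address-group", "network-group", "port-group")


def parse_commands(text):
    """Split each 'set ...' line into tokens, removing the CLI's single quotes."""
    return [shlex.split(line) for line in text.splitlines()
            if line.strip().startswith("set ")]


def defined_groups(commands):
    """Map each group type to the names defined under it."""
    groups = defaultdict(set)
    for tok in commands:
        # set firewall group <type> <name> ...
        if len(tok) >= 5 and tok[1:3] == ["firewall", "group"] and tok[3] in GROUP_TYPES:
            groups[tok[3]].add(tok[4])
    return groups


def find_dangling_group_refs(text):
    """Return (ruleset, rule, type, name) for each rule that references an undefined group."""
    commands = parse_commands(text)
    groups = defined_groups(commands)
    dangling = []
    for tok in commands:
        # set firewall name <ruleset> rule <n> <source|destination> group <type> <name>
        if (len(tok) == 10 and tok[1:3] == ["firewall", "name"] and tok[4] == "rule"
                and tok[6] in ("source", "destination") and tok[7] == "group"
                and tok[8] in GROUP_TYPES and tok[9] not in groups[tok[8]]):
            dangling.append((tok[3], tok[5], tok[8], tok[9]))
    return dangling


# Hypothetical config after the TrustedHosts group was deleted but rule 10 still uses it.
example = """\
set firewall name EXTERNAL-TO-SELF default-action 'drop'
set firewall name EXTERNAL-TO-SELF rule 10 action 'accept'
set firewall name EXTERNAL-TO-SELF rule 10 source group address-group 'TrustedHosts'
"""

problems = find_dangling_group_refs(example)
if problems:
    # A save hook built on this would abort here instead of writing config.boot.
    print("refusing to save, dangling group references:", problems)
```

References from other parts of the tree (for example zones bound to the ruleset) are not covered; a generic version would need the same kind of reference table for every node type, which is the harder problem raised above.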

Details

Difficulty level
Hard (possibly days)
Version
1.1.7
Why the issue appeared?
Will be filled on close
syncer assigned this task to dmbaturin. Jul 25 2016, 4:15 PM
syncer triaged this task as High priority.
syncer edited projects, added VyOS 1.1.x (1.1.8); removed VyOS 1.1.x.
syncer added subscribers: VyOS 1.1.x, VyOS 1.1.x (1.1.8).
syncer removed a subscriber: VyOS 1.1.x (1.1.8).
syncer edited subscribers, added: Maintainers; removed: VyOS 1.1.x. Aug 21 2017, 2:05 AM
syncer added a subscriber: syncer.

@jhendryUK, does this also affect 1.2.x?

syncer edited projects, added VyOS 1.2 Crux; removed VyOS 1.1.x. Oct 11 2017, 9:40 PM

Assuming that 1.2 is affected in the same way, I'm moving this to 1.2 and suggest working on the fix there.

sebastianm added a subscriber: sebastianm. Edited Oct 12 2017, 9:38 AM

This also happens with the DHCP server configuration if the DHCP subnet is different from the one used on the LAN interface (when it's configured with VRRP by following the VRRP tutorial on the VyOS wiki).

From the logs, it looks like the DHCP server doesn't listen on the LAN interface because the DHCP subnet (example: 192.168.1.0/24) doesn't match the LAN interface subnet (10.0.0.1/24), although the VRRP subnet is set to 192.168.1.0/24.

Same for 1.1.7. @syncer
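The subnet mismatch described in the comment above can be illustrated with Python's `ipaddress` module. This is a sketch of the matching rule as the comment describes it (a DHCP subnet is only served on an interface whose own address lies in that network), not the DHCP server's actual code; the function name is hypothetical, and it assumes, as the logs suggest, that the VRRP virtual address is not counted as one of the interface's addresses.

```python
import ipaddress


def interfaces_serving_subnet(dhcp_subnet, interface_addresses):
    """Return the interface addresses (CIDR form) whose network equals the DHCP subnet."""
    subnet = ipaddress.ip_network(dhcp_subnet)
    return [addr for addr in interface_addresses
            if ipaddress.ip_interface(addr).network == subnet]


# LAN interface from the comment: only its real address, not the VRRP one, is considered.
print(interfaces_serving_subnet("192.168.1.0/24", ["10.0.0.1/24"]))  # []
```

With only 10.0.0.1/24 on the LAN interface, nothing matches 192.168.1.0/24, which is consistent with the server not listening there.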

pasik added a subscriber: pasik. May 15 2018, 9:56 PM
syncer changed the subtype of this task from "Task" to "Bug". Oct 18 2018, 5:40 AM
Raeven added a subscriber: Raeven. Nov 4 2018, 10:59 PM
kroy added a subscriber: kroy. Edited Nov 7 2018, 10:39 PM

It's worth mentioning, though it may or may not be related, that every other upgrade seems to randomly take out my SNMP config. It doesn't happen on every install: of my redundant router installs (identical in every way except for IPs and priorities), router1 will lose its config while router2 and router3 are fine.

And it's only SNMP. I find out after the upgrade when the alerting system says it can't contact the server via SNMP.

community public {
    authorization ro
}
olofl added a subscriber: olofl. Nov 13 2018, 11:03 AM

There's a problem when a firewall network group and a port group are given the same name and one of them is later deleted. Maybe that's related to this one.

vyos@vyos# set firewall group network-group TEST network 1.2.3.4/32
[edit]

vyos@vyos# set firewall group port-group TEST port 33
[edit]
vyos@vyos# commit
[ firewall group network-group TEST ]
Error: type mismatch [port] [network]
vyos@vyos# set firewall group port-group TEST port http
[edit]
vyos@vyos# compare
[edit firewall group port-group TEST]
+port http
[edit]
vyos@vyos# commit
[ firewall group port-group TEST ]
Error: undefined group type
[edit]
vyos@vyos# save
Saving configuration to '/config/config.boot'...
Done
[edit]
vyos@vyos# exit
exit
vyos@vyos:~$ show configuration commands
set firewall group network-group TEST network '1.2.3.4/32'
set firewall group port-group TEST port '33'
set firewall group port-group TEST port 'http'

vyos@vyos# delete firewall group port-group TEST
[edit]
vyos@vyos# commit
[ firewall group port-group TEST ]
Error: group [TEST] doesn't exists

[edit]
vyos@vyos# save
Saving configuration to '/config/config.boot'...
Done
[edit]
vyos@vyos# exit
exit
vyos@vyos:~$ show configuration commands
set firewall group network-group TEST network '1.2.3.4/32'
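In the first session above, both commits fail, yet `save` still writes `network-group TEST` and `port-group TEST` to the config. A sketch of a check that flags a group name defined under more than one type (which the `type mismatch` error suggests VyOS does not expect) and that could run before saving is below. It assumes the same `show configuration commands` output; the function name is hypothetical and this is not VyOS's own validation.

```python
import shlex
from collections import defaultdict

# Group types that share one namespace in this check (assumption).
GROUP_TYPES = ("address-group", "network-group", "port-group")


def find_shared_group_names(text):
    """Return {name: sorted group types} for names defined under more than one type."""
    types_by_name = defaultdict(set)
    for line in text.splitlines():
        tok = shlex.split(line)
        # set firewall group <type> <name> ...
        if len(tok) >= 5 and tok[:3] == ["set", "firewall", "group"] and tok[3] in GROUP_TYPES:
            types_by_name[tok[4]].add(tok[3])
    return {name: sorted(types) for name, types in types_by_name.items() if len(types) > 1}


# The config saved at the end of the first session in the transcript.
saved = """\
set firewall group network-group TEST network '1.2.3.4/32'
set firewall group port-group TEST port '33'
set firewall group port-group TEST port 'http'
"""
print(find_shared_group_names(saved))  # {'TEST': ['network-group', 'port-group']}
```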

bswinnerton renamed this task from VyOS Can Loose Parts Of Its Config On Reboot - In Certain Situations to VyOS Can Lose Parts Of Its Config On Reboot - In Certain Situations. Nov 29 2018, 12:04 AM
bswinnerton set Why the issue appeared? to Will be filled on close.

@bswinnerton I think "in certain situations" should be defined as "when the config on disk is invalid for whatever reason".

In this case, there is a bug that is allowing the saving of a broken config.

But the disappearing config is another issue entirely. For example, Wireguard failing to load because of a mis-packaged kernel module (T1049) caused anything touching the wg0 device to disappear. Similarly, in something I reported a while back on the forum and mentioned above, a problem with SNMP caused my simple SNMP config to disappear.

While I was using the rolling releases, things like this popped up fairly regularly. I ended up writing a script that compares the running config with the version in my local git repos for my dozens of instances. I didn't realize what was actually happening until the Wireguard bug I mentioned.

Maybe there's a sane way to verify that the loaded config is the same as the on-disk config, or to report something in the MOTD when the on-disk config can't be fully loaded. I haven't really dug into that part of the code, though.
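The comparison described above could be sketched as a line-level diff of two `show configuration commands` dumps: one of the config that should be loaded (on disk or in git) and one of the running system. How the dumps are collected is left out, and the function name is hypothetical. Comparing command lines is crude but order-independent, since each line is a complete `set` path; a quoting difference between the two sources would show up as drift.

```python
def config_drift(expected_text, running_text):
    """Compare two 'show configuration commands' dumps line by line.

    Returns (missing, unexpected): commands found only in the expected
    (on-disk or git-tracked) dump, and commands found only in the running one.
    """
    def commands(text):
        return [line.strip() for line in text.splitlines()
                if line.strip().startswith("set ")]

    expected = commands(expected_text)
    running = commands(running_text)
    expected_set, running_set = set(expected), set(running)
    missing = [cmd for cmd in expected if cmd not in running_set]
    unexpected = [cmd for cmd in running if cmd not in expected_set]
    return missing, unexpected


# Hypothetical dumps: the SNMP community from the earlier comment did not load.
on_disk = """\
set interfaces ethernet eth0 address '192.0.2.1/24'
set service snmp community public authorization 'ro'
"""
running = """\
set interfaces ethernet eth0 address '192.0.2.1/24'
"""

missing, unexpected = config_drift(on_disk, running)
if missing:
    # This is where a MOTD warning or an alert could be raised.
    print("not loaded from disk:", missing)
```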

Sorry about that @kroy. That was the default value when I fixed a typo in the title 😬. I didn't mean to update that value.

kroy added a comment. Nov 29 2018, 1:55 AM

@bswinnerton Ha, no problem. I just figured that, with a bit more insight, it might be worth breaking this out into two tasks.

This also happens when you enter an invalid BGP config that isn't caught by validation. The system thinks it was applied successfully and saves it as the boot config, and then BGP is broken on the next boot.