Large firewall rulesets cause the system to lose configuration and crash at startup
Needs testing, High, Public, BUG

Description

Hi all,

We have found a serious bug in VyOS: if a fairly large ruleset is configured, the system works fine and even accepts new rules while it is running, but if the device is rebooted it crashes at startup.

I will detail our investigation and the tests we have done:

  • We have a config.boot of more than 25K lines with several rulesets and static routes. It was working fine, with no errors on commit or save, but when we rebooted the device the console reported a configuration error, the interfaces were messed up and we could not connect to the device remotely.
  • After several tests, we narrowed the issue down to a ruleset called LAN-INBOUND, which takes up almost 12K lines of config.boot. The odd thing is that the ruleset contains neither errors nor unsupported configuration; instead there seems to be some kind of "ruleset length limit" (our guess). To check this, we copied config.boot into two different files: one with the first half of the ruleset and one with the second half.
  • When we boot the system with either of these half-ruleset configuration files, it boots successfully. When we boot the device with the full ruleset (both halves included), it crashes, does not apply the configuration and even messes up the interfaces, with a new one called "renameX" appearing (rename4 in the attached image "iface_rename.png").

The bug can be easily replicated with the following three files, which have been uploaded:

  • "config-base_lan.cfg": config with the full ruleset
  • "config-base_lan_half1.cfg": config with first half of the ruleset
  • "config-base_lan_half2.cfg": config with second half of the ruleset

Steps to reproduce:

  1. Copy all three files to /config/
  2. Copy the first-half ruleset config to /config/config.boot and reboot (it should work):
       sudo cp /config/config-base_lan_half1.cfg /config/config.boot
       reboot
  3. Repeat step 2 with the second-half ruleset config (it should also work).
  4. Repeat step 2 with "config-base_lan.cfg" (it should crash with "configuration error" at startup).
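
The copy-and-reboot steps above differ only in which file gets copied, so for repeated test runs the copy can be wrapped in a small helper. This is only a sketch: stage_config is a hypothetical name, not part of VyOS, and the backup file is an addition of this sketch so the previous boot config can be restored from the console after a failed boot.

```shell
#!/bin/sh
# Hypothetical helper (not part of VyOS): copy one test config over the
# boot config, keeping a backup of whatever was there before.
#   $1 = test config to boot with
#   $2 = path of the boot config file to overwrite
stage_config() {
    src="$1"
    boot="$2"
    # Refuse to continue if the test config is missing.
    [ -f "$src" ] || { echo "missing config: $src" >&2; return 1; }
    # Keep the previous boot config so it can be restored by hand.
    if [ -f "$boot" ]; then
        cp -- "$boot" "$boot.bak" || return 1
    fi
    cp -- "$src" "$boot"
}
```

Run it as root with the test file and the boot config path from step 2 as arguments, then reboot manually. The helper deliberately does not reboot on its own.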

It may be useful to know that if you boot the system with an empty or default configuration and then load the config file containing the full ruleset, it works! It only fails when that config file is loaded at boot.

In addition, for testing purposes we haven't assigned the ruleset to any interface, and even then the system messes up the interfaces at boot!

Additional note: this bug also happens with VyOS 1.1.8, with one difference: VyOS 1.2.0 crashes but leaves config.boot untouched (with the original configuration), whereas VyOS 1.1.8 crashes and also modifies config.boot, leaving it inconsistent.

Please let me know if you need further information or any other thing I can help with.
Thank you very much and best regards!

Details

Commits
Restricted Diffusion Commit
Difficulty level
Unknown (require assessment)
Version
1.2.0
Why the issue appeared?
Will be filled on close

Event Timeline

csalcedo created this task. May 21 2019, 3:18 PM
syncer triaged this task as High priority.
csalcedo updated the task description. May 22 2019, 2:01 PM

I ran into this issue before. As a workaround, I save the firewall groups to a text file and then use ipset to load them when the system boots. It works well: VyOS boots quickly and the firewall policy is applied correctly.
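
For anyone wanting to try a similar workaround, here is a minimal sketch. It assumes the text file holds one network per line; the function name make_ipset_restore, the set name and the file layout are all hypothetical and not taken from the comment above. It only generates input in the format that `ipset restore` reads; loading it at boot is left to whatever startup hook you use.

```shell
#!/bin/sh
# Hypothetical helper: turn a text file of networks (one per line) into
# input for `ipset restore`.
#   $1 = name of the set to create (assumption: a hash:net set is wanted)
#   $2 = text file with one address or network per line
make_ipset_restore() {
    set_name="$1"
    list_file="$2"
    echo "create $set_name hash:net"
    # Read entries line by line, skipping blank lines and '#' comments.
    # The extra test keeps a final line that lacks a trailing newline.
    while IFS= read -r entry || [ -n "$entry" ]; do
        case "$entry" in
            ''|'#'*) continue ;;
        esac
        echo "add $set_name $entry"
    done < "$list_file"
}
```

The generated output can be written to a file and loaded as root with `ipset restore < that-file`.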

pasik added a subscriber: pasik. May 23 2019, 6:51 AM
aibanez added a comment. Edited May 29 2019, 12:07 PM

Hi guys,

Could this be related to some timeout in the scripts that load the firewall config into iptables at boot time? Longer rulesets would mean longer loading times, until this hypothetical timeout is triggered and the issue manifests.

What looks really strange to us is that this issue only seems to happen at boot time. Loading and committing these long rulesets once VyOS has booted seems to work seamlessly.

Thanks!

I agree with @aibanez.
In addition, it occasionally works (only very rarely), but it almost always crashes at startup with the interfaces messed up.
Do you have any clue or suggestion, or any further test we could run?

I have been looking into this issue. Below is a description of what I think is the root cause:

if ( -d $VYATTACFG ) {
  $newname = hotplug($ifname, $hwaddr);
} else {
  $newname = coldplug($ifname, $hwaddr);
}
  • hotplug() and coldplug() are two functions that our script calls conditionally, as follows:
  • hotplug() should be used on a VyOS device that is up and running when a new interface is added
  • coldplug() should be used when a VyOS device is rebooted, to give the network interfaces their correct names
  • as you can see in the code, the decision is based on the existence of the $VYATTACFG directory
  • this directory is where the active config of a running VyOS is kept (/opt/vyatta/config/active).

The above logic is fine as long as VyOS does not take too long to boot and the $VYATTACFG directory is not created before the Linux udev subsystem has processed its rules for the interfaces.
In our case, however, the config file is very long, so loading it takes a lot of time and resources. As a result, the $VYATTACFG directory is created before udev has run for all interfaces. The "if" statement above then takes the hotplug() path for some of the interfaces, and for those interfaces the rename process does not complete successfully.

The solution I propose in pull request https://github.com/vyos/vyatta-cfg-system/pull/102 is to check for the existence of the active interfaces config directory, "/opt/vyatta/config/active/interfaces", instead of checking only for the "/opt/vyatta/config/active" directory.
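
To make the difference between the two checks concrete, here is a small shell sketch. It is not the actual Perl script or the pull request diff; old_mode and new_mode are hypothetical names that only mirror the directory tests described above. The argument stands in for the active config directory.

```shell
#!/bin/sh
# Sketch only: mirrors the two directory checks, not the real VyOS code.
#   $1 = path standing in for the active config directory

# Original check: hotplug as soon as the active config directory exists.
old_mode() {
    if [ -d "$1" ]; then echo hotplug; else echo coldplug; fi
}

# Proposed check: hotplug only once the "interfaces" subtree exists.
new_mode() {
    if [ -d "$1/interfaces" ]; then echo hotplug; else echo coldplug; fi
}
```

During a slow boot there is a window in which the active directory already exists but its interfaces subtree does not. In that window old_mode prints hotplug while new_mode still prints coldplug, which is the behaviour the fix relies on.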

runar added a commit: Restricted Diffusion Commit. Jun 4 2019, 8:34 AM
zsdc changed the task status from Open to Needs testing. Jun 24 2019, 5:50 PM
zsdc added a subscriber: zsdc.

The configuration provided in the first message loaded successfully on 1.2.0-rolling+201906240337.
@csalcedo, could you test the new rolling release to check whether the problem is solved for you too?

Hi @zsdc,

I have upgraded a couple of clusters to 1.2.0-rolling+201906240337 and the systems started successfully. I also applied the hotfix from the pull request (https://github.com/vyos/vyatta-cfg-system/pull/102/files) to several production clusters, simply by appending "/interfaces" to the $VYATTACFG variable, with the same successful results. So I can confirm that the fix provided by @mtudosoiu works great.

So many thanks to him for the fix, great job!

Finally, is this fix expected to be merged in the next stable release (1.2.2)?