Page MenuHomeVyOS Platform

Increased boot time from 1.2.4 -> 1.3 rolling by 100%
Closed, ResolvedPublicBUG

Assigned To
Authored By
c-po
Mar 1 2020, 2:48 PM
Referenced Files
F538228: br1.svg
May 20 2020, 10:16 AM
F538239: image.png
May 20 2020, 10:16 AM
F537693: bootchart-20200520-0404.svg
May 20 2020, 6:13 AM
F478492: config.boot
Mar 1 2020, 2:48 PM

Description

The following configuration takes 430 seconds to boot on my APU4C4 (quad core 1GHz, 4GB ECC RAM) board using VyOS 1.2.4.
Boot time increases to 1000 seconds running VyOS 1.3 rolling.

I hope this configuration helps performance testing.

Details

Difficulty level
Hard (possibly days)
Version
1.3-rolling-20200229
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Bug (incorrect behavior)

Event Timeline

One of the offeners is

vyos@vyos# show system config-management commit-archive
 location tftp://172.16.20.200/CONFIGS/BR1.wueIII

removing this line will reduce the boot time to 700 seocnds (down by 30%) .. weird

This comment was removed by c-po.

0-18 kernel boot
18-41 system starting inc FRR
70-120 the python most of the time is spend/wasted in cli-shell-api - so I would think reading the configuration file. If we can optimise / reduce this number of calls it would be very good.
140-220 is more or less firewall setup with vyatta-firewall / vyatta-upset.pl / ip6tables

rounding numbes:

So 22% in cli-shell-api
So 36% in firewall

Please see attached the full log of the boot process. I try to supply a PNG later on but I guess it will be really huge.

image.png (151×578 px, 13 KB)

Every call to /bin/cli-shell-api --show-working-only --show-show-defaults --show-ignore-edit showConfig takes multiples seconds (6?)
/usr/libexec/vyos/conf_mode/system-timezone.py call it twice.
/usr/libexec/vyos/conf_mode/nat.py call it twice
/usr/libexec/vyos/conf_mode/interfaces-loopback.py call it twice ... etc.

Looking at the ... seconds it takes to set the hostname ...
The code for /usr/libexec/vyos/conf_mode/host_name.py is spent waiting for vyos-hostd to come back

sudo apt install python3-pip
sudo pip3 install -U pip setuptools wheel virtualenv
pip3 install git+https://github.com/evanhempel/python-flamegraph.git
python3 -m flamegraph -o perf.log /usr/libexec/vyos/conf_mode/host_name.py
./flamegraph.pl perf.log > unix-time.svg

Why is it sometimes called twice?

Possibly it's in cases where it's first called in get_config(), then also later in verify() or apply() - the object is function local scope and isn't saved in a global variable. It would be a simple fix if Config() were initialised in main() and passed as an argument to any function that needs it.

I've created some proof of concept code that i think could help on this issue. https://github.com/runborg/vyos-1x/blob/main-cfg/src/conf_mode/main.py this is a conf-mode executor that handles multiple conf mode scripts. The reason i think this could seriously help on this issue is that as this is all running inside a single python tnterpreter, its able to load the config object once and pass it to all needed conf_mode scripts without a need for reinitialization.

The only big bottleneck with this script is that it only executes as a single vyatta-cfg priority so all executed scripts needs to be in a continous order without old bash dependencies.

As the current "priority map" there aren't a loot of concurrent python blocks, but i think many of the remaining bash/perl scripts could be moved to new places. https://pastebin.com/z6ZvkJKB

I used @runar code on my branch working on T2522 .. Configuring 100 dummy interface went from 42 to 8 seconds.

erkin set Issue type to Bug (incorrect behavior).Aug 31 2021, 5:27 PM
erkin claimed this task.
erkin added a subscriber: erkin.

I did a quick test with 1.3.0-epa3 and this config loads in about 6 minutes in my VM (2 GB memory, 1 core at 1,6 GHz) at boot, some WireGuard config errors notwithstanding. After all the performance improvements, I think we can consider this resolved at this point.

Sorry, but I dont think this is fixed; I just attempted to upgrade my working 1.2.8 configuration to 1.3.

The 1.2.8 configuration will (eventually, after a very long time) fail with an out-of-memory condition when attempting to load on 1.3.

[  812.814956] Out of memory: Killed process 1348 (python3) total-vm:3779656kB, anon-rss:3425936kB, file-rss:1808kB, shmem-rss:0kB, UID:0 pgtables:6784kB oom_score_adj:0
[  813.965717] vyos-router[813]: Starting VyOS router: migrate rl-system firewall configure/usr/libexec/vyos/init/vyos-router: line 104:  1348 Killed                  sg ${GROUP} -c "$vyos_libexec_dir/vyos-boot-config-loader.py $BOOTFILE"
[  814.089133] vyos-router[813]:  failed!

This is under KVM, so I just doubled the memory (from 4G to 8G) and tried. It takes longer, but eventually bombs. Watching in virt-top shows memory usage slowly increasing:

667221 libvirt+  20   0   10.6g   8.1g  28044 S 137.7   1.6  48:39.51 qemu-sy+

On 1.2.8, the exact same configuration loaded in just about 5 minutes. I'm still waiting for the login prompt on 1.3 now, after ~20 minutes. Have to fall back to 1.2.8, unfortunately, perhaps I might try a 1.4 rolling when I get back from holiday.