Page MenuHomePhabricator

Slow boot/load and CLI response times
Open, Requires assessmentPublicBUG

Description

During our testing of VyOS 1.2.1-S2 I noticed a significant increase in boot/load and CLI response times on one of our customer edge device platforms.

Relevant Device Specs:

Model: Lanner NCA-1020 (https://www.lanner-america.com/buy-lanner/product/network-appliances/x86-desktop-network-appliances/nca-1020a/)
Processor: Intel Celeron N3010, 2.4GHz, Dual-Core
Memory: 2GB DDR3L 1600 MHz non-ECC
Ethernet: Intel i211, 10/100/1000 Mbps RJ-45
Storage: mini mSATA 32 GB

Observed boot times

** Boot time 1.1.8 -> 1.2.1-S2 w/o configuration ~160 seconds
** Boot time 1.1.8 -> 1.2.1-S2 /w configuration ~260 seconds

** Boot time 1.2.1-S2 w/o configuration ~120 seconds
** Boot time 1.2.1-S2 /w configuration ~233 seconds (no less than 180 seconds)

** Boot time 1.1.8 w/o configuration ~40 seconds
** Boot time 1.1.8 /w configuration ~60 seconds

** Avg CLI response time 1.2.1-S2, ~2 seconds
** Avg CLI response time 1.8.1, no perceivable delay

If it's helpful, I can also attach all permutations of the configuration files we tested with.

Details

Difficulty level
Unknown (require assessment)
Version
1.2.1-S2
Why the issue appeared?
Will be filled on close

Event Timeline

ekim created this task.Jul 2 2019, 2:53 PM
pasik added a subscriber: pasik.Jul 2 2019, 3:26 PM
UnicronNL assigned this task to zsdc.Jul 3 2019, 5:53 PM
zsdc added a comment.Jul 8 2019, 9:15 AM

Hello, @ekim!
Such a significant increase of boot time with the same configuration is very strange, but still possible - even small changes can easily cause such behavior if you have a huge or some specific configuration. But what is more strange - CLI response time. Regardless of configuration, CLI should work without visible delays, except for autocompleting or commit operations.
Could you check the current load of the host at that moment, when CLI is slow? We must be sure that the system is not overloaded.

ekim added a comment.Jul 8 2019, 1:02 PM

Hi @zsdc,

Neither the CPU or memory load ever exceeds 0.5% utilization while idling. I haven't tested whether throughput or control plane timings have been impacted.

The configuration is identical from 1.1.8 -> 1.2.1-s2. Doesn't make sense to me why there is such a large impact, especially with no impact to resource consumption.

Thanks

zsdc added a comment.Jul 10 2019, 2:09 PM

@ekim, could you make a screen record with this CLI delay?

In 1.2 was changed a lot of internal logic and many software which handling different services, so some impact is possible. But, we are trying to do our best to avoid such a big difference. After we will be sure that there are no problems with CLI in your case, we will switch to configuration analysis to detect possible bottlenecks.

ekim added a comment.Jul 11 2019, 9:19 PM

Neither have configurations on them. However, CLI response time when a configuration is loaded is even slower on 1.2.1-s2. Of course, no perceptible impact on 1.1.8.

zsdc added a comment.Jul 13 2019, 8:53 AM

Hello, @ekim! I see now. This is more looks like the waiting due to I/O. Do, please, the next:

  1. Run sudo atop -w /tmp/atop-mon.log -a 5 60 in dedicated terminal.
  2. Try to work several minutes in the terminal. It must freezing at this moment, otherwise, collected test data will be wrong.
  3. Wait until atop finish its work (~ 5 min from the start).
  4. Copy the file /tmp/atop-mon.log from the router and send to us for analysis.
ekim added a comment.Jul 15 2019, 4:08 PM

Thanks! Here you go!

zsdc added a comment.Jul 18 2019, 2:56 PM

Thank you, @ekim!
What exactly you were doing at the moment of this log record? I see a lot of scripts, which are almost permanently do something in the system. Is it possible that this system contains some custom scripts (in /config/scripts folder, for example)?
If yes, you should check the schedule of execution, and requirements of modification for 1.2 version command syntax.

ekim added a comment.Jul 18 2019, 9:29 PM

These are default VyOS installations without any modifications other than to the running configuration. During these recordings I set IP address, duplex, and speed on eth0, set a static route, hostname, and ntp, time, and dhcp server configurations. While doing so, I used ? to enumerate possible configuration paths and [tab] for CLI completion to hopefully illuminate the latency in these operations.

The only thing in /config/scripts is the default file and has only default comments in it:

vyos@vyos:~$ ls /config/scripts/
vyatta-postconfig-bootup.script
vyos@vyos:~$ less /config/scripts/vyatta-postconfig-bootup.script 
#!/bin/sh
# This script is called from /etc/rc.local on boot after the Vyatta
# configuration is fully applied. Any modifications done to work around
# unfixed bugs and implement enhancements which are not complete in the Vyatta
# system can be placed here.
zsdc added a comment.Thu, Aug 1, 9:38 AM

@ekim, we have never met with such a problem and cannot reproduce it in our environments. The better way to continue investigation would be getting access to this installation. If this would be possible, we could continue debugging.