
Cannot recover from failed boot config load
Closed, Resolved | Public | BUG

Description

If the boot commit fails, the standard debugging procedure (configure; load; commit) doesn't work.

vyos@vyos# load
Traceback (most recent call last):
  File "/usr/libexec/vyos/vyos-load-config.py", line 62, in <module>
    config = LoadConfig()
  File "/usr/lib/python3/dist-packages/vyos/config.py", line 103, in __init__
    running_config_text = self._run([self._cli_shell_api, '--show-active-only', '--show-show-defaults', '--show-ignore-edit', 'showConfig'])
  File "/usr/lib/python3/dist-packages/vyos/config.py", line 148, in _run
    raise VyOSError()
vyos.config.VyOSError
[edit]
vyos@vyos# run show version 
Version:          VyOS 1.3-rolling-202005221529
Release Train:    equuleus

Built by:         autobuild@vyos.net
Built on:         Fri 22 May 2020 15:29 UTC
Build UUID:       5935041c-a662-413d-a4be-1924a7a70bd6
Build Commit ID:  a29347ca9dd260

Details

Difficulty level
Unknown (require assessment)
Version
1.3-rolling-202005221529
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)

Event Timeline

jjakob triaged this task as Unbreak Now! priority. May 23 2020, 2:43 PM
jjakob created this task.
jjakob created this object in space S1 VyOS Public.

The called code can return 3; in that case _run should return an empty string.
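A minimal sketch of that behaviour, not the actual vyos.config code: the function name `run`, the `empty_ok_codes` parameter, and the local `VyOSError` stand-in are assumptions made for illustration. It treats exit status 3 as "nothing to show" and returns an empty string instead of raising.

```python
import subprocess


class VyOSError(Exception):
    """Local stand-in for vyos.config.VyOSError (assumption for this sketch)."""


def run(cmd, empty_ok_codes=(3,)):
    """Run cmd and return its stdout as text.

    Exit statuses listed in empty_ok_codes are treated as "no output"
    and yield an empty string; any other non-zero status raises.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, _ = proc.communicate()
    if proc.returncode == 0:
        return out.decode()
    if proc.returncode in empty_ok_codes:
        # The backend signalled an empty result: hand back an empty string.
        return ''
    raise VyOSError(f'{cmd!r} exited with status {proc.returncode}')
```

With this shape, a caller such as the config loader would receive an empty running config rather than a traceback when the backend exits with 3.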

jestabro lowered the priority of this task from Unbreak Now! to Normal. Jun 6 2020, 3:15 PM

The now standard method of debugging was clarified in T2409, with reference to this ticket (vyos-config-debug); the question here is whether there is any way to use 'configure; load; commit' on a failed boot config load. In the worst cases, no, as the CLI has not successfully initialized, but I will investigate. Changing to bug for evaluation.

jestabro changed the subtype of this task from "Task" to "Bug". Jun 6 2020, 3:17 PM

I think 'configure; load; commit' is important to make debugging easier and faster. There are issues with the vyos-config-debug method. It needs a full reboot to test every change, which can take minutes (and one may not fix the bug even in the first 10 tries, depending on how sleep-deprived one is). It also lacks an easy way to see the scripts' stdout/stderr, which is discarded unless we enable airbag's debug log (yet another thing to keep in mind); the standard traceback that's logged to /tmp may not be enough to catch the exact error, so we may need to print some variables to inspect them. But the main issue is that rebooting is much slower than just doing load/commit.
Can't we just read config.boot into the session config of configtree or config? Wasn't that exactly what was done before? I'm 100% sure there was a function in config.py that read the config.boot file in case the config system wasn't initialised.
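A rough sketch of the fallback being asked about, under stated assumptions: `load_config_text`, its `query_running_config` callable, and the default path `/config/config.boot` are hypothetical names chosen for illustration and are not the functions in config.py. The idea is to try the live config system first and read the boot file directly only when that fails.

```python
import sys


def load_config_text(query_running_config, boot_file='/config/config.boot'):
    """Return config text from the live system, else from the boot file.

    query_running_config is any zero-argument callable that returns the
    running config as text and raises when the config system is not
    initialised (hypothetical interface for this sketch).
    """
    try:
        return query_running_config()
    except Exception as err:
        # Broad catch on purpose: any failure of the live query triggers
        # the fallback to the saved boot configuration.
        print(f'live config unavailable ({err}); reading {boot_file}',
              file=sys.stderr)
        with open(boot_file) as f:
            return f.read()
```

The returned text could then be handed to whatever parser the session normally uses; whether that parser works without an initialised session is exactly the open question in this ticket.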

I agree with the downsides of vyos-config-debug; it was provided to allow some analysis when no other method was available. What was done before was a straight call to the vyatta backend; what is done now is essentially what you're suggesting, but in a case where the context of config/configtree is not successfully initialized, hence the failure. This is a good point for analysis, however, and I will restore the behaviour, likely by (a) using a straight call to the backend, and meanwhile (b) investigating whether we can have partial context for config/configtree in this case.

jestabro changed the task status from Open to In progress. Jun 6 2020, 10:10 PM

@jjakob, to clarify two points (for my sake as well): there are cases where config fails to the point where the config session cannot be initialized, such that one cannot enter a config session. Here's an example: say, during development, someone forgets to import a module in an early conf_mode script; config initialization is completely screwed. This occurred ages ago, and the only way I found it was that I had happened to be looking at their code a few moments previously and noticed. That's what vyos-config-debug is for, and why it is hidden behind a boot flag. You are talking about a much more reasonable case, where (I imagine) a specific configuration causes a partial failure, but one can still enter a config session. The checks in subtask T2568 may help restore the ability to debug by configure/load/commit. Those checks are needed anyway.