Page MenuHomeVyOS Platform

At boot, effective config should not be equal to current config
Closed, ResolvedPublic

Description

The effective_ functions of config.py should return a empty config on boot, not the current config. This is so that scripts can delta the two configs and apply only the changed configuration.

Details

Difficulty level
Unknown (require assessment)
Version
-
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Behavior change

Event Timeline

jjakob triaged this task as Normal priority.Apr 30 2020, 7:56 PM
jjakob created this task.
jjakob created this object in space S1 VyOS Public.

Definition of effective is here:

https://phabricator.vyos.net/w/development/cli-shell-api/

Whether inside or outside of a config session, at boot effective == active (current).

Right. In my opinion it would be necessary to change that.

For example, look at this commit:
https://phabricator.vyos.net/rVYOSONEXb6aa1e7f3fb40cea11813dad53b50aeeb28fc74e

If effective on boot reflected the current state of the system (no config committed yet), it would be empty, not equal to active.
If it's equal to active, it should mean that all the config has already been applied.

Thus we could easily get a dict of the effective and dict of the active config, compare them, and see what still needs to be applied.
Right now we can't as this doesn't work on boot, even though it works during a config session.

jestabro added a comment.EditedApr 30 2020, 11:40 PM

On boot, the existing config file is committed.

pasik added a subscriber: pasik.May 1 2020, 10:35 AM

@jestabro so get_config_dict(effective=True) will return an empty config dict on boot? AFAIK by reading config.py, it will be the same as get_config_dict(effective=False), so the diff between them will return 0 differences, thus nothing to apply, even though there's everything to apply (as it's a first commit on boot).

Correct me if I'm wrong, I haven't tried it in practice, but the commits that fixed T2030 would seem to support my understanding.

I suggest you ignore T2030 for a moment --- I recall discussing that issue with the author, but I think the code comments are leading you astray ... we can return to that later.

You appear to equate active with the saved config file, in your first comment above; that will be the case, after the boot commit completes, which begs the question: what do mean 'at boot' --- at some point during the boot process (in vyos-router), a config session is initialized, the config file (or a default if first boot after install) is loaded, and committed. Between load and commit during boot, there is no active of effective config defined, nor can you query it without error, as it is meaningless; on the other hand, between load and commit, the proposed config (showConfig --show-working-only) is what you would expect; after commit, one has active/effective == proposed, assuming no error.

My impression is that you are concerned by the comments in T2030, and warned off of a use case that you have in mind. Perhaps if you define the use case, it would help to clarify some of the discussion.

If you want a config_dict of the saved config file for comparison, irrespective of the state of active/effective/working, that should be easy to do.

jjakob added a comment.May 8 2020, 7:45 AM

I suggest you ignore T2030 for a moment --- I recall discussing that issue with the author, but I think the code comments are leading you astray ... we can return to that later.
You appear to equate active with the saved config file, in your first comment above;

No, I am not equating the active config with the saved config file.

that will be the case, after the boot commit completes, which begs the question: what do mean 'at boot' --- at some point during the boot process (in vyos-router), a config session is initialized, the config file (or a default if first boot after install) is loaded, and committed. Between load and commit during boot, there is no active of effective config defined, nor can you query it without error, as it is meaningless; on the other hand, between load and commit, the proposed config (showConfig --show-working-only) is what you would expect; after commit, one has active/effective == proposed, assuming no error.

This is exactly what I had in mind, but maybe my terminology was inaccurate: between load and the end of the commit, during the commit, we have the proposed config, which can be queried with get_value() etc., and the effective config, which can be queried with get_effective_value() etc.
Currently the interface scripts take the proposed config and apply it, but there are some places where they already compare it with effective config, to see what the differences are and what to apply.
For example:
https://github.com/vyos/vyos-1x/blob/current/python/vyos/configdict.py#L299-L315
This takes the difference between get_values and get_effective values to see what addresses to remove.

If we look at config.py:
https://github.com/vyos/vyos-1x/blob/current/python/vyos/config.py#L99-L112
effective config: if /tmp/vyos-config-status exists, it gets it via cli-shell-api --show-active-only, otherwise it reads config.boot.
This is the behavior I mean - IMO this should not read config.boot, but return an empty config, as if there is no active config, it shouldn't return the whole config.boot.
Otherwise the running (active) and session (working) configs will be identical during first commit during boot, so any diff between them will be empty, so the scripts will think that the config they're being asked to apply is already applied, and will behave incorrectly.

The working config is either read via --show-working-only if in session, otherwise it's set to be equal to the effective/active config. This doesn't seem like an issue to me - during boot the config session is initialized and then committed, so this only applies for op-mode commands.

But I also see this comment:
https://github.com/vyos/vyos-1x/blob/current/python/vyos/config.py#L254-L259
Of course, what we are already doing in some parts of the code, is effectively generating this diff by comparing working and effective configs. Thinking about it, it would maybe be faster to just query cli-shell-api for the diff and parse it into an internal representation that the scripts can use.

My impression is that you are concerned by the comments in T2030, and warned off of a use case that you have in mind. Perhaps if you define the use case, it would help to clarify some of the discussion.
If you want a config_dict of the saved config file for comparison, irrespective of the state of active/effective/working, that should be easy to do.

The use case is this: I want to get a dictionary of the config that I need to apply in a script. Whether that is by comparing working/session and current/active configs or by implementing a new method to get it from cli-shell-api directly. On boot, this comparison should return the whole unapplied config that the script is being asked to commit. During a config session, this should return just the unapplied changes that the user made in the current config session. This way we can granularly apply configs to interfaces and services and not restart or delete them if we don't need to. If only the comment of a node changed, I see no current way to easily detect that and not apply the whole config anyway. This means we often need to delete and recreate an interface for just comment changes.

T2030 was AFAIK a prime example of this: the initial implementation of the code in question assumed that get_effective (current/active config) returned an empty config on boot, as it should as there is no active config yet, but instead it returned config.boot, which meant the comparison returned zero differences, so the script thought there were no changes to apply. To fix the bug, this was then changed to read the active config from the interface (/sys) directly, but obviously we can't do that for everything!

I'm unfamiliar with how cli-shell-api operates and what we can get from it in different states, but I assume it has more functionality that would be useful for scripts that isn't exposed through the python bindings right now. Maybe it's best to expose more of its functionality to the end scripts and let us decide what parameters we want to read from it, than try to hide or abstract its behavior.

@thomas-mangin I saw that and I think it's a good idea, to even make it into a common library we could use from all interface scripts. Does it work correctly on boot however? I see it uses exactly the same config.py methods that I'm talking about above that would return incorrect values on boot. I didn't try it but I'm highly suspecting it doesn't.

@jjakob it does nothing about the boot case but it would be much easier to add it to that code than what we do ATM.

jestabro claimed this task.May 8 2020, 2:54 PM

Okay @jjakob, thanks for pinpointing the concern: (quoting from above) 'IMO this should not read config.boot, but return an empty config' --- this is a reasonable position; I will test this modification, along with a consistent change to get_config_dict, and we can discuss.

Suggested patch here:
https://github.com/vyos/vyos-1x/compare/current...jestabro:T2049.patch

The concern would be that somewhere, there is code either explicitly or implicitly relying on the old behaviour; testing is needed before merging.

jestabro changed the task status from Open to Needs testing.May 10 2020, 3:50 AM
jestabro closed this task as Resolved.May 17 2020, 7:06 PM
jestabro moved this task from Need Triage to Finished on the VyOS 1.3 Equuleus board.

@jestabro I think T2501 is related to this - can you take a look?

@jjakob yes, thanks for pointing this out ...

@jjakob, yes I see the issue there; but firstly, did you try debugging with boot parameter 'vyos-config-debug' ? cf.

https://docs.vyos.io/en/latest/contributing/development.html#debugging

I agree that is not a resolution to the fact that load presumes an active config, but I added the above for precisely this situation.

@jestabro thanks for that tip, vyos-config-debug helped me track down the failing code. At least I can continue working until T2501 is fixed.

Cool, thanks for pointing this out; I will take T2501, and will probably rename for clarity.