Page MenuHomeVyOS Platform

Add helper functions to query changes between session and effective configs
Closed, ResolvedPublicFEATURE REQUEST

Description

In addition to a general function get_config_dict_diff, the following (at least) will be provided:
is_added(..., path)
is_deleted(..., path)
is_value_changed(..., path)
get_value_changed(..., path)

The naming conventions, return values, etc., are subject to critique/revision.

Details

Difficulty level
Hard (possibly days)
Version
vyos-1.3
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Perfectly compatible
Issue type
Internal change (not visible to end users)

Event Timeline

jestabro changed the task status from Open to In progress.Jul 6 2020, 2:49 PM
jestabro triaged this task as Normal priority.
jestabro created this task.
jestabro created this object in space S1 VyOS Public.

What about providing a is_changed, that returns False, added, deleted or changed with the new value provided in the result? Added/deleted/changed can be of a enum type or something like that

Also, as everything set in python will render True, couldn't is_value_changed return the old and new value instead of just true/false? This will make get_value_changed redundant

I am entirely open to suggestions here; the underlying functions support any such forms. Note however, that we want to distinguish between new/deleted paths and changed values --- one could treat these all as a difference in path, but it will be more convenient for use if we make the distinction ...

Regarding is_value_changed, I was thinking the other way around: get_value_changed returns None if no change, so is_value_changed would be redundant --- put good point: one may/will want both old and new values

Good point, get_value_changed is a better name for this. As you want to distinguish between a returned value of False and a "Not Changed" using a two tuple (namedTuple?) returned with new and old value makes it easy to "see" the difference

Yes, I definitely prefer a return type of tuple ...

About is_changed, i see the need to have a function that tells if there are any changes in the path tree under the given path.. specified.

Yes, I'm expanding all paths under the specified path

@runar in fact, that's all one wants in current use case: has the list of elements, directly _under_ the specified node, changed? For example, (1) change of values (2) added or removed tag node entries.

So, as far as useful helper functions, one certainly wants:
get_child_nodes_changed(... path)
get_value_changed(... path)

Regarding the API proposed, most of these functions are also syntactic sugar for the same operations. Looking at the use cases, there is two: getting information about a leaf value, or getting information about tagNode changes.

For leaf values, we seek to know if something was added, removed, modified or deleted, therefore both the effective and running configuration must be compared.
I would propose a function which will return an "enum" and a value. The enum should have 5 states (missing, added, modified, deleted, unchanged) and the value be None if missing, the new value if unchanged, added or modified or the effective value if deleted.

For tagNode, there is a need to know which nodes were added or removed or unchanged. I would suggest function returning a dict with the name as key and the state as value (added, modified, unchanged)

As both these function are in effect working on the configuration data, the logical place would be the add them to the Config class API.

Something like:

class Config(object):
    def get_leaf_changes(self, path):
    def get_tag_changes(self, path):

Any wrapping API (as the one presented above or in this comment) would lead to multiple calls to get_config_dict, therefore the method will first need to be optimised, before/when this is implemented.

get_config_dict is currently un-efficient as it performs some JSON manipulation. It is not an issue currently, as the configuration modules are calling it only once each and then use the returned result.
Also as python3 is starting for cold for each module, even if the value was cached within the Configuration class, it would make no difference (at the moment).
However the proposed change would lead to multiple function calls, each requiring get_config_dict, so this optimisation is surely worth it, so it would surely make sense to cache the whole dict from the root path and change the code to provide the searched data from the cached dict.

It is true that get_config_dict is slow, so it should only be called once (twice for diff) per session. Consequently, the work flow will be:

  1. If one has no need for a diff:
c = Config()
c.get_config_dict()
...
  1. If one needs to compare session and effective:
c = Config()
r = get_config_diff(c)

where r is initialized by calling get_config_dict( ..., effective=True/False), and provides methods:

get_child_nodes_changed(path, ...)
get_value_changed(path, ...)
get_node_changed(path, ...)

returning a tuple (added, deleted)
and other obvious methods. For get_*node*_changed, the keyword arg 'return_as_dict=True' will obtain the full dict under the node, instead of a list, for further access. (get_sub_dict is relatively fast).

ConfigDiff mirrors '_level' and '_make_path' to maintain the path relative workflow.

The above is ready to be moved to testing shortly. Note moreover that the full session and effective dicts can be returned; any common use case for this can then be added as additional methods.

One forgotten point: get_config_diff obtains the config_dicts at root level, so any movement within/between sub-sections are available, with set_level.

@thomas-mangin and @runar I do like the enum idea, however, this would add boilerplate to the conf_mode scripts, which would quickly become annoying in practice ... rather, common workflow just wants the actionable data (added, deleted) _and_ the ability to then access values under the node in question. Consequently, I'm following @runar 's suggestion of returning a tuple, combined with the return_as_dict arg to allow access to sub-data.

@jestabro I would like to hear why you advocate this API and why you believe it is better than the one I have suggested.

For what I called get_tag_changes , I do not have a strong feeling about the returning two lists versus a dict. What is best depending on the code using the function.
But why have more than two functions? If KISS is the design principle then it is better to have less and data which is expressive (using enum).

"I do like the enum idea, however, this would add boilerplate to the conf_mode scripts, which would quickly become annoying in practice"
Could you please explain what you mean ... what will become annoying?
Having the enum makes the data format explicit and easier to understand for the caller. Also when used instead of string it allows linters to detect typos.

"It is true that get_config_dict is slow, so it should only be called once"
sorted: https://github.com/vyos/vyos-1x/pull/491

Also if a function is passed a single parameter, a class, such as r = get_config_diff(c) then the function should be a method of the class r = c.get_diff()
I prefer get_diff as this is the Config class, get_dict_config could/should have been more simply named get_dict ..
And I have the felling that you will like this video: https://www.youtube.com/watch?v=o9pEzgHorH0

"will obtain the full dict under the node, instead of a list"
so the format of the data returned will change depending on the parameters! I would not call that a good API.

"ConfigDiff mirrors '_level' and '_make_path' to maintain the path relative workflow"
which goes toward the argument that really this code should be in Config ...
or should be a sub-class of Config, if like for Interface we want to keep the number of method in a class to a smaller number to make it easier to read.

As I click "sail", I just realise that as there is only one config per router, ConfigDict could be a subclass of Config which could be itself singleton. So even if it is subclassed, all instances could/would share the same underlying data and could be used inter-exchangeably for the content of the config.

@thomas-mangin Firstly, I needed to write this version before reasonably debating the pros/cons of various approaches --- the important idea developed is that the use of get_sub_dict, and ability to return the sub_dict under the diff'ed nodes will allow a fluid use in writing conf_mode scripts --- beyond that, I am not particularly attached to any details of implementation, yet.

Nor is my initial aversion to your taxonomy quite as strong --- my concern was that there would be regular sub-indexing to obtain the desired objects, but of course the tuple has to be indexed or a return value ignored, as well, with @runar 's idea ... we'll come back to this ...

Agreed that variable return values is improper, but the ability to return the dict val will prove useful, as mentioned; a proper form may be different functions or a composable transform.

Regarding sub-classing, that will need some discussion with respect to the envisioned generalization of Config for testing/off-loading use, but there may well be a proper generalization in that direction.

Okay, I like that video already ...

@thomas-mangin I'm convinced: for get_child_nodes and get_node, we'll return a dict of respective dicts:

def get_child_nodes(self, path=[], no_defaults=False):
        """
        Args:
            path (str|list): config path
            no_detaults=False: do not merge default values to ret['merge']

        Returns: dict of dicts, representing differences between session
                                and effective config, under path
                 dict['merge'] = session config values, merged with defaults, unless no_defaults
                 dict['delete'] = effective config values, not in session
                 dict['add'] = session config values, not in effective
                 dict['stable'] = config values in both session and effective
        """

get_value will return a tuple. Testing, and will post ...

Current version linked below, as an example for further discussion. As these ideas have been circulating for a while, it is important to have examples to debunk/revise/reject. My guess is that one likely wants much less than this, and that @dmbaturin is quite right in his recent comment "we should not rely on diffs unless there's absolutely no other way to do it --- not just about computational expense, but also readability".

Nonetheless, this example addresses two apparent issues:
(A) the occasional desire for a general abstract internal representation of VyOS config; examples of workarounds in some conf_mode scripts call Config() from verify in order to navigate other sub-sections of the config; this breaks the model. This is avoidable with already existing tools, but can appear unwieldy for complicated configs.
(B) the desire to compare session and effective config, for example, to see which interfaces have been removed. This also may be avoidable, though we need to examine emerging use cases, for which I will refer to @c-po . Note that the recent rewrite of interfaces-dummy.py uses the the empty return value of get_config_dict to infer that the interface has been deleted; simple, if implicit.

Due to (A), the configdiff class can not be a part of, nor inherit from, config.
Due to (B), one has access to the full details for merge/add/delete/stable; it is unlikely one needs anything beyond merge/delete, if at all.

https://github.com/vyos/vyos-1x/compare/current...jestabro:config_diff

jestabro changed Difficulty level from Normal (likely a few hours) to Hard (possibly days).
jestabro moved this task from Need Triage to Finished on the VyOS 1.3 Equuleus board.
erkin set Issue type to Internal change (not visible to end users).Aug 29 2021, 1:54 PM
erkin removed a subscriber: Active contributors.