Standardized op mode script structure
Open, Requires assessment | Public | FEATURE REQUEST

Description

Right now we have a standardized structure for conf mode scripts, but op mode script structure is still arbitrary, just like in the pre-1.2 days.
The introduction of an HTTP API makes this especially problematic.

This issue leads to several "can't do it" problems:

  • Can't get op mode command output in a machine-readable format
  • Can't run a command by its path outside of the CLI
  • Can't test individual op mode functions

One good thing about op mode is that, oddly, it's not a part of vyatta-cfg. Thus it's much easier to strangle since it can be re-done without any effects on the configuration subsystem. The only question is how exactly we are going to do it.

I've long been thinking about how op mode should be incorporated into vyconf. Now I'm thinking maybe it shouldn't. It's really outside of the configuration system and appears to be an unrelated concern.

What can we do instead? We can create a separate op mode daemon in Python that will serve op mode requests. One observation is that op mode paths are a lot like HTTP application routes. For example, show interfaces ethernet eth0 detail is expressible in the routing terms of most web frameworks, like /show/interfaces/ethernet/<intf>/detail.

They do allow registering different routes for /show/interfaces/ethernet/ and /show/interfaces/ethernet/<intf>, which would solve our issue with messy tag/non-tag nodes.
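
As a rough illustration of the routing idea, here is a minimal sketch of a route table in plain Python. The Router class, its route decorator, and the handler names are hypothetical and invented for this example; they are not part of VyOS or of any particular web framework.

```python
import re


class Router:
    """Tiny route table (hypothetical, for illustration only)."""

    def __init__(self):
        self._routes = []  # list of (compiled pattern, handler)

    def route(self, template):
        # Turn "/show/interfaces/ethernet/<intf>" into a regex with one
        # named group per <placeholder> segment.
        parts = []
        for segment in template.strip("/").split("/"):
            if segment.startswith("<") and segment.endswith(">"):
                parts.append("(?P<%s>[^/]+)" % segment[1:-1])
            else:
                parts.append(re.escape(segment))
        pattern = re.compile("^/" + "/".join(parts) + "/?$")

        def decorator(func):
            self._routes.append((pattern, func))
            return func

        return decorator

    def dispatch(self, path):
        # First matching route wins; placeholders become keyword arguments.
        for pattern, func in self._routes:
            match = pattern.match(path)
            if match:
                return func(**match.groupdict())
        raise LookupError("no route for " + path)


router = Router()


@router.route("/show/interfaces/ethernet")
def show_all_ethernet():
    return "all ethernet interfaces"


@router.route("/show/interfaces/ethernet/<intf>")
def show_one_ethernet(intf):
    return "interface " + intf


@router.route("/show/interfaces/ethernet/<intf>/detail")
def show_one_ethernet_detail(intf):
    return "detailed view of " + intf
```

The non-tag path and the tag path end up as two independent handlers. A real router would also need a precedence rule between literal words and placeholders, which this sketch leaves out.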

Daemon structure

  • vyos-opd walks through op mode script dir on startup, loads every module
  • upon loading, modules register one or more op mode routes
  • vyos-opd starts serving requests for op mode paths
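
The startup steps above could be sketched roughly as follows. This is only an assumption about how loading might work: the ROUTES dict, the register callback, and the register_routes convention are all hypothetical names, and the demo writes a throwaway module to a temporary directory so that it runs anywhere.

```python
import importlib.util
import pathlib
import tempfile

ROUTES = {}  # op mode path -> function (hypothetical registry)


def register(path, func):
    """Callback handed to each module so it can announce its routes."""
    ROUTES[path] = func


def load_modules(script_dir):
    """Import every *.py file in script_dir and let it register its routes."""
    for file in sorted(pathlib.Path(script_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(file.stem, file)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Assumed convention: each module exposes register_routes(register).
        module.register_routes(register)


# Demo module source; the uptime value is a placeholder based on the
# monotonic clock, not the real system uptime.
EXAMPLE_MODULE = '''
import time

def show_uptime():
    return "up for %d seconds (monotonic clock)" % time.monotonic()

def register_routes(register):
    register("/show/system/uptime", show_uptime)
'''

with tempfile.TemporaryDirectory() as script_dir:
    pathlib.Path(script_dir, "uptime.py").write_text(EXAMPLE_MODULE)
    load_modules(script_dir)

print(ROUTES["/show/system/uptime"]())
```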

It should likely serve requests over a ZMQ socket for better performance. We need to discuss whether translating a CLI path like show interfaces to URL encoding for the routing library should happen on the client or daemon side of the request.
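
Whichever side ends up doing it, the translation itself could look something like this sketch. The function names cli_to_route and route_to_cli are made up for this example, and the sample commands are only illustrative input.

```python
import shlex
from urllib.parse import quote, unquote


def cli_to_route(command):
    """Translate a CLI command line into a URL-style route."""
    # shlex.split honours quoting, so a quoted argument stays one word.
    words = shlex.split(command)
    # safe="" also escapes "/", so an argument such as a prefix
    # cannot be mistaken for a path separator.
    return "/" + "/".join(quote(word, safe="") for word in words)


def route_to_cli(route):
    """Reverse translation: route back to the list of CLI words."""
    return [unquote(segment) for segment in route.strip("/").split("/")]


print(cli_to_route("show interfaces ethernet eth0 detail"))
print(cli_to_route("show ip route 10.0.0.0/8"))
```

Escaping every word keeps the round trip lossless, which matters as soon as arguments contain slashes or spaces.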

Op mode module structure

A module must define one or more op mode functions. For example, a module for show system uptime will likely define just one show_uptime Python function.
Modules for complex components like interfaces or IPsec may provide many more functions.

They must also have an entry point function that registers routes for each function.
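
A module following this structure might look like the sketch below. The route paths, the function names, and the register_routes entry point are invented for illustration; the route table is modelled as a plain dict.

```python
"""Hypothetical op mode module providing two small 'show system' functions."""
import os
import platform


def show_kernel():
    """Return the running kernel release as reported by the platform."""
    return platform.release()


def show_cpu_count():
    """Return the number of CPUs as text."""
    return str(os.cpu_count())


def register_routes(routes):
    """Entry point: called once by the daemon with its route table."""
    routes["/show/system/kernel"] = show_kernel
    routes["/show/system/cpu-count"] = show_cpu_count


# What the daemon side would do after importing the module:
routes = {}
register_routes(routes)
```

Because the op mode functions are ordinary Python functions, they can also be imported and tested individually, without going through the route table.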

Standard request arguments

Verbosity arguments

  • brief — output only essential information
  • detail — output detailed information
  • "normal" (implied if neither the brief nor the detail flag is set)

Op mode functions may ignore the verbosity argument. For some functions, like show system uptime, different verbosity levels really make no sense.

Format arguments

  • json — output JSON
  • text — output human-readable plain text

These arguments should be composable with all others. For example, a json | brief combination would output JSON with some fields omitted (compared to the "normal" or "detail" levels).
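
To make the composition concrete, here is a small sketch in which verbosity selects the fields and the format argument only decides how they are rendered. The sample record, the field sets, and the show_interface function are hypothetical.

```python
import json

# Hypothetical sample record and per-verbosity field sets.
SAMPLE = {"name": "eth0", "state": "up", "mtu": 1500, "mac": "00:53:00:00:00:01"}

FIELDS = {
    "brief": ("name", "state"),
    "normal": ("name", "state", "mtu"),
    "detail": ("name", "state", "mtu", "mac"),
}


def show_interface(record, verbosity="normal", fmt="text"):
    """Pick fields by verbosity first, then render them in the chosen format."""
    selected = {key: record[key] for key in FIELDS[verbosity]}
    if fmt == "json":
        return json.dumps(selected)
    if fmt == "text":
        return "\n".join("%s: %s" % item for item in selected.items())
    raise ValueError("unknown format: %r" % fmt)


print(show_interface(SAMPLE, verbosity="brief", fmt="json"))
print(show_interface(SAMPLE, verbosity="detail", fmt="text"))
```

Keeping the two steps separate means a function that ignores verbosity can still support every output format.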

This is a rough draft that needs refinement of course.

Details

Difficulty level
Unknown (require assessment)
Version
-
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Perfectly compatible

Event Timeline

pasik added a subscriber: pasik. Jul 21 2020, 1:13 PM

Hi,

I like this idea of building a vyos-opd daemon in Python.

My Python is quite fluent, so I think I could come up with a reasonable architecture for it to start with.

However, doing this right in a future-proof way will likely take some time, hence we should probably discuss whether this really must be in 1.3.

Any objections on the timeline?

@efficiosoft

Thank you for the offer. Having read your patch, it seems you are indeed fluent in Python, and as there is lots to do, I am sure you will be able to help :-)
As @dmbaturin started this, I will let him come back to you about what could be done.

@dmbaturin

Please find my thoughts on this idea here.

I like the idea to move things to a daemon, and agree with the overall design suggested.

Perhaps Python entry points could be created for each of the commands at the root of the XML tree (and perhaps a unifying one, "run"?).
Doing so would allow calling this Python code from configuration mode without having to fork.

I am not sure if an extra daemon is needed. All the code could be centralised in vyos-hostsd.
It already provides all the glue we need. However, the routing is currently not "per service", so it would have to be refactored that way (which would be trivial).
As any failure of the op code could be caught by a global except block, I do not think adding this feature would put the current "hosts service" at any risk if this is designed correctly.

If we do so, it would also make sense to move some of the non-daemon code of vyos-hostsd from its current location into the python/vyos/ folder.
The op code could then be developed there and therefore be used without the daemon too.
This would allow the op mode Python code to be used both stand-alone and as a daemon.

As the operational XML should soon be available as a single XML file on the router (see the patch for T2688), loading the commands should be possible by loading this single XML file.

The code could also allow loading extra XML commands from a local folder (using the XML merging code in T2688, which was used to merge all the command definitions), allowing users to extend the commands available by default, as is currently done in the repository. Providing a way to run the op code without the daemon would allow people to design their own XML, add it to the router, test it without reloading "the stable" code in the daemon, and only when happy that it is working cause a restart of the daemon.

This feature could be used when we implement the "add-on" feature down the line. A suitable naming convention would just need to be developed.

Thanks @thomas-mangin for sharing your thoughts on this.

I'm of course not yet that familiar with all the VyOS internals. It's a large project after all, so just my two cents on the question about splitting this out into a separate daemon.

As far as I understood @dmbaturin, he suggested detaching the op mode functions from the XML, letting each function register its route(s) entirely from inside Python:

upon loading, modules register one or more op mode routes

The daemon would then offer a sort of route map (probably with help texts etc.) to its clients so that they know what they can call. One such client implementation would be the interactive op mode shell, another could be the HTTP API. Did I get this right?

The advantage I see about this would be that vyopd (or however we name it) would not be locked to VyOS at all, just like VyConf. You simply hook the daemon up to a directory of self-contained modules and let it serve whatever client you like. A small Python library for building clients that talks to the server and exposes a friendly API could be bundled and distributed together with the server.

The legacy shell commands like netstat or ss could be incorporated into such a system as well; it would be trivial to provide a helper class for registering such shell commands with the route registry and let them behave like functions implemented natively in Python. Generally, by providing a supporting class-based API for writing the individual op mode functions, all the nifty details like serialization or tabulated output could be handled by the server centrally according to the actual client's requirements.
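
Such a helper class might be as small as the following sketch. ShellCommand and the route path are hypothetical; the demo wraps the Python interpreter itself instead of netstat or ss so that the example runs on any machine.

```python
import subprocess
import sys


class ShellCommand:
    """Wrap an external command so it can be registered like a Python function."""

    def __init__(self, argv):
        self.argv = list(argv)

    def __call__(self, *extra_args):
        # An argument list (no shell=True) avoids shell injection through
        # user-supplied arguments; check=True raises on a non-zero exit status.
        result = subprocess.run(
            self.argv + list(extra_args),
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout


# Hypothetical registration; a real entry might wrap ["ss", "-t"] instead.
routes = {
    "/show/demo/hello": ShellCommand([sys.executable, "-c", "print('hello')"]),
}

print(routes["/show/demo/hello"](), end="")
```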

Providing a way to run the op code without the daemon would allow people to design their own XML, add it to the router, test it without reloading "the stable" code in the daemon, and only when happy that it is working cause a restart of the daemon.

Since the server operates via sockets, you could simply fire up a second instance, feed it with your changed modules and use the sample client implementation to test them without impacting the running system.

@efficiosoft asking users to fire a daemon is not really user-friendly :-) Once a daemon is loaded, whatever change is done to any initialisation data on the disk will not affect it. Should a tool exist loading the same data each time, then changes can be tested without affecting the running daemon.

I believe @dmbaturin had the code structure in mind when he said that every command should register. The XML files for the op code are currently not unified, so any code would have to parse one file after another to load them. As I pointed out, following a suggestion I implemented https://github.com/vyos/vyos-1x/pull/513, so the data is now in one single file and should soon be installed on the router.

Yes, the daemon would have to match commands to actions (a route map in HTTP daemon talk). The XML code which was just integrated into the repository has a way to "walk" an in-memory dict which has the content of the XML. So the work to find the right leaf node for a command should be already resolved 🎉 The latest pending PR adds op mode XML.

As for what I meant about being able to use the code without the daemon: the function which generates the data (probably a dict) from the request (probably as a list of the command-line words split on spaces) should be something in the vyos library, so that it can be called without the daemon. It would not be as efficient (as the XML would have to be parsed at each call and as we will not be able to LRU cache the results of commands which provide "static" answers), but it should be possible.
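
A library-level function of that kind could look roughly like this sketch, which walks a nested dict and works the same with or without a daemon around it. The TREE layout (children, tag, and command keys) and the resolve function are assumptions made for this example and do not reflect the actual in-memory format produced by the XML code.

```python
# Hypothetical in-memory command tree; the real structure may differ.
TREE = {
    "show": {
        "children": {
            "interfaces": {
                "children": {
                    "ethernet": {
                        "command": "ip link",
                        "tag": {"command": "ip link show dev"},
                    }
                }
            }
        }
    }
}


def resolve(words, tree=TREE):
    """Map a list of command words to a dict describing what to run."""
    node = {"children": tree}
    args = []
    for word in words:
        children = node.get("children", {})
        if word in children:
            node = children[word]  # literal keyword
        elif "tag" in node:
            node = node["tag"]  # free-form value such as an interface name
            args.append(word)
        else:
            raise LookupError("unknown word: %r" % word)
    if "command" not in node:
        raise LookupError("incomplete command: " + " ".join(words))
    return {"command": node["command"], "args": args}


print(resolve("show interfaces ethernet eth0".split()))
```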

The XML code can also embed the XML files as a .py file within the repository, so it is possible to use the code without even requiring the installation of the XML files. I will let you have a look at the XML and the repository.

@dmbaturin I am also wondering if we should consider extending the XML to allow the registering of Python modules and functions. Doing so would allow adding all the vyos validators. They are much more likely to see the same input, and it would surely provide a good perf improvement if functools.lru_cache is used.
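
As a sketch of the caching idea, a validator living inside a long-running process could be wrapped with functools.lru_cache like this. The is_ipv4_network function is a stand-in written for this example and is not one of the existing vyos validators.

```python
import functools
import ipaddress


@functools.lru_cache(maxsize=1024)
def is_ipv4_network(value):
    """Return True if value parses as an IPv4 network; results are memoized."""
    try:
        ipaddress.IPv4Network(value, strict=False)
    except ValueError:
        return False
    return True


# The second call with the same input is answered from the cache.
is_ipv4_network("192.0.2.0/24")
is_ipv4_network("192.0.2.0/24")
print(is_ipv4_network.cache_info())
```

This only pays off in a process that stays alive between calls, and only for validators whose answer depends on nothing but their input.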

@efficiosoft asking users to fire a daemon is not really user-friendly :-) Once a daemon is loaded, whatever change is done to any initialisation data on the disk will not affect it. Should a tool exist loading the same data each time, then changes can be tested without affecting the running daemon.

Well, it would be trivial to add an --auto-reload switch to the daemon for development purposes which could enable an inotify watcher over the XML files.

As for what I meant about being able to use the code without the daemon: the function which generates the data (probably a dict) from the request (probably as a list of the command-line words split on spaces) should be something in the vyos library, so that it can be called without the daemon. It would not be as efficient (as the XML would have to be parsed at each call and as we will not be able to LRU cache the results of commands which provide "static" answers), but it should be possible.

Ah, I got you wrong there, definitely agree. That should be no big deal.

After looking at the XML schemata again, to me it looks like embedding the Python module rules therein would make a lot of sense. In addition to <command/>, a <python/> element could be introduced with the full import path like pkg.mod:callable. The same goes for <completionHelp/>, which would also benefit from a direct calling mechanism.
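
Resolving an import path of the pkg.mod:callable form needs little more than importlib. The load_callable helper below is a hypothetical name, and the demo resolves standard library functions only so that it runs anywhere.

```python
import importlib


def load_callable(spec):
    """Resolve a 'pkg.mod:callable' string to the object it names."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError("expected 'pkg.mod:callable', got %r" % spec)
    obj = importlib.import_module(module_name)
    # Allow dotted attribute paths such as "mod:Class.method".
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    if not callable(obj):
        raise TypeError("%r does not name a callable" % spec)
    return obj


basename = load_callable("os.path:basename")
print(basename("/usr/libexec/vyos/op_mode"))
```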

One thing to keep in mind, however, when sticking with the XML definitions is the nodes that exist twice, once as a tag node and once as a non-tag node, as @dmbaturin mentioned.

They do allow registering different routes for /show/interfaces/ethernet/ and /show/interfaces/ethernet/<intf>, which would solve our issue with messy tag/non-tag nodes.

One such example is show history, but there are many more.

Well, it would be trivial to add an --auto-reload switch to the daemon for development purposes which could enable an inotify watcher over the XML files.

Yes, it would be a solution. However, the behaviour of the daemon should really be "fixed" once loaded. It may be simpler to have a reload feature on SIGHUP.
This would allow changing the file on disk when developing, testing it without breaking things, and only when confident that it is right, then asking for the reload.
While I do not advocate for this to be done in prod, I believe the workflow of many in the team is to have a local router where dev is performed and/or tested.
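
A reload-on-SIGHUP workflow could be sketched along these lines. ReloadableDefinitions and install_sighup_handler are hypothetical names; the signal handler only sets a flag, and the main loop would call apply_pending_reload at a safe point between requests.

```python
import signal


class ReloadableDefinitions:
    """Hold loaded definitions and swap them only on an explicit request."""

    def __init__(self, loader):
        self._loader = loader
        self._reload_requested = False
        self.current = loader()

    def request_reload(self, signum=None, frame=None):
        # Signal handlers should do as little as possible: only set a flag.
        self._reload_requested = True

    def apply_pending_reload(self):
        """Reload if requested; return True when a reload took place."""
        if not self._reload_requested:
            return False
        self._reload_requested = False
        # If the loader raises, self.current keeps the old, working data.
        self.current = self._loader()
        return True


def install_sighup_handler(definitions):
    """Hook SIGHUP up to the reload flag (no-op where SIGHUP does not exist)."""
    if hasattr(signal, "SIGHUP"):
        signal.signal(signal.SIGHUP, definitions.request_reload)
```

Until the signal arrives, edits to the files on disk have no effect on the running daemon, which matches the "fixed when loaded" behaviour described above.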

This is why I have implemented vyos update with https://github.com/thomas-mangin/vyos-hacker-toolkit, which automatically syncs the code I am editing on my laptop with my test router.
(It does not use inotify as I am on macOS, which does not have the feature 😉 )

While this is a recent tool and I do not believe the rest of the team is using it, you may want to give it a look as it really helps me with my dev.

Well, it would be trivial to add an --auto-reload switch to the daemon for development purposes which could enable an inotify watcher over the XML files.

Yes, something along these lines, I had in mind to use importlib to load the code and have a convention that __main__ would call def main ().
But an entry point like definition for the function to call would work too.

the nodes that exist twice

I had not thought of it - thanks.

Dmitry added a subscriber: Dmitry. Jul 23 2020, 2:32 PM
efficiosoft added a comment. Edited Jul 23 2020, 2:45 PM

Yes, it would be a solution. However, the behaviour of the daemon should really be "fixed" once loaded. It may be simpler to have a reload feature on SIGHUP. This would allow changing the file on disk when developing, testing it without breaking things, and only when confident that it is right, then asking for the reload.

Looks reasonable.
Cool thing, will definitely look into your vyos hacker toolkit!

But an entry point like definition for the function to call would work too.

I like this better, because it allows more than one function to be included per module which I think makes sense for simple things like "show system memory".

Regarding the schema, the more I think about it, the more I'm convinced the existing schema needs to be reworked in multiple ways. It was mostly taken from conf mode, where requirements seem to be somewhat different. Just fiddled a bit with it and came to these thoughts:

  • We should take the chance to get rid of these tag nodes, at least as they are defined right now. Tag nodes could be changed to have no name and instead appear inside a parent node that defines their name.
...
<node name="ethernet">
  <command>ip link</command>
  <children>
    <tagNode>
      <completionHelp>
        <path>interfaces ethernet</path>
      </completionHelp>
      <command>ip link show dev "{1}"</command>
    </tagNode>
  </children>
</node>

This would route "show interfaces ethernet" and "show interfaces ethernet eth0" to separate shell commands via the daemon, and if we also allow <python>show_interfaces_ethernet:ShowDeviceInfo</python> instead of <command/>, it would be much cleaner.
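
To check that the proposed layout is easy to route on, here is a small sketch that walks the fragment above with xml.etree.ElementTree. The find_command function is hypothetical, and the sketch deliberately returns the command template and the tag values separately, because the exact meaning of the "{1}" placeholder is not specified here.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """
<node name="ethernet">
  <command>ip link</command>
  <children>
    <tagNode>
      <completionHelp>
        <path>interfaces ethernet</path>
      </completionHelp>
      <command>ip link show dev "{1}"</command>
    </tagNode>
  </children>
</node>
"""


def find_command(node, words):
    """Follow words below node; return (command template, tag values)."""
    tag_values = []
    for word in words:
        children = node.find("children")
        if children is None:
            raise LookupError("no sub commands below this node")
        named = [c for c in children.findall("node") if c.get("name") == word]
        if named:
            node = named[0]  # literal child node wins over the tag node
            continue
        tag_node = children.find("tagNode")
        if tag_node is None:
            raise LookupError("unknown word: %r" % word)
        node = tag_node  # nameless tag node consumes the word as a value
        tag_values.append(word)
    command = node.findtext("command")
    if command is None:
        raise LookupError("node is not callable")
    return command, tag_values


root = ET.fromstring(FRAGMENT)
print(find_command(root, []))
print(find_command(root, ["eth0"]))
```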

  • Second, is there really a need for distinction between nodes and leaf nodes in op mode?

If we just made any node that has a command or python property callable and stopped offering sub commands in the shell once a node without children is reached, that would be enough information to render a proper model, I think.

I played a little with the op mode RELAX NG, but it's just a sketch, of course. Maybe we can see what @dmbaturin thinks, but I like the idea in general.

@efficiosoft are you on our Slack channel?

No, not yet. But generally I don't like it too much because I'm blind and their desktop client isn't very accessible, and writing things like code examples on a smartphone, well, is something I try to avoid ;-).

But anyway, I could join it later this evening.

@efficiosoft understood. We all tend to hang out there to help each other. If there is another way (besides here) that I/we can reach you to support you, let us know.

@thomas-mangin Thanks. I joined the channel yesterday. It'll be fine for asking around every now and then.

Here is a draft of what I meant when I said reworking the XML schema.

https://github.com/efficiosoft/vycmd

I also played around a bit with the daemon/client part and communication via UNIX sockets using pynng; it works well so far.

runar added a subscriber: runar. Jul 29 2020, 4:56 AM

Please consider using zeromq instead of pynng

This is because our other daemons are written using zeromq and the fact that pynng is not a part of the upstream debian source.
If we change to pynng, we will need to consider also rewriting our other daemons, and we also need to craft our own deb package for it, as inclusion into the vyos image is not as easy as a pip install.

This is because our other daemons are written using zeromq and the fact that pynng is not a part of the upstream debian source.

That the other daemons are using ZMQ doesn't necessarily mean ZMQ and NNG can't run at the same time.

I mainly see two advantages in using NNG over ZMQ:

  • It natively supports a so-called polyamorous mode which multiplexes multiple concurrent client connections on the same UNIX socket and allows bidirectional communication for all requests. ZMQ doesn't have something like this if I'm right and processing multiple requests concurrently thus won't work.
  • NNG has a native async API, which I think makes sense to use for the server, since it saves a number of I/O-bound threads per request and works great together with trio.

Other than that, it's more modern and the API feels nice to work with.

we also need to craft our own deb package for it, as inclusion into the vyos image is not as easy as a pip install.

I know, but limiting dependencies to what's available in Debian rules out a few modern technologies... I'm sure I can build a deb package that includes all required Python dependencies should we need it. There are other candidates for packages that would be nice to use but aren't available in Debian as of yet.

Ok, so this now waits for T2854. I've already drafted some partial implementation and would like to base it on the architecture introduced in that task.

More thoughts still very welcome though.