On my relatively fast 1-core VM on a 2.5GHz i7 processor it takes 4 seconds to list 17 bridgeable interfaces (member interface). I can't imagine how slow it would be on a slower processor like an Atom or Geode in the APU boards (probably faster if you don't create a lot of VLANs, but each VLAN is a bridgeable interface that adds to the list).
- Difficulty level
- Unknown (require assessment)
- Why the issue appeared?
- Will be filled on close
- Is it a breaking change?
- Perfectly compatible
The script takes 2s even with just --help. I can't explain why the script takes 2s by itself while the completion takes 4s; the other 2s must be added elsewhere.
vyos@vyos# time /usr/libexec/vyos/completion/list_interfaces.py --help
usage: list_interfaces.py [-h] [-t TYPE | -b | -br | -bo]

optional arguments:
  -h, --help            show this help message and exit
  -t TYPE, --type TYPE  List interfaces of specific type
  -b, --broadcast       List all broadcast interfaces
  -br, --bridgeable     List all bridgeable interfaces
  -bo, --bondable       List all bondable interfaces

real 0m1.959s
user 0m0.373s
sys 0m0.501s

vyos@vyos# time /usr/libexec/vyos/completion/list_interfaces.py --bridgeable
bond0 bond0.40 bond0.42 bond0.110 bond0.111 bond0.112 bond0.113 bond0.115 bond0.116 bond0.117 bond0.118 eth0 eth1 eth2 eth3 eth4 lo

real 0m1.949s
user 0m0.361s
sys 0m0.504s
On a dual-core Pentium E5300 the script takes 2.2s, and completion takes 5s.
vyos@rt-home# time /usr/libexec/vyos/completion/list_interfaces.py --help
usage: list_interfaces.py [-h] [-t TYPE | -b | -br | -bo]

optional arguments:
  -h, --help            show this help message and exit
  -t TYPE, --type TYPE  List interfaces of specific type
  -b, --broadcast       List all broadcast interfaces
  -br, --bridgeable     List all bridgeable interfaces
  -bo, --bondable       List all bondable interfaces

real 0m2.238s
user 0m0.599s
sys 0m0.652s
The recent implementation here uses the python ifconfig module and walks it to detect interfaces marked as bridgeable. I found such constructs are way too slow; simply listing ls -1 /sys/class/net (and doing some filtering) is orders of magnitude faster.
The recent implementation comes from Thomas Mangin.
I tried to get a flamegraph to show what I meant, but it does not look very clear :-(
Getting the interfaces themselves is fast. We are also using netifaces.interfaces and a few wrapping classes for the data, but it is an iteration over some in-memory structures. This is still fast.
Previously, some interfaces which were bridgeable were not listed, because they were forgotten when added. This cannot happen anymore as the logic is data-driven and not code-driven. I see this as a clear feature and definitely not a bug.
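As a rough illustration of the sysfs approach mentioned above, here is a minimal sketch. The function name and the `exclude_prefixes` filter are hypothetical and only stand in for "some filtering"; the real rules for what counts as bridgeable are not shown in this thread.

```python
import os


def list_net_interfaces(sysfs_net="/sys/class/net", exclude_prefixes=()):
    """Return the sorted interface names found under sysfs_net.

    exclude_prefixes is a hypothetical filter used for illustration only;
    it is not the bridgeable logic used by list_interfaces.py.
    """
    names = []
    for name in sorted(os.listdir(sysfs_net)):
        # Interfaces show up as (symlinks to) directories; plain files such
        # as bonding_masters are control files, not interfaces.
        if not os.path.isdir(os.path.join(sysfs_net, name)):
            continue
        if exclude_prefixes and name.startswith(tuple(exclude_prefixes)):
            continue
        names.append(name)
    return names
```

On a router this would be called as `print(" ".join(list_net_interfaces()))`. The listing itself only needs the standard library, so the cost is dominated by interpreter startup rather than by imports.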
What is slow is the initialisation of the whole vyos library. This new code went from a few if/else statements to importing most of the code in python/vyos. The more code is moved to python, the longer this "one-shot" time will become.
This startup cost cannot be avoided, and it is already visible when running multiple configurators. For each interface which needs to be set up, all the initialisation must be done over and over again.
On my laptop it takes nearly a second to import all the code in src/conf. This has nothing to do with what I wrote. No code is run; it is just that you now have a sizeable chunk of python code to load and parse at startup and are therefore noticing this one-time setup cost. Each time a new third-party library is added (which will be required to add the features you need), this startup time will increase, and this is even before we talk about moving BGP and the rest of the vyatta code in.
This is why I started to work on T2433, and this is also related to T2088. Having one long-running process, which will most likely take over a second to load but can then provide blazing fast calls to the functions without affecting code maintainability, looked to me like a good way forward.
But I felt that the team is not behind this idea, so I stopped working on it for now. So what you have is a side effect of the fork design.
If the op commands need to be faster, you can drop all references to the code in python/vyos and duplicate all the features/work, you can rewrite everything in OCaml or C as was done for the validators, or you can consider T2433 as a long-term solution to the issue.
The team can do as it likes, but I am not causing the slowdown in the code. I am just trying to integrate all the code to make it maintainable.
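To check how much of the run time is pure import cost, something like the following sketch can be used. `time_import` is a hypothetical helper, not part of the vyos library; CPython 3.7 and later also provide `python3 -X importtime script.py`, which prints a per-module breakdown.

```python
import importlib
import sys
import time


def time_import(module_name):
    """Return the seconds spent on the first import of module_name.

    Returns 0.0 if the module is already loaded, because a repeated
    import is only a lookup in sys.modules and costs next to nothing.
    """
    if module_name in sys.modules:
        return 0.0
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start
```

Calling `time_import("vyos.ifconfig")` on a VyOS system (assuming that is the module the script pulls in) would show the one-time cost that every freshly forked script pays again.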
So in short:
The slow down is a consequence of the increase of the python code base.
I will be happy to resume work on T2433 if/when you agree with this analysis.
@dmbaturin your call to make. At this point I cannot help any more until you or the team make an executive decision on the direction you want the project to take: more integrated python, probably with some form of services, or keep the fork design and start to strip the python code down. Obviously, if you can think of something better, fantastic: I will then have learned something.
vyos@vyos# diff -u /usr/libexec/vyos/completion/list_interfaces.py list_interfaces.py
--- /usr/libexec/vyos/completion/list_interfaces.py 2020-03-21 19:47:22.000000000 +0000
+++ list_interfaces.py 2020-05-30 18:45:30.564000000 +0000
@@ -39,4 +39,7 @@
     print(" ".join([intf for intf in matching("bondable") if '.' not in intf]))
 else:
-    print(" ".join(Section.interfaces()))
+    import timeit
+    def do():
+        print(" ".join(Section.interfaces()))
+    print("time to get the inteface: " + str(timeit.timeit(stmt=do,number=1)))
vyos@vyos# time ./list_interfaces.py
lo eth0 eth1 eth2 eth3 dum0 dum2 dum6 dum14 dum30 dum62 dum99 dum98 dum97 dum96 dum95 dum94 dum93 dum92 dum91 dum90 dum89 dum88 dum87 dum86 dum85 dum84 dum83 dum82 dum81 dum80 dum79 dum78 dum77 dum76 dum75 dum74 dum73 dum72 dum71 dum70 dum69 dum68 dum67 dum66 dum65 dum64 dum61 dum63 dum60 dum59 dum58 dum57 dum56 dum55 dum54 dum53 dum52 dum51 dum50 dum49 dum48 dum47 dum46 dum45 dum44 dum43 dum42 dum41 dum40 dum39 dum38 dum37 dum36 dum35 dum34 dum33 dum32 dum29 dum31 dum28 dum27 dum26 dum25 dum24 dum23 dum22 dum21 dum20 dum19 dum18 dum17 dum16 dum13 dum15 dum12 dum11 dum10 dum9 dum8 dum5 dum7 dum4 dum1 dum3
time to get the inteface: 0.0009720910002215533

real 0m0.224s
user 0m0.110s
sys 0m0.052s
So 0.386 in the program, 0.0009 in running the function: a large startup overhead. @jjakob could you please apply this and tell me what you get?
@thomas-mangin I think there was a misunderstanding between us. The disagreement we had regarding the way to implement vyatta-cfg validators was because the validators are an integral part of vyatta-cfg operation. They are also simple and small, as they only need to validate the types and constraints of config nodes. As they are closely tied to vyatta-cfg, which operates by executing a new process for each config node, that execution needs to be very fast. I was against your solution (a validator daemon in Python listening on a socket file and a companion client in a language that's faster to start up) just because it seemed needlessly complicated for what it needs to achieve. Node validation in vyatta-cfg is a case of simple constraints, not complex interdependencies that would require a higher-level language. As we later do the complex validation in the configuration scripts, which are written in Python themselves, all the complexity can already be put there.
Now you may be wondering why this validation is done in two places: it's because of the legacy of vyatta-cfg. In the old days of Vyatta, many config nodes didn't have corresponding scripts at all; they were self-contained and applied the config directly using system utilities and simple shell scripts that were part of the node definitions themselves. In that case, the config node validators were the only validation of a value that was done, and each config node could specify its own shell snippet or script to validate its own value. This made sense in that design concept.
It is also still an integral part of the shell environment: in config mode, a set command with an invalid value will return an error immediately, as its validator returns an error. The configuration script can catch an error only when a commit is triggered.
Now that we are tacking a completely different design concept onto that, things become complex. The new design says "all new code must be Python", but since we're marrying this new code with the old vyatta-cfg core (vyatta-cfg is still the heart and core of VyOS, with Python being the "worker"), things will become suboptimal, complex and bizarre in some places that wouldn't need to be that way and could be left simple. The above is an example of this complexity arising from a design choice.
Now, regarding how best to proceed: if we won't be replacing vyatta-cfg soon, we have to live with it. And since the choice of Python seems set in stone, we have to live with Python's slow initialization time. I agree with your concept of a daemon that would run all the Python code and would have a fast "dispatcher" that communicates with it over a socket. It's the best we can do to speed up this mess.
It's not without its downsides, including having to secure the socket against unauthorized clients (at least filesystem permissions and user groups). The dispatcher from vyatta-cfg needs to be in a fast, most likely compiled language.
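To make the daemon idea concrete, here is a minimal sketch of a long-running process that answers requests over a Unix socket restricted by filesystem permissions. It is not the T2433 design: the line-based protocol, the `commands` table and all function names are assumptions made for illustration. The client is written in Python here only to show the exchange; as noted above, a real dispatcher would need to be in a faster-starting language.

```python
import os
import socket
import socketserver
import threading


class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # Hypothetical protocol: one command name per line, one reply line.
        command = self.rfile.readline().decode().strip()
        func = self.server.commands.get(command)
        reply = func() if func else "error: unknown command"
        self.wfile.write((reply + "\n").encode())


def start_daemon(path, commands):
    """Serve the callables in commands on a Unix socket at path."""
    if os.path.exists(path):
        os.unlink(path)
    server = socketserver.UnixStreamServer(path, Handler)
    # Owner and group only. A stricter setup would also set the umask
    # before binding, so the socket is never briefly more permissive.
    os.chmod(path, 0o660)
    server.commands = commands
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask(path, command):
    """Send one command to the daemon and return its one-line reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall((command + "\n").encode())
        return client.makefile().readline().strip()
```

The expensive imports happen once, when the daemon starts; each later request only pays for the socket round trip and the function call itself.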
If we moved everything to this new daemon, it would make sense to dispatch validation events to it too.
I also couldn't immediately grasp how your suggestion would fit into the whole design concept of VyOS. Now that I've had some time to think about it, it indeed makes sense. I was having problems seeing how having a complex daemon just for validators would make sense while still leaving everything else as is; it makes sense if everything is moved there, and only because of the current state of having to "marry" vyatta-cfg with Python and Python's slow initialization times. I guess in the future, if VyConf gets to a place where it can replace vyatta-cfg, it would still make sense to run everything in a single daemon.
Thank you for this long answer @jjakob. I want to demonstrate that a full Python solution can provide the performance we need. I appreciate that changing the Vyatta code needs to be done carefully, with much consideration for backward compatibility. What I am doing is surely 1.4 material. However, I do not believe this is as hard to achieve as everyone may think, and as working code is the best way to discuss code design, that is what I am doing.
I gave an update on T2522 so you can see what I have in mind. All the code is public on my vyos-1x repo and working. I had misunderstood the scope of VyConf ... and ended up re-implementing a big part of it already in python :-(. I will have a look at it again to see what I am missing - taking all the good ideas :-)