
Increase performance using unix socket
Open, Requires assessment | Public | FEATURE REQUEST

Description

Currently, there is a high cost to using Python for the validators. A new Python interpreter must be started for every check, and this is not cheap:

vyos@vyos# time bash -c ""

real	0m0.001s
user	0m0.001s
sys	0m0.000s
[edit]
vyos@vyos# time python3 -c ""

real	0m0.019s
user	0m0.011s
sys	0m0.007s
[edit]
vyos@vyos# time python3 -c "import vyos.ifconfig"

real	0m0.173s
user	0m0.104s
sys	0m0.031s

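So, going by these single runs, an empty bash invocation is roughly 20x faster than an empty Python one, and about 170x faster than Python once vyos.ifconfig is imported. However, even bash is not as fast as compiled C would be, as it has to fork and exec other applications to do the actual work.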
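To repeat this comparison with more than one sample, here is a minimal sketch that spawns a fresh interpreter several times and averages the wall-clock cost. The helper name `mean_startup_seconds` and the use of `json` as a stand-in import are assumptions for illustration; on a VyOS box you would substitute `vyos.ifconfig`.

```python
import subprocess
import sys
import time


def mean_startup_seconds(args, runs=5):
    """Average wall-clock time to spawn `args` and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(args, check=True)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Bare interpreter start, then the same with one import added.
    bare = mean_startup_seconds([sys.executable, "-c", ""])
    with_import = mean_startup_seconds([sys.executable, "-c", "import json"])
    print(f"bare interpreter: {bare:.4f}s per start")
    print(f"with import:      {with_import:.4f}s per start")
```

The numbers include the cost of `subprocess.run` itself, so they are only useful relative to each other.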

Python performance is "good enough" for most cases once the interpreter is running. If the Python code runs in a long-lived process, the initialisation cost becomes a moot point.

For example, using a Unix socket and the simple echo program from https://pymotw.com/3/socket/uds.html gives the following results:

vyos@vyos:~$ time bash -c 'echo "test" | nc -U uds_socket | head -1'
test

real	0m0.004s
user	0m0.003s
sys	0m0.000s

This includes forking the shell, nc and head, and running echo; even so, the performance is close to that of bare bash.
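For reference, a long-lived echo server of this kind fits in a few lines. This is a minimal sketch using the standard `socketserver` module, not the exact code from the linked page; the names `EchoHandler` and `make_echo_server` are my own, and the socket path is whatever you pass in (for example `uds_socket`).

```python
import os
import socketserver


class EchoHandler(socketserver.StreamRequestHandler):
    """Send every received line straight back to the client."""

    def handle(self):
        for line in self.rfile:
            self.wfile.write(line)


def make_echo_server(path):
    """Bind a Unix stream socket at `path`, replacing any stale socket file."""
    if os.path.exists(path):
        os.unlink(path)
    return socketserver.UnixStreamServer(path, EchoHandler)


if __name__ == "__main__":
    # The interpreter is started once here; each client only pays for a connect.
    with make_echo_server("uds_socket") as server:
        server.serve_forever()
```

With the server running, `echo "test" | nc -U uds_socket | head -1` can be used as the client, as in the timing above.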

So why would we want to use a long-lived Python process rather than C/OCaml/Bash for validation?
1 - it will keep the number of languages in the project down. Python is a widely known language.
2 - it will allow the same validation code to be shared between the configuration, operational and validation code, giving consistency
3 - the C code could be adapted to talk to the Unix socket directly, saving an expensive fork
4 - long-lived code opens the door to other optimisations
5 - the same framework will also be available for configuration and operational mode, where the tools are more complex and better written in Python

As an example of the optimisations in (4): templating is currently used to generate the configuration files of third-party applications. As with forking, the initialisation cost is high, and a long-lived program would improve performance.

I believe (5) is, however, the most compelling point, as the long-term direction of the project should be considered. Currently, the XML is used to generate some files, which are then used by some C code ... It would make sense for the XML to be consumed by the same Python code that is used for the configuration. Once all the logic has moved into Python, this becomes possible. It might even remove the need to run as a daemon, as nothing would need to be forked and the initialisation cost may be acceptable. Some other features also become available, but those are off-topic here (not performance-related).

The project already uses multiple languages, and not all contributors are fluent in all of them. I can count Perl, XML, Shell, C, Python and OCaml (and surely a few DSLs). Python is likely to be the language best known to prospective contributors.

Having all the code in Python also opens up other options, such as using entry points to generate a standalone application for each validator.

I propose to use this ticket to:

  • discuss the pros and cons of each approach
  • share numbers and performance figures for the different solutions, for objective decision making
  • but leave out the other possibilities, and use T2407 for those instead

Some part of discussion already occurred in:

Details

Difficulty level
Unknown (require assessment)
Version
-
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)

Event Timeline

pasik added a subscriber: pasik. May 8 2020, 9:34 AM

I have implemented a "validator" program: an entry point which locates a named Python program and runs it. It relies on Python's import mechanism at startup, so the setup time is very high.

The same code was then used to create a Unix socket daemon to evaluate performance. The daemon can be started using:

unix_daemon -v

The repository can be downloaded from:
https://github.com/thomas-mangin/vyos-1x/tree/T2433

1x test:

Calling the old/removed numeric.py Python validator code once (same code, different location):

vyos@vyos:~$ time python3 /usr/lib/python3/dist-packages/vyos/validators/numeric.py --positive 1

real	0m0.025s
user	0m0.017s
sys	0m0.008s

Calling the same code via the dispatcher:

vyos@vyos:~$ time validator numeric --positive 1

real	0m0.106s
user	0m0.091s
sys	0m0.014s

(0.106+0.091+0.014)/(0.025+0.017+0.008)

= 4.22

So the dispatcher has a setup cost which makes the program 4 to 5 times slower. The reason is the time it takes to load and parse the code from the vyos repo (plus a bit of dispatch), none of which is needed by the standalone numeric.py. We can therefore see this as the cost of loading a "largish" library in Python.
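One way a dispatcher could avoid paying for the whole library is to import only the module of the validator that was asked for. This is a minimal sketch of that idea, not what the branch above does. The registry and the function `load_validator` are hypothetical, and the module path `vyos.validators.numeric` is only inferred from the file path used in the test, so treat it as an assumption.

```python
import importlib

# Hypothetical registry: validator name -> dotted module path.
VALIDATOR_MODULES = {
    "numeric": "vyos.validators.numeric",  # assumed from the file path above
}


def load_validator(name, registry=VALIDATOR_MODULES):
    """Import and return only the module that implements `name`."""
    try:
        module_path = registry[name]
    except KeyError:
        raise ValueError(f"unknown validator: {name}") from None
    # Nothing else from the package is imported unless this module pulls it in.
    return importlib.import_module(module_path)
```

To see where the import time actually goes, `python3 -X importtime -c "import vyos.ifconfig"` (Python 3.7 or later) prints a per-module breakdown.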

1000x test:

validator code

vyos@vyos:~$ time sudo sh -c ' for ((n=0;n<1000;n++)); do validator numeric --positive 1; done'

real	1m38.290s
user	1m21.231s
sys	0m16.103s

That's painful!

numeric.py Python code

vyos@vyos:~$ time sudo sh -c ' for ((n=0;n<1000;n++)); do python3 /usr/lib/python3/dist-packages/vyos/validators/numeric.py --positive 1; done'

real	0m17.268s
user	0m14.894s
sys	0m2.250s

Using OCaml numeric

vyos@vyos:~$ time sudo sh -c ' for ((n=0;n<1000;n++)); do /usr/libexec/vyos/validators/numeric --positive 1; done'

real	0m0.583s
user	0m0.496s
sys	0m0.081s

(17.268+14.894+2.250)/(0.583+0.496+0.081)

= 29.67

So on my router, OCaml is roughly 30x faster than the Python code (which matches the tests done by others).

Using the Python numeric.py "unmodified" via some Unix socket server magic, to remove the Python setup cost:

vyos@vyos:~$ time sudo sh -c ' for ((n=0;n<1000;n++)); do echo numeric --positive 1 | nc -U ./validator.socket > /dev/null; done'

real	0m1.304s
user	0m0.569s
sys	0m0.130s

(1.304+0.569+0.130)/(0.583+0.496+0.081)

= 1.73
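The ratios in this comment add real, user and sys together. As a cross-check, here is a small sketch that recomputes them from the timings above, either that way or on wall-clock (real) time only. The table keys and the `ratio` helper are my own naming.

```python
# (real, user, sys) in seconds, copied from the runs above.
TIMINGS = {
    "validator x1": (0.106, 0.091, 0.014),
    "numeric.py x1": (0.025, 0.017, 0.008),
    "numeric.py x1000": (17.268, 14.894, 2.250),
    "ocaml x1000": (0.583, 0.496, 0.081),
    "daemon x1000": (1.304, 0.569, 0.130),
}


def ratio(slow, fast, wall_only=False):
    """How many times slower `slow` is than `fast`."""
    pick = (lambda t: t[0]) if wall_only else sum
    return pick(TIMINGS[slow]) / pick(TIMINGS[fast])


if __name__ == "__main__":
    pairs = [
        ("validator x1", "numeric.py x1"),
        ("numeric.py x1000", "ocaml x1000"),
        ("daemon x1000", "ocaml x1000"),
    ]
    for slow, fast in pairs:
        print(f"{slow} vs {fast}: "
              f"summed {ratio(slow, fast):.2f}, "
              f"wall-clock {ratio(slow, fast, wall_only=True):.2f}")
```

On wall-clock time alone the daemon run comes out at about 2.2x the OCaml run. The dispatcher and OCaml comparisons barely change.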

So using Python via a daemon is still slower than OCaml, BUT:

  • no attempt has been made to optimise command-line parsing (it still uses argparse and the current complex syntax)
  • the figure includes the cost of nc, which could be avoided by having the calling C code talk to the socket directly

So I would argue that it should be possible to use Python code for all the checks and get close to the speed of OCaml.
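To make the shape of such a daemon concrete, here is a minimal sketch of a line-based validator server and a client that talks to it without nc. It is not the code from the T2433 branch. The `numeric` function is a hypothetical stand-in that only understands `--positive N`, and the `ok`/`fail` reply format, `VALIDATORS`, `make_server` and `validate` are all assumptions for illustration.

```python
import os
import shlex
import socket
import socketserver


def numeric(args):
    """Hypothetical stand-in for numeric.py: only handles `--positive N`."""
    if len(args) == 2 and args[0] == "--positive":
        try:
            return int(args[1]) > 0
        except ValueError:
            return False
    return False


# Validators are loaded once, when the daemon starts.
VALIDATORS = {"numeric": numeric}


class ValidatorHandler(socketserver.StreamRequestHandler):
    """Read one request line such as `numeric --positive 1` and reply."""

    def handle(self):
        line = self.rfile.readline().decode("utf-8", "replace")
        try:
            name, *args = shlex.split(line)
        except ValueError:  # empty line or unbalanced quotes
            name, args = None, []
        check = VALIDATORS.get(name)
        ok = bool(check and check(args))
        self.wfile.write(b"ok\n" if ok else b"fail\n")


def make_server(path):
    """Bind the daemon to a Unix socket at `path`."""
    if os.path.exists(path):
        os.unlink(path)
    return socketserver.UnixStreamServer(path, ValidatorHandler)


def validate(path, request):
    """Client side: send one request line, return True if the reply is ok."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(path)
        client.sendall(request.encode("utf-8") + b"\n")
        return client.recv(16).strip() == b"ok"


if __name__ == "__main__":
    with make_server("validator.socket") as server:
        server.serve_forever()
```

The client half is the part the C code would replace: a connect, one write and one read, with no fork or exec involved.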