I'm usually on #vyos:chat.freenode.net.
- User Since
- Nov 6 2018, 10:08 PM (86 w, 1 d)
Tue, Jun 30
Possible by backporting https://github.com/vyos/vyos-1x/pull/452 and https://github.com/vyos/vyatta-cfg-system/pull/125 though I think some code using Config would need to be modified - add .exists calls before each .return_value(s) - 1.3's vyos.config doesn't require them, 1.2's I think does.
This is already fixed in 1.3
Sat, Jun 27
Will you be doing this change or can I claim it? I'm ready to work on it right now if you prefer.
To clarify, I'm not actually sure the formatting is the reason for the bug - I just observed that it happened. It may have no impact on functionality. The real problem is that something is messing with the hw-id entries, and I don't think it's the migrator scripts, as those weren't executed when it happened. Maybe some legacy Vyatta code.
Looking at the code, there is a missing error check for when an alias is the same as a static-host-mapping or already existing alias (either of this mapping or any other one). It will require some additional iteration.
Or maybe these should be converted to warnings and not errors, as they don't cause anything to break, just not work as expected?
I agree, we can make the error message point the user towards using an alias instead. But I don't think doing so is common practice, such things are usually written in docs.
Maybe the problem was that it exposed a configuration only understood by pdns-recursor, so replacing it with something else would be impossible if we ever considered so? That was the reason for not allowing certain patches in the past.
The reason I do it this way is to comply with the 80-column limit. Any strings separated by whitespace will be transparently concatenated by Python to one string, it requires adding extra braces around the multi-line string.
No, the string concatenation is done by Python (these are not multiple arguments but one argument) so there's nothing ConfigError can do. It's simply a mistake (typo) on my part.
Yeah, only the first line will be used, the subsequent lines for an already existing IP or hostname will be ignored. pdns-recursor also behaves the same, it logs "not replacing entry" (or something similar, I can't recall) when encountering a entry with a different IP for a hostname already defined, so it behaves the same as getent. You can't add multiple entries for round-robin resolution, hosts has no support for that, if we want that we need to use zone files instead (but those only work for pdns-recursor and not the system, so are not a suitable replacement here, they could be added under the dns forwarding node)
Have you tested that this actually works as intended? The reason I added that check is that each static-host-mapping "inet" address translates to a line in /etc/hosts. The hosts manpage says there should be only one line for each host:
Fri, Jun 26
Migration scripts use vyos.configtree which uses libvyosconfig so it's probably a bug there.
Attached are config.boot post-upgrade(migration) and config.boot pre-migration.
- doubled hw-id lines
- missing opening curly braces on line 44
- some things are quoted, some are not
Not doing anything regarding the failed load and just rebooting has now hard-baked the eth0-eth2 names into config.boot without me doing anything. So something effectively decided to rename eth1-eth3 to eth0-eth2 and save it to config.boot.
Thu, Jun 25
To be fair, I'm getting frustrated by the layers-upon-layers of new abstractions getting added on old code that doesn't work properly in the first place. I'd much prefer if we just started with a clean slate. I liked @thomas-mangin 's idea of replacing vyatta-cfg completely with our own code, either vyconf or his python daemon. I wouldn't waste time patching up the old backend, just make a decision in one place to replace it completely.
I think I ran into this today after upgrading from 1.3-rolling-202006110117 to 1.3-rolling-202006241940. My config had eth1-eth3 (as those were the default names created by a previous install of 1.3 somewhere around May) and those worked fine for numerous reboots before this upgrade. The first reboot after adding the new image, everything was fine. The 2nd reboot (actually a power outage) the interfaces were eth0-eth2 on the system, but eth1-eth3 in the config, so the config load failed.
Keep in mind that the preferred way to implement scripts is, in my mind, the one used by interfaces-tunnel.py: it takes both session and effective configs and compares them, only applying the changes for the differences that are required. get_config_dict just takes the session config and so it requires a complete teardown and re-initialization of any system component that is configured from it (restarting instead of reloading services, deleting interfaces instead of modifying just 1 setting, ...)
I don't consider just get_config_dict as the preferred future way to implement features for that reason, rather it is the interfaces-tunnel that could be made the reference.
get_config_dict could be ran 2 times (once with effective=True) in each script, then the script could compare/make a diff of the 2 dicts, but that's what interfaces-tunnel ConfigurationState does.
Take for example a minor change in the current openvpn code - changing the description of the interface - currently results in a complete service outage (restart). AFAIK all scripts are like this. It didn't use to be that way in Vyatta, most perl scripts compared session and effective configs and just applied the necessary changes.
Possibly related to T2205, it might have been fixed since this was reported.
Tue, Jun 23
It would be possible to make the scripts check if IPv6 is enabled on the interface (or system?) and make the minimal MTU 1280 in that case. If IPv6 on the interface is disabled or not supported, have it go as low as it can.
This was discussed already in T2404. The problem is that NICs that expose their min/max MTU are rare. None of the NICs I have expose it, neither through sysfs nor through 'ip -d link show'. If I recap the discussion from T2404, there are 2 main ways to solve this:
a) not have any limitations regarding MTU at all and then detect an error when trying to apply the new MTU. This means no way to verify if the new mtu is correct beforehand so it doesn't comply with the verify/apply separation that's prescribed in the developer docs. I described a possible workaround using revert code in T2404.
b) have a mtu detection script that would be ran by udev on every new NIC detection (to support hotplugging NICs) that would determine the min/max mtu with a bruteforce binary search algorythm (try to set a mtu and see if it errors), then record the results in some temporary file that would get read by the config script. The idea was proposed by @thomas-mangin.
Mon, Jun 22
This would have been fixed for isc-dhcp-client if T2590 hadn't happened in the process of me working on it, now it requires writing a new dhclient script for the WIDE client.
Fixed as part of T2486
Fixed as part of T2486
The above PR419 did not fix the issue as a wrong pdns-recursor process name was used (its real name is 'pdns-rec/worker'). It was fixed as part of T2486.
That's great, I like the config syntax. What does the interface alias do regarding the display? Maybe it could be read from the interface description? Then again the display name needs to be short as the displays are small and the interface description can be longer. Maybe default to reading the alias from the interface description and override it with the display alias.
For starting services in 1.3 we use systemd, it's simple to create new service files in src/systemd that will be put in /lib/systemd/system. Just make sure they're started manually by the config script and not as part of a target (just creating a service file without the Install section should ensure that).
Fri, Jun 19
Fair point. In that case I agree with not including a raw config option.
As for the errors when installing vyos-1x, c-po already pointed out why this occurs.For this reason I don't rebase on upstream while working on a set of changes locally, I always try to keep the installed iso and local git state as much together as possible. I also run docker from the vyos-build repo and have the vyos-1x repo dir in vyos-build/packages/vyos-1x (where the included scripts/build-packages would put it) so I can just docker run and build without having to copy any files anywhere, just scp the built deb into the VM.
@c-po I tend to agree to have as much predefined templates, but I'd leave the option to have a custom config if the user wants to, I don't like imposing artificial limitations. We already allow custom options with dhcp-server, openvpn..., why not allow specifying your own conf file for the driver section to include? Some things are impossible without either this or going with approach 2 by exposing absolutely all configurable driver options through the config. I'd prefer that, but if it results in too much options/config size, the alternative is as I described. But in the long term I think approach 2 would be the best.
Thu, Jun 18
I'd go with approach 3, if 2 is too complex. Have a predefined set of appliances that can be configured by a single option. For all other scenarios, one of:
a) have the user supply the path to a file in /config that the script will include in lcdd.conf as-is (as including a multi-line string in the config directly is very awkward and unreadable).
b ) we could for example make a /config/lcdproc directory, containing a template conf file that would be used if the user selected that option in config, still starting the daemon via the config.
c) or split out the individual driver sections with defaults (as in lcdd.conf) into many files in /config/lcdproc/$drivername.conf that can be edited by the user, and have a config option that selects which driver to use, and have the user edit the file to configure it.
Wed, Jun 17
There is another use of is_tag/is_leaf in python/vyos/validate.py is_member, as it can work on both bridge and bond members, and they have different syntax for member interfaces. It would only be possible to hardcode each case and remove the use of is_*
Sun, Jun 14
This is the users dmesg with failed firmware load
[ 21.951100] iwlwifi 0000:03:00.0: can't disable ASPM; OS doesn't have ASPM control [ 21.951932] iwlwifi 0000:03:00.0: Direct firmware load for iwlwifi-6000-6.ucode failed with error -2 [ 21.951961] iwlwifi 0000:03:00.0: Direct firmware load for iwlwifi-6000-5.ucode failed with error -2 [ 21.951982] iwlwifi 0000:03:00.0: Direct firmware load for iwlwifi-6000-4.ucode failed with error -2 [ 21.951985] iwlwifi 0000:03:00.0: no suitable firmware found! [ 21.958303] iwlwifi 0000:03:00.0: minimum version required: iwlwifi-6000-4 [ 21.967817] iwlwifi 0000:03:00.0: maximum version supported: iwlwifi-6000-6 [ 21.975992] iwlwifi 0000:03:00.0: check git://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git
I suggested to install the firmware-iwlwifi package from the debian repos as part of the debugging process. Of course this isn't a supported way for end users, this is known for a long time, and you must take care to not run apt-get upgrade, dist-upgrade or install a package that conflicts or overrides a vyos package. It is safe to install packages that don't conflict.
Non-free firmware was added by default in crux: T15
Fri, Jun 12
I have no need for PD for now, so this isn't an important issue for me. I just noticed that WIDE didn't run any scripts, so right now it can't set any nameservers obtained from DHCP. If anyone needs that, I guess it would be simplest to write a script (by using the existing dhclient-script hooks as a guide) just for vyos-hostsd, since PD is already done with WIDE. Switching to ISC would mean we'd need to improve that PD script I linked to, since it only supports a single interface, and we need multiple.
@jack9603301 you're not the one that made the choice so you can't know why it was made.
ISC-DHCP can do prefix delegation too (not by itself, but with a helper script that others already made: https://wiki.debian.org/IPv6PrefixDelegation ) so that's not why WIDE was chosen.
What was the reason for choosing WIDE dhcp6c and not keeping isc-dhcp? This has now caused T2590 which will require making a new set of dhclient scripts just for WIDE, so we'll be maintaining 2 separate scripts. If it was due to the support for prefix length hint, isc-dhcp has added that too, as I mentioned in this task before https://phabricator.vyos.net/T421#49842
Thu, Jun 11
Wed, Jun 10
+1 for this, it would be very useful for a lot of use cases, we wouldn't need to add everything to vyos-1x and the config syntax, but users could add "missing" services on their own.
Mon, Jun 8
This has been on my extended todo list for a long time now, but there were always higher priority issues. I'm glad someone is working on it!
Sun, Jun 7
@mrozentsvayg look at https://community.openvpn.net/openvpn/ticket/360 (this is documented in the code comment right above your change as well). OpenVPN on Linux in server mode with standard protocols doesn't listen on IPv6, just IPv4. We need to force it to bind to a IPv6 socket using these undocumented *6 protocols, then it'll listen on both IPv4 and IPv6. This wouldn't be necessary if OpenVPN listened on IPv6 with the default protocols or autodetected whether the local or remote IPs are v4 or v6 and chose the correct socket type, but it doesn't. Complain to the above ticket 360. We're just working within the limitations imposed by OpenVPN.
Sat, Jun 6
Can you ask the user if you want to start the migration failure fallback mechanism on the first boot of the new image when upgrading, and if the user chooses to enable this mechanism, you should let the user select an old secure image (execute the mechanism only on the first boot)?
I think 'configure; load; commit' is important to make debugging easier and faster. There are issues with the vyos-config-debug method: it needs a full reboot to test every change, which can take minutes (and one may not fix the bug in the first 10 tries even, depending on how sleep deprived one is) and it lacks a easy way to see the scripts stdout/stderr (it is discarded unless we enable airbag's debug log, which is yet another thing to have in mind) as the standard traceback that's logged to /tmp may not be enough to catch the exact error and we need to print some variables to look at them or something like that. But mainly the issue is rebooting is much slower than just doing load/commit.
Can't we just read config.boot into the session config of configtree or config? Wasn't that exactly what was done before? I'm 100% there was a function that read the config.boot file in config.py in case the config system wasn't initialised.
Right, that's obvious. The issue is that we need to know to *which* image to switch to, but it's easy to solve, as I'll describe.
Ah right, then all the things about replacing config.boot aren't necessary. I was thinking that /config was permanent between images, I don't know how I forgot that it lives inside each image separately.
Sorry, I don't know if I understand it wrong, but please allow me to express my opinion, but when you install and add a new image from the old image for upgrading, it may occur that it can't be used normally at startup config.boot Configuration (especially rolling update),
Please test with the latest 1.3 rolling image if the bug is now fixed.
Fri, Jun 5
@thomas-mangin that's great if the POC already has the above - I'm on board with making it the replacement for vbash in that case. I can live without word jumping and line deletion (for now) if it has tab completion and history (I will miss reverse history search a lot as it doesn't search just the beginning of the line but the whole line for the pattern, I'm assuming prompt-toolkit just searches the beginning?). I do need to test it when I get some time to see if anything is still missing.
To add my 2 cents:
Another possibility would be to modify the VTunIf's bridgeable parameter when creating it. That wouldn't require a different name for tap and all the config migration that comes with it, but I don't know if it's possible.
I also don't think we can find out whether a interface is tun or tap from the up/down script (no parameter or environment variable seems to tell it this) so it'd be good if tun and tap had different names due to this too. It would require migrating the name of existing interfaces (from vtun to vtap if tunnel type is tap) under interfaces openvpn, interfaces bridge member, service router-advert, firewall, nat, possibly other places as well.
What happens when one remote-host is IPv4 and one is IPv6? The proposed fix would leave the protocol as udp6 in that case and the error would still be there.
duplicate of T2468
Writing the config parsing, editing, committing and saving shouldn't be that hard.
Re-implementing completion with the current set of features would be very hard.
A workaround would be to simply call interfaces-ethernet.py from the bond script itself after it removes it from the bond.
The consequence of migrating bond member config from the interface node to the bond node, the same problem exists for bridges as well.
Also regarding bridging: the current VTunIf says vtun is bridgeable. I doubt bridging a tun interface is wanted as that's a purely L3 tunnel. As the bridge member config syntax places the members under the bridge interface, the bridge interface determines if a interface is bridgeable by looking at its class definition. Thus to make openvpn in tun mode not bridgeable and tap mode bridgeable, those would need to be 2 different classes with different interface names ('vtun' and 'vtap'?). A hackish way is possible by making the bridge code check the openvpn config directly, but I highly dislike hackish solutions. Even T2241 was a 'hackish' solution that was necessary due to a previous bridge syntax migration without thinking about the consequences of it (moving the bridge member config under the bridge code makes syntactical sense, but it requires hackish workarounds like T2241 with the curernt way the config system operates)
Looking at the above errors:
Indeed, I didn't test client mode with the IPv6 patch, I assumed openvpn would use 'proto' for the listening socket only and not for the client socket (since it could detect which family the remote-host address is, it could select the correct socket, but it honors the 'proto' in the config) so my assumption was wrong. I appreciate the help.
Tue, Jun 2
A significant part of the old config system is the bash-completion integration as well. I assume this is not integrated with bash but is a separate console that you start and takes over all stdin/stdout? Is it possible to implement the same completion output as there is now?