
Some Cloud-Init configurations can prevent login on the router
Confirmed, Requires assessment, Public, BUG

Description

SUMMARY

Because of the way VyOS interprets the Cloud-Init network-config, some configurations can leave the boot configuration broken and impossible to commit.

STEPS TO REPRODUCE

Boot the router with a Cloud-Init configuration containing the following network-config (yes, the second static stanza should be static6; that is a bug in my hypervisor, but it does not matter much here):

version: 1
config:
    - type: physical
      name: eth0
      mac_address: '00:12:34:56:78:9a'
      subnets:
      - type: static
        address: '10.0.2.21'
        netmask: '255.255.255.0'
        gateway: '10.0.2.2'
      - type: static
        address: 'fec0:de:ad:f00d::1/64'
        gateway: 'fec0:de:ad:f00d::fffe'
    - type: nameserver
      address:
      - '8.8.8.8'
      search:
      - 'example.com'

OBSERVED RESULT

Configuration commit fails. The user is unable to log in.

EXPECTED RESULT

Network configuration is applied. The user can log in.

SOFTWARE/OS VERSIONS

vyos-cloud-init @ 393cc322629604843b98da970b0761965a7a268e

ADDITIONAL INFORMATION

This is an issue in set_config_interfaces_v1(). The following code is wrong:

if subnet['type'] in ['static', 'static6']:
     # ... snip ...

     # configure gateway
     if 'gateway' in subnet and subnet['gateway'] != '0.0.0.0':
         logger.debug("Configuring gateway for {}: {}".format(iface_name, subnet['gateway']))
         config.set(['protocols', 'static', 'route', '0.0.0.0/0', 'next-hop'], value=subnet['gateway'], replace=True)
         config.set_tag(['protocols', 'static', 'route'])
         config.set_tag(['protocols', 'static', 'route', '0.0.0.0/0', 'next-hop'])
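To make the failure mode concrete, here is a minimal sketch that models the static-route tree as a plain Python dict instead of the real VyOS config object (the function name `apply_gateways_buggy` and the dict layout are hypothetical). It mirrors the quoted logic: every static or static6 subnet with a gateway writes to the single 0.0.0.0/0 key with replace semantics, so the IPv6 gateway processed last overwrites the IPv4 one.

```python
from typing import Dict, List

# The two subnets from the network-config in this report (hypothetical in-memory form).
SUBNETS: List[dict] = [
    {"type": "static", "address": "10.0.2.21", "netmask": "255.255.255.0",
     "gateway": "10.0.2.2"},
    {"type": "static", "address": "fec0:de:ad:f00d::1/64",
     "gateway": "fec0:de:ad:f00d::fffe"},
]


def apply_gateways_buggy(subnets: List[dict]) -> Dict[str, str]:
    """Mimic the quoted logic: a single IPv4 default route, replaced per subnet."""
    routes: Dict[str, str] = {}  # prefix -> next-hop, models 'protocols static route'
    for subnet in subnets:
        if subnet["type"] in ["static", "static6"]:
            if "gateway" in subnet and subnet["gateway"] != "0.0.0.0":
                # replace=True semantics: the last gateway wins, whatever its family
                routes["0.0.0.0/0"] = subnet["gateway"]
    return routes


print(apply_gateways_buggy(SUBNETS))
# {'0.0.0.0/0': 'fec0:de:ad:f00d::fffe'}
```

The result is an IPv4 default route with an IPv6 next-hop, which is not a valid VyOS configuration.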

With the above configuration, this code also writes the IPv6 gateway as the next-hop of 0.0.0.0/0, overwriting the IPv4 gateway and producing a configuration that later fails to apply correctly. A simple fix would be to write it like this:

# configure gateway
# ip_version (4 or 6) is assumed to be derived from the subnet address
# in the elided part of the static/static6 branch.
# IPv4 gateway -> 'protocols static route 0.0.0.0/0'
if ip_version == 4 and 'gateway' in subnet and subnet['gateway'] != '0.0.0.0':
    logger.debug("Configuring gateway for {}: {}".format(iface_name, subnet['gateway']))
    # drop any previous IPv4 default route before writing the new next-hop
    config.delete(['protocols', 'static', 'route', '0.0.0.0/0'])
    config.set(['protocols', 'static', 'route', '0.0.0.0/0', 'next-hop'], value=subnet['gateway'], replace=True)
    config.set_tag(['protocols', 'static', 'route'])
    config.set_tag(['protocols', 'static', 'route', '0.0.0.0/0', 'next-hop'])
# IPv6 gateway -> 'protocols static route6 ::/0'
if ip_version == 6 and 'gateway' in subnet and subnet['gateway'] != '::':
    logger.debug("Configuring gateway for {}: {}".format(iface_name, subnet['gateway']))
    # drop any previous IPv6 default route before writing the new next-hop
    config.delete(['protocols', 'static', 'route6', '::/0'])
    config.set(['protocols', 'static', 'route6', '::/0', 'next-hop'], value=subnet['gateway'], replace=True)
    config.set_tag(['protocols', 'static', 'route6'])
    config.set_tag(['protocols', 'static', 'route6', '::/0', 'next-hop'])
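The proposed fix can be illustrated with the same kind of dict model. In this sketch (the name `apply_gateways_fixed` and the `(node, prefix)` keys are hypothetical), the address family is taken from the gateway itself via the standard `ipaddress` module rather than from an `ip_version` variable, which is an assumption about how detection could be done. Each family then writes to its own default route: 0.0.0.0/0 under `route` and ::/0 under `route6`.

```python
import ipaddress
from typing import Dict, List, Tuple

# The two subnets from the network-config in this report (hypothetical in-memory form).
SUBNETS: List[dict] = [
    {"type": "static", "address": "10.0.2.21", "netmask": "255.255.255.0",
     "gateway": "10.0.2.2"},
    {"type": "static", "address": "fec0:de:ad:f00d::1/64",
     "gateway": "fec0:de:ad:f00d::fffe"},
]

# Config node and default prefix for each address family.
DEFAULT_ROUTE = {4: ("route", "0.0.0.0/0"), 6: ("route6", "::/0")}


def apply_gateways_fixed(subnets: List[dict]) -> Dict[Tuple[str, str], str]:
    """Map (node, prefix) -> next-hop, keeping one default route per family."""
    routes: Dict[Tuple[str, str], str] = {}
    for subnet in subnets:
        if subnet["type"] not in ("static", "static6") or "gateway" not in subnet:
            continue
        try:
            gateway = ipaddress.ip_address(subnet["gateway"])
        except ValueError:
            continue  # not an IP address: skip it rather than write it to the config
        if gateway.is_unspecified:
            continue  # 0.0.0.0 or :: means "no default gateway"
        node, prefix = DEFAULT_ROUTE[gateway.version]
        routes[(node, prefix)] = str(gateway)
    return routes


print(apply_gateways_fixed(SUBNETS))
# {('route', '0.0.0.0/0'): '10.0.2.2', ('route6', '::/0'): 'fec0:de:ad:f00d::fffe'}
```

Keying on the gateway's own family means a subnet mislabelled as `static` still lands under `route6`, and invalid gateway strings are dropped instead of being committed.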

Details

Difficulty level: Easy (less than an hour)
Version: 1.4-rolling-202102180218
Why the issue appeared? Will be filled on close
Is it a breaking change? Perfectly compatible

Event Timeline

zsdc added a subscriber: zsdc.

Hello, @wsapplegate!

Can you share details about your hypervisor and datasource? Could you also share the full Cloud-Init log (/var/log/cloud-init.log)?
Either the datasource generates a wrong config, or the format is not well described in the Cloud-Init documentation, which states: "gateway: IPv4 address of the default gateway for this subnet". I rather suspect the documentation is wrong, but it would be better to check.
Regardless, the situation is not good, because we need to verify the values that are put into the config. So this will be fixed one way or another (either by adding the gateway properly or by dropping it) once we figure out the details.

In T3338#87652, @zsdc wrote:

Can you share details about your hypervisor and datasource? Could you also share the full Cloud-Init log (/var/log/cloud-init.log)?

The hypervisor is Proxmox VE 6.2. The datasource is PVE's integrated datasource (NoCloud). I have attached the Cloud-Init log.

Either the datasource generates a wrong config, or the format is not well described in the Cloud-Init documentation

Well, the config is obviously wrong (it violates the spec) in any case, because the docs clearly say that the subnet type for an IPv6 address should be static6. But as the late Jon Postel famously wrote in RFC 761, you should be conservative in what you do and liberal in what you accept :-) Anyway, the code already detects IPv4 vs. IPv6 addresses, so even a broken config can be parsed.
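Accepting such a config liberally could look like the following minimal sketch, which derives the subnet type from the address family instead of trusting the declared `type`. The function name `effective_subnet_type` is hypothetical and not part of vyos-cloud-init; it assumes it is only called for static/static6 subnets.

```python
import ipaddress
import logging

logger = logging.getLogger(__name__)


def effective_subnet_type(subnet: dict) -> str:
    """Return 'static' or 'static6' from the address family (static subnets only)."""
    # ip_interface accepts both a bare address and address/prefix notation.
    family = ipaddress.ip_interface(subnet["address"]).version
    detected = "static6" if family == 6 else "static"
    declared = subnet.get("type")
    if declared in ("static", "static6") and declared != detected:
        # Be liberal in what we accept: warn and use the detected type.
        logger.warning("subnet %s declared as %s, treating it as %s",
                       subnet["address"], declared, detected)
    return detected


print(effective_subnet_type({"type": "static", "address": "fec0:de:ad:f00d::1/64"}))
# static6
```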

I also suspect the documentation is wrong, given that Cloud-Init's own eni network config writer produces a valid /etc/network/interfaces file from a similar config.

Regardless, the situation is not good, because we need to verify the values that are put into the config. So this will be fixed one way or another (either by adding the gateway properly or by dropping it) once we figure out the details.

Well, it may not be as bad as I first thought. I just rechecked and cannot reproduce the “cannot log in” part on the latest qcow2 image (VyOS 1.3-rolling-202101): the config is applied correctly. I am not sure why I got that behaviour the first time; maybe it was an error in my own build. The main remaining issue is that Cloud-Init clobbers the 0.0.0.0/0 route:

vyos@bugtest-gw:~$ sh conf c | match route
set protocols static route 0.0.0.0/0 next-hop fec0:de:ad:f00d::fffe

This leaves the routing table without a default route:

vyos@bugtest-gw:~$ sh ip ro 
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued route, r - rejected route

C>* 10.0.2.0/24 is directly connected, eth0, 00:21:42
zsdc changed the task status from Open to Confirmed. (Feb 19 2021, 11:29 PM)

I would like to solve this as follows. I will:

  1. Add verification to our config module to avoid impossible configurations.
  2. Add IPv6 gateway processing (how could I miss this? Cannot imagine...).
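Item 1 could take many forms. One possible sketch (the function name `find_family_mismatches` is hypothetical, not part of vyos-cloud-init) is a pre-commit check that rejects any static route whose next-hop address family differs from its prefix, which would catch the 0.0.0.0/0 route with an IPv6 next-hop shown earlier in this thread.

```python
import ipaddress
from typing import Dict, List


def find_family_mismatches(routes: Dict[str, str]) -> List[str]:
    """Return an error for every route whose next-hop family differs from its prefix."""
    errors: List[str] = []
    for prefix, next_hop in routes.items():
        network = ipaddress.ip_network(prefix)
        try:
            hop = ipaddress.ip_address(next_hop)
        except ValueError:
            errors.append(f"{prefix}: next-hop {next_hop!r} is not an IP address")
            continue
        if hop.version != network.version:
            errors.append(f"{prefix}: IPv{hop.version} next-hop {next_hop} "
                          f"is invalid for an IPv{network.version} prefix")
    return errors


# The route Cloud-Init produced in this report is rejected:
print(find_family_mismatches({"0.0.0.0/0": "fec0:de:ad:f00d::fffe"}))
# ['0.0.0.0/0: IPv6 next-hop fec0:de:ad:f00d::fffe is invalid for an IPv4 prefix']
```

Running such a check before committing would turn a broken boot configuration into a logged error for the offending route only.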

It is also necessary to file a bug report on the Proxmox bug tracker to see this through to the end. Could you do this?

In T3338#87770, @zsdc wrote:

It is also necessary to file a bug report on the Proxmox bug tracker to see this through to the end. Could you do this?

Here you go: Bug #3314