
Losing default route after first reboot (cloud-init & DHCP)
Open, High | Public | BUG

Assigned To
None
Authored By
thannaske
Feb 12 2024, 6:30 PM

Description

When a VyOS instance that has been bootstrapped with cloud-init and configured for DHCP on eth0 is rebooted, the IPv4 default route is lost after the first reboot. IPv6 keeps working throughout.

Steps to reproduce

  1. Download or build the latest ISO for the 1.4 or 1.5 release (reproduced with both versions).
  2. Using the vyos-vm-images Docker environment, build a RAW disk image with the following command:
export ISO_URL=http://example.com/vyos.iso

ansible-playbook raw.yml \
-e vyos_iso_url="$ISO_URL" \
-e disk_size=5 \
-e cloud_init=true \
-e cloud_init_ds=NoCloud,None \
-e keep_user=false \
-e grub_console=kvm \
-e enable_dhcp=true \
-e enable_ssh=true \
-e without_login=true
  3. Boot a virtual machine with a single network adapter (eth0) from the built disk image. Make sure the machine can obtain an IPv4 address via DHCP.
  4. Once VyOS is up and running, verify that IPv4 connectivity works, then reboot immediately.
  5. Verify that VyOS has lost its default route and no longer has IPv4 connectivity.
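
The final check in the steps above can be done mechanically. The following is a minimal sketch, not part of the original report: `has_default_route` is a hypothetical helper name, and it assumes the iproute2 output format shown in this report, where a default route is printed on a line whose first word is `default`.

```shell
# has_default_route: hypothetical helper (not part of VyOS or iproute2).
# Reads `ip -4 route show` output on stdin.
# Exit status 0 if a default route is listed, 1 otherwise.
has_default_route() {
    awk '$1 == "default" { found = 1 } END { exit !found }'
}

# Example use on the router (commented out so the sketch has no side effects):
# if ip -4 route show | has_default_route; then
#     echo "default route present"
# else
#     echo "default route MISSING"
# fi
```

Running the commented example once after the first boot and once after the reboot should show the difference described in this report.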

Expected outcome
As the configuration hasn't changed, VyOS is expected to behave deterministically and identically across reboots, and not lose its default gateway.

Actual outcome

  1. Immediately after the first boot (so before the actual reboot) everything looks normal: There's a default IPv4 route and you can ping outside targets.
admin@vyos:~$ ip r
default nhid 6 via 91.xxx.219.17 dev eth0 proto static metric 20
91.xxx.219.16/28 dev eth0 proto kernel scope link src 91.xxx.219.24

admin@vyos:~$ show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

S>* 0.0.0.0/0 [210/0] via 91.xxx.219.17, eth0, weight 1, 00:02:13
C>* 91.xxx.219.16/28 is directly connected, eth0, 00:02:13

admin@vyos:~$ ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=60 time=1.76 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=60 time=1.53 ms
^C
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.528/1.646/1.764/0.118 ms

[edit]
admin@vyos# show interfaces
 ethernet eth0 {
     address dhcp
     hw-id 52:54:00:47:61:2b
     mtu 1500
     offload {
         gro
         gso
         sg
         tso
     }
 }
 loopback lo {
 }

[edit]
admin@vyos# run show interfaces
Codes: S - State, L - Link, u - Up, D - Down, A - Admin Down
Interface    IP Address        MAC                VRF        MTU  S/L    Description
-----------  ----------------  -----------------  -------  -----  -----  -------------
eth0         91.xxx.219.24/28  52:54:00:47:61:2b  default   1500  u/u
lo           127.0.0.1/8       00:00:00:00:00:00  default  65536  u/u
             ::1/128
  2. Then, without changing, committing or saving anything in the existing config, we reboot VyOS.
  3. While booting, VyOS gains connectivity for a short time (you can ping it).
  4. Once the following lines appear in the boot console output, VyOS loses its connectivity again:
35.595944 vyos-router [1334]: Mounting VyOS Config...done.
47.496305 vyos-router [1334]: Starting VyOS router: migrate configure.
47.559938 vyos-config [1348]: Configuration success
  5. After logging in (IPv6 still works), the output of the commands shown above has changed:
admin@vyos:~$ ping 1.1.11
/bin/ping: connect: Network is unreachable
admin@vyos:~$ ip r
91.xxx.219.16/28 dev eth0 proto kernel scope link src 91.xxx.219.24

admin@vyos:~$ show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

S   0.0.0.0/0 [210/0] via 91.xxx.219.17, eth0, weight 1, 00:11:09
K>* 0.0.0.0/0 [0/210] via 91.xxx.219.17, eth0, 00:11:18
C>* 91.xxx.219.16/28 is directly connected, eth0, 00:11:09

The first thing to notice here is the difference between ip r and show ip route. While ip r now shows no default route at all, show ip route suddenly lists two default routes (one static, one kernel route), and the kernel route is even marked as selected and installed in the FIB (K>*). Connectivity, however, is completely lost:

admin@vyos:~$ ping 1.1.1.1
/bin/ping: connect: Network is unreachable
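
The disagreement between the kernel table and the routing daemon's view can be detected with a short script. This is a sketch under assumptions: `fib_mismatch` is a hypothetical helper, the two input files are assumed to hold the captured output of `ip -4 route show` and `show ip route` respectively, and the flag column layout (`>` for selected, `*` for FIB) is taken from the legend printed in this report.

```shell
# fib_mismatch: hypothetical helper, usage: fib_mismatch KERNEL_FILE FRR_FILE
#   KERNEL_FILE: captured output of `ip -4 route show`
#   FRR_FILE:    captured output of `show ip route`
# Returns 1 and prints a message if the routing daemon marks a default
# route as installed ("*") while the kernel table has no default route.
fib_mismatch() {
    kernel_file=$1
    frr_file=$2

    # Kernel side: any line whose first word is "default"?
    if awk '$1 == "default" { found = 1 } END { exit !found }' "$kernel_file"; then
        kernel_has_default=yes
    else
        kernel_has_default=no
    fi

    # Daemon side: any 0.0.0.0/0 entry whose flag column contains "*"?
    if awk '{
            i = index($0, " 0.0.0.0/0 ")
            if (i > 1 && index(substr($0, 1, i), "*") > 0) found = 1
        } END { exit !found }' "$frr_file"; then
        frr_claims_fib=yes
    else
        frr_claims_fib=no
    fi

    if [ "$frr_claims_fib" = yes ] && [ "$kernel_has_default" = no ]; then
        echo "mismatch: default route marked as installed, but kernel has none"
        return 1
    fi
    echo "consistent"
}
```

With the two post-reboot outputs from this report saved to files, the helper would report a mismatch, because the `K>*` entry claims a FIB route that `ip r` does not show.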

The configuration still looks the same and the eth0 interface still has its IPv4 address correctly obtained via DHCP:

[edit]
admin@vyos# show interfaces
 ethernet eth0 {
     address dhcp
     hw-id 52:54:00:47:61:2b
     mtu 1500
     offload {
         gro
         gso
         sg
         tso
     }
 }
 loopback lo {
 }

[edit]
admin@vyos# run show interfaces
Codes: S - State, L - Link, u - Up, D - Down, A - Admin Down
Interface    IP Address                          MAC                VRF        MTU  S/L    Description
-----------  ----------------------------------  -----------------  -------  -----  -----  -------------
eth0         91.xxx.219.24/28                    52:54:00:47:61:2b  default   1500  u/u
lo           127.0.0.1/8                         00:00:00:00:00:00  default  65536  u/u
             ::1/128

As a workaround, I can manually add the route using ip r:

admin@vyos:~$ sudo ip r add 0.0.0.0/0 via 91.xxx.219.17

admin@vyos:~$ ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=60 time=2.59 ms
^C
--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.589/2.589/2.589/0.000 ms

admin@vyos:~$ ip r
default via 91.xxx.219.17 dev eth0
91.xxx.219.16/28 dev eth0 proto kernel scope link src 91.xxx.219.24
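
Since the static entry with the DHCP-provided gateway is still listed by show ip route after the reboot, the next hop for the manual workaround could be read from there instead of being typed by hand. A minimal sketch, assuming the `show ip route` output has been captured to a file; `static_default_gw` and `routes.txt` are hypothetical names. A route added with `ip` this way lives only in the running kernel table and is gone after the next reboot.

```shell
# static_default_gw: hypothetical helper.
# Reads `show ip route` output on stdin and prints the next hop of the
# first static (S) 0.0.0.0/0 entry, or nothing if there is none.
static_default_gw() {
    awk '
        $1 ~ /^S/ && index($0, " 0.0.0.0/0 ") > 0 {
            for (i = 1; i < NF; i++) {
                if ($i == "via") {
                    gw = $(i + 1)
                    sub(/,$/, "", gw)   # strip the trailing comma
                    print gw
                    exit
                }
            }
        }
    '
}

# Example use (commented out; prints the command instead of running it):
# gw=$(static_default_gw < routes.txt)
# [ -n "$gw" ] && echo "sudo ip r add 0.0.0.0/0 via $gw"
```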

Everything works again, but to make things worse, show ip route now lists three default IPv4 routes:

admin@vyos:~$ show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [0/0] via 91.xxx.219.17, eth0, 00:00:49
S   0.0.0.0/0 [210/0] via 91.xxx.219.17, eth0, weight 1, 00:16:58
K * 0.0.0.0/0 [0/210] via 91.xxx.219.17, eth0, 00:17:07
C>* 91.xxx.219.16/28 is directly connected, eth0, 00:16:58
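
The bracketed pair in each entry is [administrative distance/metric]. The selection seen above is consistent with the usual rule that the lowest distance is preferred, with the metric as a tie-breaker: the manually added kernel route [0/0] wins over the other kernel route [0/210] and the static route [210/0]. The sketch below ranks the default-route entries by that simplified rule; `rank_default_routes` is a hypothetical helper and is not a full model of how the routing daemon selects routes.

```shell
# rank_default_routes: hypothetical helper.
# Reads `show ip route` output on stdin and prints one line per 0.0.0.0/0
# entry as "DISTANCE METRIC CODE", sorted by distance, then metric.
rank_default_routes() {
    awk '
        index($0, " 0.0.0.0/0 ") > 0 && match($0, /\[[0-9]+\/[0-9]+\]/) {
            pair = substr($0, RSTART + 1, RLENGTH - 2)   # e.g. "210/0"
            split(pair, dm, "/")
            printf "%d %d %s\n", dm[1], dm[2], substr($0, 1, 1)
        }
    ' | sort -n -k1,1 -k2,2
}
```

Fed the three default-route entries above, it prints `0 0 K`, `0 210 K` and `210 0 S` in that order, matching the `K>*` marker on the [0/0] entry.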

Configs
The following cloud-init meta-data is being used:

#cloud-config
local-hostname: vyos
hostname: vyos

The following cloud-init user-data is being used:

#cloud-config
hostname: vyos
fqdn: vyos
users:
  - name: admin
    passwd: '...'
vyos_config_commands:
  - set system host-name 'vyos'
  - set service ntp server 1.pool.ntp.org
  - set service ntp server 2.pool.ntp.org
  - set interfaces ethernet 'eth0' description 'WAN'
  - set interfaces ethernet 'eth0' ipv6 address autoconf
  - set interfaces ethernet 'eth0' ipv6 address eui64 '...'
  - commit
  - save

Versions

admin@vyos:~$ show version
Version:          VyOS 1.4-rolling-202402121109
Release train:    sagitta

Built on:         Mon 12 Feb 2024 11:09 UTC
Build UUID:       feb4342e-fb0d-45ab-a0ce-aa04b8ec25b9
Build commit ID:  97481cb8ba4011

Architecture:     x86_64
Boot via:         installed image
System type:      KVM guest

Hardware vendor:  QEMU
Hardware model:   Standard PC (i440FX + PIIX, 1996)
Hardware S/N:     ds=nocloud-net;s=http://my-cloud-init-service/cloud-init/
Hardware UUID:    ab4108e6-a657-402f-8f3b-2287829b3440

Copyright:        VyOS maintainers and contributors

(Exactly the same happens with the latest rolling VyOS 1.5, so I omit the redundant information here.)

Logs
I'm attaching the journalctl log of the boot process for the first reboot (the boot after which the default gateway no longer works), the cloud-init.log, and boot.config.

{F4190659}
{F4190658}
{F4190657}

If you need any further information, please let me know.

Details

Difficulty level
Unknown (require assessment)
Version
1.5
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Bug (incorrect behavior)