Description
A VyOS instance that was bootstrapped with cloud-init and configured for DHCP on eth0 loses its IPv4 default route after the first reboot. IPv6 keeps working throughout.
Steps to reproduce
- Download or build the latest ISO for the 1.4 or 1.5 release (the issue reproduces with both)
- Using the vyos-vm-images Docker environment, build a RAW disk image with the following command:
export ISO_URL=http://example.com/vyos.iso
ansible-playbook raw.yml \
  -e vyos_iso_url="$ISO_URL" \
  -e disk_size=5 \
  -e cloud_init=true \
  -e cloud_init_ds=NoCloud,None \
  -e keep_user=false \
  -e grub_console=kvm \
  -e enable_dhcp=true \
  -e enable_ssh=true \
  -e without_login=true
- Boot a virtual machine with a single network adapter eth0 using the built disk image. Make sure the machine is able to get an IPv4 address via DHCP.
- Once VyOS is up and running, verify that IPv4 connectivity works, then reboot immediately
- After the reboot, verify that VyOS has lost its default route and no longer has IPv4 connectivity
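The two verification steps above can be scripted. The following is a minimal sketch and not part of the original reproduction: `has_default_route` is a hypothetical helper name, and it assumes its input is the plain-text output of iproute2's `ip -4 route show`, where a default route line starts with the word `default` (as in the `ip r` outputs in this report).

```shell
#!/bin/sh
# Hypothetical helper (not from the report): succeeds if the route listing
# read from stdin contains an IPv4 default route, fails otherwise.
# Assumes `ip -4 route show` style input, one route per line.
has_default_route() {
    grep -q '^default '
}

# Possible usage on the VyOS host, once before and once after the reboot:
#   ip -4 route show | has_default_route || echo "IPv4 default route missing"
```

Running the check before and after the reboot gives a simple pass/fail signal for the regression without having to rely on ping targets being reachable.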
Expected outcome
Since the configuration has not changed, VyOS is expected to behave deterministically: it should come up in the same state after the reboot and keep its default gateway.
Actual outcome
- Immediately after the first boot (so before the actual reboot) everything looks normal: There's a default IPv4 route and you can ping outside targets.
admin@vyos:~$ ip r
default nhid 6 via 91.xxx.219.17 dev eth0 proto static metric 20
91.xxx.219.16/28 dev eth0 proto kernel scope link src 91.xxx.219.24
admin@vyos:~$ show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure
S>* 0.0.0.0/0 [210/0] via 91.xxx.219.17, eth0, weight 1, 00:02:13
C>* 91.xxx.219.16/28 is directly connected, eth0, 00:02:13
admin@vyos:~$ ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=60 time=1.76 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=60 time=1.53 ms
^C
--- 1.1.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.528/1.646/1.764/0.118 ms
[edit]
admin@vyos# show interfaces
 ethernet eth0 {
     address dhcp
     hw-id 52:54:00:47:61:2b
     mtu 1500
     offload {
         gro
         gso
         sg
         tso
     }
 }
 loopback lo {
 }
[edit]
admin@vyos# run show interfaces
Codes: S - State, L - Link, u - Up, D - Down, A - Admin Down
Interface    IP Address        MAC                VRF        MTU  S/L    Description
-----------  ----------------  -----------------  -------  -----  -----  -------------
eth0         91.xxx.219.24/28  52:54:00:47:61:2b  default   1500  u/u
lo           127.0.0.1/8       00:00:00:00:00:00  default  65536  u/u
             ::1/128
- Then, without changing, committing, or saving anything in the existing config, reboot the VyOS instance
- While booting, VyOS gains connectivity for a short time (it responds to ping)
- As soon as the following lines appear in the boot console output, VyOS loses connectivity again:
35.595944 vyos-router [1334]: Mounting VyOS Config...done.
47.496305 vyos-router [1334]: Starting VyOS router: migrate configure.
47.559938 vyos-config [1348]: Configuration success
- After logging in (IPv6 still works), the same commands as above produce different output:
admin@vyos:~$ ping 1.1.11
/bin/ping: connect: Network is unreachable
admin@vyos:~$ ip r
91.xxx.219.16/28 dev eth0 proto kernel scope link src 91.xxx.219.24
admin@vyos:~$ show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure
S   0.0.0.0/0 [210/0] via 91.xxx.219.17, eth0, weight 1, 00:11:09
K>* 0.0.0.0/0 [0/210] via 91.xxx.219.17, eth0, 00:11:18
C>* 91.xxx.219.16/28 is directly connected, eth0, 00:11:09
The first thing to notice is the difference between ip r and show ip route: ip r now shows no default route at all, while show ip route suddenly lists two default routes (one static, one kernel). Connectivity, however, is completely lost:
admin@vyos:~$ ping 1.1.1.1
/bin/ping: connect: Network is unreachable
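The disagreement between the two views can be made explicit by counting the default routes each of them reports. This is only a sketch under assumptions: the helper names `count_kernel_defaults`, `count_rib_defaults` and `views_agree` are hypothetical, the inputs are assumed to be text in the `ip r` and `show ip route` formats shown in this report, and the script only detects the mismatch; it does not explain its cause.

```shell
#!/bin/sh
# Hypothetical helpers (not from the report) for comparing the kernel routing
# table with the routing daemon's view of it.

# Count default routes in `ip r` style output read from stdin.
count_kernel_defaults() {
    awk '$1 == "default" { n++ } END { print n + 0 }'
}

# Count 0.0.0.0/0 entries in `show ip route` style output read from stdin.
# The prefix is searched in every field because the flag column may contain
# a space (for example "K * 0.0.0.0/0").
count_rib_defaults() {
    awk '{ for (i = 1; i <= NF; i++) if ($i == "0.0.0.0/0") { n++; break } }
         END { print n + 0 }'
}

# $1 = kernel count, $2 = `show ip route` count.
# Fails when the daemon lists default routes but the kernel has none.
views_agree() {
    if [ "$1" -eq 0 ] && [ "$2" -gt 0 ]; then
        echo "mismatch: $2 default route(s) in show ip route, none in the kernel table"
        return 1
    fi
    echo "kernel: $1 default route(s), show ip route: $2 default route(s)"
}
```

Fed with the post-reboot outputs above, the counts are 0 for ip r and 2 for show ip route, so `views_agree 0 2` reports a mismatch. On the host, the `show ip route` text could presumably be captured with FRR's `vtysh -c 'show ip route'`; that vtysh is available to the logged-in user is an assumption.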
The configuration still looks the same and the eth0 interface still has its IPv4 address correctly obtained via DHCP:
[edit]
admin@vyos# show interfaces
 ethernet eth0 {
     address dhcp
     hw-id 52:54:00:47:61:2b
     mtu 1500
     offload {
         gro
         gso
         sg
         tso
     }
 }
 loopback lo {
 }
[edit]
admin@vyos# run show interfaces
Codes: S - State, L - Link, u - Up, D - Down, A - Admin Down
Interface    IP Address                          MAC                VRF        MTU  S/L    Description
-----------  ----------------------------------  -----------------  -------  -----  -----  -------------
eth0         91.xxx.219.24/28                    52:54:00:47:61:2b  default   1500  u/u
lo           127.0.0.1/8                         00:00:00:00:00:00  default  65536  u/u
             ::1/128
As a workaround, I can add the route manually using ip r:
admin@vyos:~$ sudo ip r add 0.0.0.0/0 via 91.xxx.219.17
admin@vyos:~$ ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=60 time=2.59 ms
^C
--- 1.1.1.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 2.589/2.589/2.589/0.000 ms
admin@vyos:~$ ip r
default via 91.xxx.219.17 dev eth0
91.xxx.219.16/28 dev eth0 proto kernel scope link src 91.xxx.219.24
This restores connectivity, but to make things worse, show ip route now lists three default IPv4 routes:
admin@vyos:~$ show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure
K>* 0.0.0.0/0 [0/0] via 91.xxx.219.17, eth0, 00:00:49
S   0.0.0.0/0 [210/0] via 91.xxx.219.17, eth0, weight 1, 00:16:58
K * 0.0.0.0/0 [0/210] via 91.xxx.219.17, eth0, 00:17:07
C>* 91.xxx.219.16/28 is directly connected, eth0, 00:16:58
Configs
The following cloud-init meta-data is being used:
#cloud-config
local-hostname: vyos
hostname: vyos
The following cloud-init user-data is being used:
#cloud-config
hostname: vyos
fqdn: vyos
users:
  - name: admin
    passwd: '...'
vyos_config_commands:
  - set system host-name 'vyos'
  - set service ntp server 1.pool.ntp.org
  - set service ntp server 2.pool.ntp.org
  - set interfaces ethernet 'eth0' description 'WAN'
  - set interfaces ethernet 'eth0' ipv6 address autoconf
  - set interfaces ethernet 'eth0' ipv6 address eui64 '...'
  - commit
  - save
Versions
admin@vyos:~$ show version
Version:          VyOS 1.4-rolling-202402121109
Release train:    sagitta
Built on:         Mon 12 Feb 2024 11:09 UTC
Build UUID:       feb4342e-fb0d-45ab-a0ce-aa04b8ec25b9
Build commit ID:  97481cb8ba4011
Architecture:     x86_64
Boot via:         installed image
System type:      KVM guest
Hardware vendor:  QEMU
Hardware model:   Standard PC (i440FX + PIIX, 1996)
Hardware S/N:     ds=nocloud-net;s=http://my-cloud-init-service/cloud-init/
Hardware UUID:    ab4108e6-a657-402f-8f3b-2287829b3440
Copyright:        VyOS maintainers and contributors
(Exactly the same happens with the latest rolling VyOS 1.5 release, so I am omitting the redundant output here.)
Logs
I'm attaching the journalctl log of the boot process for the first reboot (the boot after which the default gateway no longer works), as well as cloud-init.log and boot.config.
{F4190659}
{F4190658}
{F4190657}
If you need any further information, please let me know.