
DHCPv6-PD breaks interface config if it refers to VLAN interfaces
Closed, Resolved, Public

Description

If the dhcpv6-options prefix-delegation configuration for an interface delegates prefixes to an interface that doesn't exist yet, then configuration of the interface breaks during boot, and the entire interface config is lost from the running config.

Take the following example configuration:

interfaces {
    ethernet eth0 {
        address 1.2.3.4/29
        address 5.6.7.8/29
        address dhcpv6
        address 10.1.11.3/24
        description "WAN"
        dhcpv6-options {
            prefix-delegation {
                interface eth1 {
                    address 1
                    sla-id 1
                    sla-len 4
                }
                interface eth2.4 {
                    address 1
                    sla-id 4
                    sla-len 4
                }
                interface eth2.6 {
                    address 1
                    sla-id 6
                    sla-len 4
                }
            }
        }
        ipv6 {
            address {
                autoconf
            }
        }
    }
    ethernet eth1 {
        address 192.168.1.1/24
        description DMZ
    }
    ethernet eth2 {
        vif 4 {
            address 192.168.4.1/24
            description "VLAN 4"
        }
        vif 6 {
            address 192.168.6.1/24
            description IOT
        }
    }
...

This all works fine when configured from the terminal, presumably because one configures the VLANs first and then goes back to set up prefix delegation on them.

But then save the config and reboot, and things are broken. Here are the symptoms:

eth0 will be only partially configured. It seems like address configuration is broken at the moment it handles the address dhcpv6 line, since the addresses before that line are correctly configured, but those after it are not:

$ sh int
Interface        IP Address                        S/L  Description
---------        ----------                        ---  -----------
eth0             1.2.3.4/29                        u/u  WAN
                 5.6.7.8/29
                 2603:3024:1234:5670::3/128
                 2603:3024:1234:5670:943:43ce:348a:bce4/64
                 2603:3024:1234:5670:208:a2ff:fe0a:abcd/64
eth1             192.168.1.1/24                    u/u  DMZ
                 2603:3024:1234:5671::1/64
                 2603:3024:1234:5672::1/64
eth2.4           192.168.20.1/24                   u/u  VLAN 4
                 2603:3024:1234:5674::1/64
eth2.6           192.168.60.1/24                   u/u  IOT
                 2603:3024:1234:5676::1/64

Interestingly, DHCPv6-PD succeeds.

But here's the worst part:

$ conf
# sh int eth eth0
Configuration under specified path is empty
[edit]
# 

Apparently, when loading the saved config fails while validating the interface list, the entire interface's config is discarded. In order to keep using the router, I have to re-type the entire eth0 config and then commit. And all is well again, until the next reboot.

Note that everything works as expected if DHCPv6-PD is moved to a later interface, such as eth3, delegating back to VLANs on earlier interfaces. In that case, the VLANs are already configured by the time the config processor gets there. (But that's no longer a viable solution for me, because I'm now working on getting prefixes delegated by two providers, and I don't want to have to shift all my other interfaces around to do it.)
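The ordering constraint described above can be sketched as a topological sort. This is a minimal sketch, assuming a hypothetical `config_order()` helper that receives every interface name (VLAN vifs listed individually) plus a map from each delegating interface to its prefix-delegation targets. It is not how the VyOS config system orders interfaces; it only shows the dependency that boot-time configuration would need to respect.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def config_order(interfaces, pd_targets):
    """Order interface names so that every prefix-delegation target is
    configured before the interface whose DHCPv6-PD stanza names it.

    interfaces: iterable of interface names, VLAN vifs (e.g. 'eth2.4')
                listed individually.
    pd_targets: dict mapping a delegating interface to the interface
                names under its prefix-delegation node (hypothetical input).
    """
    sorter = TopologicalSorter()
    for ifname in interfaces:
        # Delegating to the interface itself needs no ordering.
        deps = {t for t in pd_targets.get(ifname, ()) if t != ifname}
        if "." in ifname:
            # A VLAN sub-interface such as 'eth2.4' needs its parent 'eth2'.
            deps.add(ifname.split(".", 1)[0])
        # sorted() only makes the result deterministic.
        sorter.add(ifname, *sorted(deps))
    # static_order() raises graphlib.CycleError on circular delegation.
    return list(sorter.static_order())


if __name__ == "__main__":
    # The example configuration from this report.
    print(config_order(
        ["eth0", "eth1", "eth2", "eth2.4", "eth2.6"],
        {"eth0": ["eth1", "eth2.4", "eth2.6"]},
    ))  # ['eth1', 'eth2', 'eth2.4', 'eth2.6', 'eth0']
```

Treating each vif as its own node matters: it is the existence of `eth2.4` and `eth2.6`, not only of their parent `eth2`, that the delegating interface depends on.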

Details

Difficulty level
Unknown (require assessment)
Version
1.3-rolling-202007241919
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)

Event Timeline

gadams created this task. Wed, Jul 29, 9:28 AM
gadams created this object in space S1 VyOS Public.

I should add that this problem has existed for at least a couple of months, right up until 1.3-rolling-202007241919. Rolling builds after that one appear to ignore the prefix-delegation configuration entirely (T2740), so they don't exhibit this problem.

gadams changed Version from - to 1.3-rolling-202007241919. Wed, Jul 29, 9:34 AM
c-po claimed this task. Thu, Jul 30, 5:06 PM
c-po added a comment. Thu, Jul 30, 9:36 PM

The last bug mentioned could be due to: https://phabricator.vyos.net/T2746

As for the problem with "later-defined" interfaces, this is still an architectural problem in the code inherited from Vyatta.

pasik added a subscriber: pasik. Fri, Jul 31, 7:46 AM
gadams added a comment. Mon, Aug 3, 9:04 AM

It's actually a little worse than I'd initially realized; the interface that DHCPv6-PD is being requested on (the interface with the prefix-delegation stanza) has to be the very last interface, even if it doesn't refer to the ones after it. So, even if I don't delegate any addresses from eth2 to eth3, it still fails. I have to do the delegation from interface eth3 on a system with four ethernet interfaces. And preliminary testing suggests that it has to be eth5 on a system with six, even if eth4 and eth5 aren't configured at all.

@c-po, which bug do you think might be related to T2746? It's not clear to me which problem here might be related to link-local addresses. Do you mean T2740 might be related to that problem? That's possible; I haven't looked into that yet.

c-po added a comment. Mon, Aug 3, 9:37 AM

Please provide a full configuration for replication.

gadams added a comment. Mon, Aug 3, 9:40 AM

OK, I take that back. Even when the interface with prefix-delegation defined is dead last, it still has this error. The last IP address is not configured, although /run/dhcp6c/dhcp6c.ethN.conf is correctly created, and DHCPv6-PD works. But config parsing is broken, and the entire config node is missing when I query for it.
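To see why dhcp6c rejects the generated file, here is a minimal sketch of the kind of WIDE dhcp6c.conf fragment a prefix-delegation stanza turns into. The `render_pd_stanza()` helper is hypothetical and is not the VyOS template; it only emits the `send ia-pd`, `id-assoc pd` and `prefix-interface` statements of the dhcp6c.conf syntax, and it leaves out the VyOS `address` option because this report doesn't show how VyOS maps it. Each target, including a VLAN vif, is named literally in the file, which is what the `invalid interface (eth1.4): No such device` line in the log refers to.

```python
def render_pd_stanza(wan_if, pd_targets, ia_id=0):
    """Render a dhcp6c.conf fragment that requests a delegated prefix on
    wan_if and splits it across the interfaces in pd_targets.

    pd_targets maps an interface name to {'sla-id': int, 'sla-len': int}.
    Hypothetical helper; the real VyOS template may differ.
    """
    lines = [
        f"interface {wan_if} {{",
        f"    send ia-pd {ia_id};",
        "};",
        f"id-assoc pd {ia_id} {{",
    ]
    for ifname, opts in pd_targets.items():
        # dhcp6c looks this name up while parsing, so the whole file is
        # rejected if the interface (e.g. a VLAN vif) does not exist yet.
        lines += [
            f"    prefix-interface {ifname} {{",
            f"        sla-id {opts['sla-id']};",
            f"        sla-len {opts['sla-len']};",
            "    };",
        ]
    lines.append("};")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Targets from the replication config in this report.
    print(render_pd_stanza("eth1", {
        "eth0": {"sla-id": 1, "sla-len": 4},
        "eth0.4": {"sla-id": 4, "sla-len": 4},
    }), end="")
```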

I also notice this error printed on the console:

vyos-router[953]: Waiting for NICs to settle down: settled down in 1sec..
vyos-router[953]: Started watchfrr.
vyos-router[953]: Mounting VyOS Config...done.
vyos-router[953]: Starting VyOS router: migrate rl-system firewall configure failed!
vyos-config[4228]: Configuration error

And, looking in the logs, I see errors like this:

Aug 03 01:26:00 core-rt.avernus.com dhcp6c[2965]: cfdebug_print: <3>end of sentence [;] (1)
Aug 03 01:26:00 core-rt.avernus.com systemd[1]: Failed to start WIDE DHCPv6 client on eth3.
Aug 03 01:26:00 core-rt.avernus.com dhcp6c[2965]: cfdebug_print: <3>end of closure [}] (1)
Aug 03 01:26:00 core-rt.avernus.com dhcp6c[2965]: cfdebug_print: <3>end of sentence [;] (1)
Aug 03 01:26:00 core-rt.avernus.com dhcp6c[2965]: add_pd_pif: /run/dhcp6c/dhcp6c.eth3.conf:30 invalid interface (eth1.4): No such device
Aug 03 01:26:00 core-rt.avernus.com dhcp6c[2965]: clear_poolconf: called
Aug 03 01:26:00 core-rt.avernus.com dhcp6c[2965]: main: failed to parse configuration file
Aug 03 01:26:00 core-rt.avernus.com sudo[2962]: pam_unix(sudo:session): session closed for user root
Aug 03 01:26:01 core-rt.avernus.com python3[2921]: Report Time:      2020-08-03 01:26:01
Aug 03 01:26:01 core-rt.avernus.com python3[2921]: Image Version:    VyOS 1.3-rolling-202007241919
Aug 03 01:26:01 core-rt.avernus.com python3[2921]: Release Train:    equuleus
Aug 03 01:26:01 core-rt.avernus.com python3[2921]: Built by:         autobuild@vyos.net
...
Aug 03 01:26:01 core-rt.avernus.com python3[2921]: Traceback (most recent call last):
Aug 03 01:26:01 core-rt.avernus.com python3[2921]:   File "/usr/libexec/vyos/conf_mode/interfaces-ethernet.py", line 316, in <module>
Aug 03 01:26:01 core-rt.avernus.com python3[2921]:     apply(c)
Aug 03 01:26:01 core-rt.avernus.com python3[2921]:   File "/usr/libexec/vyos/conf_mode/interfaces-ethernet.py", line 296, in apply
Aug 03 01:26:01 core-rt.avernus.com python3[2921]:     e.add_addr(addr)
Aug 03 01:26:01 core-rt.avernus.com python3[2921]:   File "/usr/lib/python3/dist-packages/vyos/ifconfig/interface.py", line 684, in add_addr
Aug 03 01:26:01 core-rt.avernus.com python3[2921]:     self.dhcp.v6.set()
Aug 03 01:26:01 core-rt.avernus.com python3[2921]:   File "/usr/lib/python3/dist-packages/vyos/ifconfig/dhcp.py", line 113, in set
Aug 03 01:26:01 core-rt.avernus.com python3[2921]:     return self._cmd('systemctl restart dhcp6c@{ifname}.service'.format(**self.options))
Aug 03 01:26:01 core-rt.avernus.com python3[2921]:   File "/usr/lib/python3/dist-packages/vyos/ifconfig/control.py", line 51, in _cmd
Aug 03 01:26:01 core-rt.avernus.com python3[2921]:     return cmd(command, self.debug)
Aug 03 01:26:01 core-rt.avernus.com python3[2921]:   File "/usr/lib/python3/dist-packages/vyos/util.py", line 179, in cmd
Aug 03 01:26:01 core-rt.avernus.com python3[2921]:     raise OSError(code, feedback)
Aug 03 01:26:01 core-rt.avernus.com PermissionError[2921]: [Errno 1] failed to run command: systemctl restart dhcp6c@eth3.service
Aug 03 01:26:01 core-rt.avernus.com returned[2921]:
Aug 03 01:26:01 core-rt.avernus.com python3[2921]: exit code: 1
Aug 03 01:26:01 core-rt.avernus.com noteworthy[2921]:
Aug 03 01:26:01 core-rt.avernus.com python3[2921]: cmd '/sbin/ethtool -K eth3 ufo off'
Aug 03 01:26:01 core-rt.avernus.com python3[2921]: returned (out):
Aug 03 01:26:01 core-rt.avernus.com python3[2921]: returned (err):

So, it seems that vifs are created after DHCPv6 is configured and started, and that causes dhcp6c to return with an error, which causes config parsing to fail. That, in turn, appears to cause the config system to drop the entire interface config for the containing interface, which is pretty bad.
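The partial configuration (addresses before `address dhcpv6` applied, those after it missing) follows from the traceback above: the exception raised while restarting dhcp6c escapes the address loop in `apply()`. A minimal simulation, with hypothetical `start_dhcp6c()` and `apply_addresses()` standing in for the real VyOS code and a set of names standing in for the kernel's interface list. It also accounts for the odd `PermissionError` tag in the log: Python's `OSError(1, ...)` constructor returns a `PermissionError`, because errno 1 is EPERM.

```python
def start_dhcp6c(pd_targets, existing_ifs):
    """Hypothetical stand-in for 'systemctl restart dhcp6c@<if>.service'."""
    missing = [name for name in pd_targets if name not in existing_ifs]
    if missing:
        # Mirrors vyos.util.cmd raising OSError(code, feedback) on exit code 1.
        raise OSError(1, f"dhcp6c failed: no such device {missing[0]}")


def apply_addresses(addresses, pd_targets, existing_ifs, configured):
    """Add addresses in config order, appending each one to 'configured'."""
    for addr in addresses:
        if addr == "dhcpv6":
            start_dhcp6c(pd_targets, existing_ifs)  # may raise
        configured.append(addr)


if __name__ == "__main__":
    configured = []
    try:
        apply_addresses(
            ["192.168.2.1/24", "dhcpv6", "10.1.11.3/24"],
            pd_targets=["eth0", "eth0.4"],
            existing_ifs={"eth0", "eth1", "lo"},  # eth0.4 not created yet
            configured=configured,
        )
    except OSError as err:
        print(type(err).__name__, err)  # PermissionError [Errno 1] dhcp6c failed: no such device eth0.4
    print(configured)  # ['192.168.2.1/24']
```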

What's the right solution here? If not all the interfaces are configured yet when we set up dhcp6c's config files, and if the architecture inherited from Vyatta means there is currently no feasible way to delay configuring this interface until all the other interfaces are set up, would it make sense to delay starting dhcp6c, or to start it in the background, rather than starting it synchronously and waiting for the return value of the process that forks the daemon?
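One way to read the "delay starting dhcp6c" idea, as a minimal sketch: before starting dhcp6c, poll (with a timeout) until every prefix-delegation target exists. The helper names and timing values are hypothetical. `socket.if_nametoindex()` raises `OSError` for an unknown interface name; the `No such device` in the dhcp6c log appears to come from the same kind of name lookup.

```python
import socket
import time


def missing_interfaces(names):
    """Return the interface names the kernel does not know about (yet)."""
    missing = []
    for name in names:
        try:
            socket.if_nametoindex(name)
        except OSError:
            missing.append(name)
    return missing


def wait_for_interfaces(names, timeout=10.0, interval=0.5):
    """Poll until every interface exists or the timeout expires.

    Returns the names still missing; an empty list means it should be safe
    to start dhcp6c. Both timing values are arbitrary defaults.
    """
    deadline = time.monotonic() + timeout
    while True:
        missing = missing_interfaces(names)
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(interval)


if __name__ == "__main__":
    still_missing = wait_for_interfaces(["eth0", "eth0.4"], timeout=1.0)
    if still_missing:
        print("not starting dhcp6c yet; missing:", still_missing)
```

Since the vifs appear to be created only after the delegating interface's script has run, a wait like this would probably have to run in the background (as suggested above) rather than inside the interface's own commit step, where it would simply time out.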

Perhaps it isn't necessary for the config system to use dhcp6c's return code to determine whether its config is correct, since we've validated the config as the user created it. Or maybe that doesn't make sense given the way that systemctl restarts the dhcp6c daemon. Perhaps it will always return an error code if the daemon fails to start up, and we should handle that.
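The "don't use dhcp6c's return code to fail the commit" option could look roughly like this minimal sketch. `restart_dhcp6c()` and `run()` are hypothetical helpers, not VyOS functions; the unit name `dhcp6c@<ifname>.service` is taken from the log above. The runner is injectable so the behaviour can be exercised without systemd.

```python
import logging
import subprocess


def run(command):
    """Run a command, raising CalledProcessError on a non-zero exit."""
    subprocess.run(command, check=True, capture_output=True, text=True)


def restart_dhcp6c(ifname, runner=run):
    """Restart dhcp6c for ifname without aborting the commit on failure.

    Returns True on success and False if the restart failed; the failure
    is logged as a warning instead of being raised, so a caller could
    retry once the missing interfaces exist.
    """
    unit = f"dhcp6c@{ifname}.service"
    try:
        runner(["systemctl", "restart", unit])
    except (OSError, subprocess.CalledProcessError) as err:
        # OSError also covers what vyos.util.cmd raises in the traceback.
        logging.warning("could not restart %s: %s", unit, err)
        return False
    return True
```

The trade-off is the one raised above: a dhcp6c config that is genuinely broken would no longer fail the commit, so the warning (or some later retry) has to surface it instead.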

gadams added a comment. Mon, Aug 3, 9:54 AM

This configuration replicates the error:

interfaces {
    ethernet eth0 {
        address 192.168.1.1/24
        duplex auto
        smp-affinity auto
        speed auto
        vif 4 {
            address 192.168.4.1/24
            ipv6 {
                dup-addr-detect-transmits 1
            }
        }
    }
    ethernet eth1 {
        address 192.168.2.1/24
        address dhcpv6
        address 10.1.11.3/24
        dhcpv6-options {
            prefix-delegation {
                interface eth0 {
                    address 1
                    sla-id 1
                    sla-len 4
                }
                interface eth0.4 {
                    address 1
                    sla-id 4
                    sla-len 4
                }
            }
        }
        ipv6 {
            address {
                autoconf
            }
        }
    }
    loopback lo {
    }
}
service {
    router-advert {
        interface eth0 {
            interval {
                max 200
            }
            prefix ::/64 {
            }
        }
        interface eth0.4 {
            hop-limit 64
            interval {
                max 200
            }
            prefix ::/64 {
            }
        }
    }
    ssh {
        disable-password-authentication
        port 22
    }
}
system {
    config-management {
        commit-revisions 20
    }
    console {
        device ttyS0 {
            speed 115200
        }
    }
    host-name core-rt.avernus.com
    login {
        user vyos {
            authentication {
                plaintext-password "whatever"
            }
        }
    }
    syslog {
        global {
            facility all {
                level notice
            }
            facility protocols {
                level debug
            }
        }
    }
    time-zone US/Pacific
}


// Warning: Do not remove the following line.
// vyos-config-version: "broadcast-relay@1:cluster@1:config-management@1:conntrack@1:conntrack-sync@1:dhcp-relay@2:dhcp-server@5:dhcpv6-server@1:dns-forwarding@3:firewall@5:https@2:interfaces@11:ipoe-server@1:ipsec@5:l2tp@3:lldp@1:mdns@1:nat@5:ntp@1:pppoe-server@3:pptp@2:qos@1:quagga@6:salt@1:snmp@2:ssh@2:sstp@2:system@18:vrrp@2:vyos-accel-ppp@2:wanloadbalance@3:webgui@1:webproxy@2:zone-policy@1"
// Release version: 1.3-rolling-202007241919

The router doesn't even need to be connected to anything.

Thanks for looking!

c-po added a comment. Mon, Aug 3, 11:41 AM

I actually cannot reproduce this issue after fixing T2740. Please retry with the latest rolling build, which was just uploaded: vyos-1.3-rolling-202008031114-amd64.iso

c-po changed the task status from Open to On hold. Mon, Aug 3, 11:41 AM
c-po triaged this task as Normal priority.
gadams added a comment. Mon, Aug 3, 6:15 PM

That sounds hopeful! I will try it in a few hours and report back.

Unfortunately, the problem does not appear to be fixed in the latest rolling build, vyos-1.3-rolling-202008031923-amd64.iso.

Here's how I reliably reproduce this:

  1. On a physical (hardware) router
  2. Apply the config I pasted above (I just copied it to /config/config.boot; is there a better way? copy file <path> to running::/config/ doesn't seem reliable.)
  3. Reboot
  4. Notice the "Configuration error" console message at the end of boot
  5. Log in
  6. show int
  7. Notice that address 10.1.11.3/24 is missing from eth1
  8. enter config mode: conf
  9. sh int eth eth1

Expected response: The full config of eth1 as set up in the config.
Actual response:

vyos@core-rt# sh int eth eth1
Configuration under specified path is empty
[edit]
vyos@core-rt#

I then tried updating to the new rolling image:

  1. add system image https://downloads.vyos.io/rolling/current/amd64/vyos-1.3-rolling-202008031923-amd64.iso
  2. reboot
  3. Notice the "Configuration error" console message at the end of boot
  4. Log in
  5. show int
  6. Notice that address 10.1.11.3/24 is missing from eth1
  7. enter config mode: conf
  8. sh int eth eth1
  9. Get the response "Configuration under specified path is empty"

So, no change in the latest build.

gadams changed the task status from On hold to Open. Tue, Aug 4, 12:14 AM
c-po added a comment. Tue, Aug 4, 4:55 AM

Let me spawn a fresh router and try again. In the meantime, when you boot into the configuration error, please log in, enter configure mode, type load followed by commit, and show the results. Thanks

gadams renamed this task from "DHCPv6-PD breaks interface config if it refers to later-defined interfaces." to "DHCPv6-PD breaks interface config if it refers to VLAN interfaces." Tue, Aug 4, 5:23 AM
gadams added a comment. Tue, Aug 4, 5:38 AM

Entering configure mode and then typing load and then commit brings everything up to what the config in config.boot specifies, and the running configuration shows the correct contents for eth1. It brings the router up to where it should have been at boot.

I've verified this on a test router and my main home router (both bare metal).

(And this is a much better workaround than re-keying the entire interface config, which is what I've been doing.)

c-po added a comment. Tue, Aug 4, 5:55 AM

I can reproduce it with your supplied config on a fresh router, but only at boot time. Will check it out. Thanks for the config.

c-po added a comment. Tue, Aug 4, 6:44 AM

I just started a new ISO build - should be done in 40 minutes!

Any ISO newer than vyos-1.3-rolling-202008040117-amd64.iso will have the fix.

gadams added a comment. Tue, Aug 4, 7:06 AM

Awesome! That's really quick turnaround! I'll give it a try when the newer build appears.

c-po added a comment. Tue, Aug 4, 7:10 AM

Completed!

gadams closed this task as Resolved. Tue, Aug 4, 7:52 AM

I am very happy to report that the issue is resolved. The router now boots up fully without intervention once again.

Thanks!

c-po added a comment. Tue, Aug 4, 7:57 AM

Welcome! Thanks for being an early adopter / tester.