
VyOS lacks DHCPv6-PD (Prefix delegation) length / IA_PD support
Closed, Resolved | Public | BUG

Description

Original bug filed by compaq963 on 2013-12-31 in Bugzilla, copied here so that it doesn't get lost/forgotten (will also copy in comments):

It would be great to see Prefix delegation support in VyOS. Prefix delegation is needed for IPv6 with many ISPs because this is how they hand out prefixes.

Also, it would be great to allow you to set a custom IA_PD prefix. For example, my ISP will hand out a /60 to you with a custom IA_PD prefix. If you just accept the default, you get a /64.

One solution is using WIDE-DHCPv6. The project page is at http://wide-dhcpv6.sourceforge.net/

Details

Difficulty level
Hard (possibly days)
Version
1.1.x
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)

Revisions and Commits

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Prefix delegation is a very important function, which needs to be used to provide routable prefix segment allocation to users.

I tried out DHCPv6-PD but so far no luck (but for a separate reason).

I filed a separate report in T2510.

Once I get that figured out I'll start testing DHCPv6-PD.

This comment was removed by danielpo.
jack9603301 added a comment.EditedMon, May 25, 4:50 PM
if eth['dhcpv6_pd']:
    e.dhcp.v6.options['dhcpv6_pd'] = e['dhcpv6_pd']

I looked at @c-po's earlier commit. Is this line wrong?

The commit in question is:
rVYOSONEXba06bfed4fc84b699689932eeb3af9b9be0f5cd7

From reading the code, the correct form seems to be:

if eth['dhcpv6_pd']:
    e.dhcp.v6.options['dhcpv6_pd'] = eth['dhcpv6_pd']
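
For readers following along, here is a minimal sketch of why the first form fails and the second works. `EthernetIf` below is a hypothetical stand-in class, not the real VyOS one, and `dhcp_v6_options` is an assumed flat attribute standing in for `e.dhcp.v6.options`. The only point is that `e` is an object with no `__getitem__`, while `eth` is the parsed config dict.

```python
class EthernetIf:
    """Hypothetical stand-in for the interface object; it defines no __getitem__."""

    def __init__(self, ifname):
        self.ifname = ifname
        self.dhcp_v6_options = {}  # assumed placeholder for e.dhcp.v6.options


# Parsed configuration as a plain dict (keys are illustrative assumptions).
eth = {'intf': 'eth3', 'dhcpv6_pd': {'br2': {'sla_id': '1', 'sla_len': '8'}}}
e = EthernetIf(eth['intf'])

# Buggy form: subscripting the interface object instead of the config dict.
try:
    e.dhcp_v6_options['dhcpv6_pd'] = e['dhcpv6_pd']
    error = None
except TypeError as exc:
    error = str(exc)

# Corrected form: read the value from the config dict.
if eth['dhcpv6_pd']:
    e.dhcp_v6_options['dhcpv6_pd'] = eth['dhcpv6_pd']

print(error)
```

The buggy line raises `TypeError` as soon as it is reached, so the one-character-class typo (`e` for `eth`) only shows up when a delegation is actually configured.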

@c-po In addition, I have two further questions (if these are not actual problems, please tell me the correct understanding and usage):

a. After each restart of the router, DHCPv6-PD delegation is not performed automatically. The following commands need to be executed:

disconnect interface pppoe0
connect interface pppoe0

b. If DHCPv6-PD is used to get prefixes for different bridges, their interface IDs cannot use different values, which may cause problems. If the specification allows different interface ID values, then this may be a bug. If the specification does not allow different interface ID values, the position of this parameter in the command line could be changed.

c-po added a comment.Mon, May 25, 6:02 PM

a) should not be the case

b) I do not understand the question

jack9603301 added a comment.EditedMon, May 25, 6:08 PM

a) I know it's not supposed to be like this, but it does show up with me.

b) I tried to execute the following commands, and there were some problems.

set interfaces pppoe pppoe0 dhcpv6-option delegate br1 interface-id 1
set interfaces pppoe pppoe0 dhcpv6-option delegate br2 interface-id 101

In addition, please look at the earlier problem reported by another user. I have read the relevant code you submitted, and you may have written it wrong.

Hello! I apologize for being absent for so long. But I'm back, and ready to contribute again. It looks like I have a lot to catch up on.

The work that I did over a year ago has been working very stably for me for the past year, with one bug related to router restarting, that I have planned to address, but haven't had time for until now. It looks like some folks have taken that and merged it into later builds, which is not something I'd gotten around to, myself. Is the latest work based on this, or all new?

How can I help at this point?

What does it mean? I'm sorry, I may not understand.

Oh, wait, is this done, and completely working, now? I'm still trying to catch up on this long thread. If it's all working, now, I'll just have to try the latest build and see if it works.

Was any of the work I did last year used? I suspect not, which would be too bad, because I'd put a lot of time into it, and I'd hoped to contribute to this effort in VyOS. But then, that would be my own fault for disappearing for a year.

This work was done by @c-po. In my experiments with the current rolling release (the 20200523 image), DHCPv6-PD works basically fine in my ISP environment, but there may be some minor problems. Please read the comments in this thread.

gadams added a comment.EditedMon, May 25, 9:36 PM

I'm trying to catch up on the comments. There are a lot.

I have just tried this with 1.3-rolling-202005250117, but it doesn't seem to work. The config syntax is somewhat similar to the one I used (see T421#27094), with the difference that the sla length is specified, rather than the total prefix length. As a bit of feedback, that seems to require more detailed knowledge about the delegation from the ISP, while the code in VyOS could easily compute that from the desired result prefix length. (And then, the default can be 64, which is likely to be correct in more situations than a default sla-len of 16.) There is also the issue that some upstream routers will delegate a different prefix length than is requested in its UI, making it harder for the VyOS user to get this number right.
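
The computation mentioned above is a single subtraction. A minimal sketch follows; the function name `sla_len_for` is a hypothetical helper for illustration, not anything in the VyOS code base.

```python
def sla_len_for(delegated_len: int, target_len: int = 64) -> int:
    """Return the sla-len needed to turn a delegated prefix into target_len subnets.

    delegated_len: prefix length handed out by the ISP (e.g. 56 for a /56).
    target_len:    desired prefix length on the downstream interface.
    """
    if not 0 <= delegated_len <= 128 or not 0 <= target_len <= 128:
        raise ValueError("prefix lengths must be between 0 and 128")
    if target_len < delegated_len:
        raise ValueError("target prefix cannot be shorter than the delegated prefix")
    return target_len - delegated_len


print(sla_len_for(56))  # a /56 delegation split into /64s
print(sla_len_for(48))  # a /48 delegation split into /64s
```

With a default target of 64, an sla-len of 16 is only right for a /48 delegation, while the `sla-len 8` used in the config below corresponds to a /56.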

But no matter.

I have tried this configuration, specifying everything, just to be sure:

interfaces {
     ethernet eth3 {
+        dhcpv6-options {
+            delegate br2 {
+                interface-id 1
+                sla-id 1
+                sla-len 8
+            }
+            delegate eth2 {
+                interface-id 1
+                sla-id 2
+                sla-len 8
+            }
+            delegate eth2.4 {
+                interface-id 1
+                sla-id 4
+                sla-len 8
+            }
+            delegate eth2.6 {
+                interface-id 1
+                sla-len 8
+            }
+        }...
    }
}

But I consistently get this error:

Report Time:      2020-05-25 14:30:51
Image Version:    VyOS 1.3-rolling-202005250117
Release Train:    equuleus

Built by:         autobuild@vyos.net
Built on:         Mon 25 May 2020 01:17 UTC
Build UUID:       897d21e1-a9ad-4dc1-b2ad-61213d16c422
Build Commit ID:  a29347ca9dd260

Architecture:     x86_64
Boot via:         installed image
System type:       guest

Hardware vendor:  ADI Engineering
Hardware model:   RCC-VE
Hardware S/N:     1125160053
Hardware UUID:    Unknown

Traceback (most recent call last):
  File "/usr/libexec/vyos/conf_mode/interfaces-ethernet.py", line 303, in <module>
    apply(c)
  File "/usr/libexec/vyos/conf_mode/interfaces-ethernet.py", line 205, in apply
    e.dhcp.v6.options['dhcpv6_pd'] = e['dhcpv6_pd']
TypeError: 'EthernetIf' object is not subscriptable



[[interfaces ethernet eth3]] failed
Commit failed

Any hints on how to dive into debugging this problem? I'm starting with looking into /usr/libexec/vyos/conf_mode/interfaces-ethernet.py. A bit has changed since I last looked at the config code.

Finally, is this the best task for tracking debugging of the remaining issues in DHCPv6-PD, now that it's closed? Or should I open another (or is another already open)?

So for now, back to my working but old DHCPv6-PD VyOS build. ;)

jack9603301 added a comment.EditedMon, May 25, 9:42 PM

@gadams Regarding this question, I have read @c-po's commit specifically, and I think I have found where the likely error is (please read my comments above). Of course, I expect @c-po to fix the error, because it seems very simple. If @c-po does not resolve the problem for a long time, I may consider submitting a pull request.

Of course, for the time being, I can only fix that specific code error. Above, I have also raised some existing minor problems with using delegation on PPPoE. I don't think I know how to solve those for the time being. If you are able to, please consider solving them.

jack9603301 added a comment.EditedMon, May 25, 9:49 PM

Usually a bug like this should get a new bug report. The project management system can create subtasks, so you could consider opening a subtask.

Ah! Yes, I see what you're referring to. I'll take a look at that in a few hours.

Hmm. So, correcting that error did allow the configuration to apply and save successfully. However, no prefix delegation request appears to be sent and no configuration of delegated interfaces appears to be done. Indeed, no dhcp client appears to be running.

More digging is definitely required.

It's also possible that some other work is required; for instance, T1055 still needs to be merged, and T1059 may be involved (or that may be no longer relevant). I needed those fixes in order to enable DHCPv6-PD from an ethernet interface.

@c-po, how do you recommend I proceed? In the meantime, I'll look at the relevant commits.

However, no prefix delegation request appears to be sent and no configuration of delegated interfaces appears to be done. Indeed, no dhcp client appears to be running.

What does it mean?

Essentially, that means that nothing happened. What you want prefix delegation to do on the client side (which is what we're talking about) is request a delegated prefix from the upstream router via DHCPv6, and then divide the delegated prefix into site-level aggregations and assign the resulting network numbers (delegated prefix + sla id) to other interfaces. None of that happened.
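
To make the "delegated prefix + sla id" step concrete, here is a small sketch using Python's standard `ipaddress` module. The function names and the example prefix (from the 2001:db8::/32 documentation range) are assumptions for illustration; this is not the code the DHCPv6 client itself runs.

```python
import ipaddress


def delegated_subnet(prefix: str, sla_id: int, sla_len: int) -> ipaddress.IPv6Network:
    """Append sla_id (sla_len bits wide) to the delegated prefix."""
    net = ipaddress.IPv6Network(prefix)
    new_len = net.prefixlen + sla_len
    if new_len > 128:
        raise ValueError("prefix length plus sla-len exceeds 128 bits")
    if not 0 <= sla_id < (1 << sla_len):
        raise ValueError("sla-id does not fit in sla-len bits")
    # Place the sla-id in the bits directly after the delegated prefix.
    base = int(net.network_address) | (sla_id << (128 - new_len))
    return ipaddress.IPv6Network((base, new_len))


def interface_address(subnet: ipaddress.IPv6Network, interface_id: int) -> ipaddress.IPv6Interface:
    """Combine a derived subnet with an interface-id in the host bits."""
    if not 0 <= interface_id < (1 << (128 - subnet.prefixlen)):
        raise ValueError("interface-id does not fit in the host bits")
    return ipaddress.IPv6Interface((int(subnet.network_address) | interface_id, subnet.prefixlen))


subnet = delegated_subnet("2001:db8:0:ff00::/56", sla_id=1, sla_len=8)
print(subnet)
print(interface_address(subnet, 1))
```

Each downstream interface gets its own sla-id, so every interface ends up on a distinct subnet carved out of the same delegation.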

But since I don't see any evidence that a DHCP client was started at all, it makes sense that none of the rest of it happened. There must be something that's supposed to turn on the DHCP client that I haven't found, yet.

jack9603301 added a comment.EditedTue, May 26, 4:03 AM

@gadams I am ready to fix the following problems. If @c-po doesn't understand what happened and solve it, I may submit PR request.

Traceback (most recent call last):

File "/usr/libexec/vyos/conf_mode/interfaces-ethernet.py", line 303, in <module>
  apply(c)
File "/usr/libexec/vyos/conf_mode/interfaces-ethernet.py", line 205, in apply
  e.dhcp.v6.options['dhcpv6_pd'] = e['dhcpv6_pd']

TypeError: 'EthernetIf' object is not subscriptable

jack9603301 added a comment.EditedTue, May 26, 4:07 AM

@gadams Strange. In my environment, delegation requests succeed when PPPoE is used for dialing. When setting up delegation on a non-PPPoE interface, does the upstream router need to be configured for DHCPv6-PD and delegate on that interface? (That is not required here; I mainly obtain the delegated prefix when dialing PPPoE to the ISP.)

That's great, and necessary. But then, once that's done (I've done it in my copy), I seem to be missing the next step: it's not starting a DHCPv6 client. What is supposed to do this?

I should note that in setups for some very large ISPs in the US, and probably elsewhere, there is no need for a DHCPv6 client at all, except for doing DHCPv6-PD. The directly-connected IPv6 network gets its assignment via RA, and IPv4 addresses may be assigned statically.

Perhaps there's a built-in assumption that the dhclient will be started up for other reasons, but that's not necessarily the case.

jack9603301 added a comment.EditedTue, May 26, 4:19 AM

I should note that in setups for some very large ISPs in the US, and probably elsewhere, there is no need for a DHCPv6 client at all, except for doing DHCPv6-PD. The directly-connected IPv6 network gets its assignment via RA, and IPv4 addresses may be assigned statically.

@gadams I'm sorry, to be honest, I don't understand this sentence. Maybe I don't understand the network in the United States. In China, ISPs use PPPoE for the last kilometer of user authentication. DHCPv6 PD realized through @c-po can work normally in my ISP environment, but this may not be enough. As a perfect router system, all possible and reasonable ways of use need to be considered.

Anyway, I think I have a patch for the Python error ready before @c-po. I'll open a PR directly.

https://github.com/vyos/vyos-1x/pull/430

jack9603301 added a commit: Restricted Diffusion Commit.Tue, May 26, 5:58 AM

I'm sorry, to be honest, I don't understand this sentence. Maybe I don't understand the network in the United States. In China, ISPs use PPPoE for the last kilometer of user authentication.

PPPoE is used in many places in the US, but in my experience, it's not so common when the connection is delivered via cablemodem, radio, fiber, or a number of other residential and small business connections. In most of those cases, it will be converted by some CPE to regular old ethernet, with simply a next-hop router configured in the customer router. In many of these cases, the CPE is integral to the physical layer transport, and can't be replaced with a small router. Larger router frames may have suitable line cards for these media, but then they're not running VyOS; they're running IOS or Junos or the like.

So, for these cases, the DHCPv6-PD client will be running on an ethernet interface, and may be the only use of dhclient, if the IPv4 address is not being assigned via DHCP.

jack9603301 added a comment.EditedTue, May 26, 6:58 AM

@gadams Therefore, in order to use SLAAC and DHCPv6 PD in this case, the router below should receive RA from the superior router or CPE device?

jack9603301 added a comment.EditedTue, May 26, 7:38 AM

it will be converted by some CPE to regular old ethernet,

A digression: why "old" Ethernet?

I'll be building an image and testing this out today. Thanks Jack!

I have fixed the following error and submitted a PR for it, but it has not been merged yet.

Traceback (most recent call last):
File "/usr/libexec/vyos/conf_mode/interfaces-ethernet.py", line 303, in <module>
  apply(c)
File "/usr/libexec/vyos/conf_mode/interfaces-ethernet.py", line 205, in apply
  e.dhcp.v6.options['dhcpv6_pd'] = e['dhcpv6_pd']
TypeError: 'EthernetIf' object is not subscriptable

However, other problems will remain, such as those described in the comments above and the two small problems I raised. For the moment, I can only guarantee that this Python error no longer occurs.

c-po added a comment.Tue, May 26, 3:15 PM

The ethernet TypeError is fixed in the upcoming rolling release

c-po added a comment.Tue, May 26, 3:20 PM

@jack9603301 please note that this is currently a beta implementation which of course contains bugs. Also the CLI will change in the near future to support requesting specific prefix sizes (T2506)

As I'm only a spare-time contributor and do this free of charge there can be of course errors. I feel a bit offended by your demanding comments.

Thank you @c-po. It's very good. I fully understand you. Like you, most of the open-source contributors are amateurs. I don't mean anything else. I think you may have misunderstood my remarks. Please don't be too sensitive. Anyway, thank you. In addition, you can take a look at the comments above and the two questions I raised. I don't know how to solve it for the time being. If you or someone else has a good solution, thank you.
Thanks again

One other question -- looks like SLAAC is working with the router-advert, but I believe Windows clients need DHCPv6 to receive addresses. I believe that the auto-config of DHCPv6 based on assigned prefix is not included yet, correct? Is that the 'Assisted' or 'Managed' modes that pfSense/Opnsense has?

c-po added a comment.Tue, May 26, 9:13 PM

Windows 10 works with SLAAC like a charm.

Hey, @c-po, thanks for getting this moving again. As you may recall, I did a lot of development work on this some time ago that I never pushed to get merged. (I dropped the ball.) Unfortunately, you've had to re-do a lot of what I did, I'm sure. I'm happy to incorporate any of that past work or do some fresh development to polish this feature up.

So, I'm curious where things stand, in your estimation? What work needs to be done?

I suspect there are a few areas, which you may or may not already have addressed:

  • Starting up dhclient (assuming that's what's still being used) when DHCPv6-PD is needed, but it's not otherwise already started.
  • Distributing delegated addresses to other interfaces (I'd thought you already did this, but it sounds from some of the comments above that perhaps it's not done?)
  • Ability to request specific prefix sizes
    • Does this refer to requesting from the DHCPv6 server, or:
    • when specifying the total prefix length of the delegated prefix + the sla id?
  • Anything else?
c-po added a comment.EditedTue, May 26, 9:52 PM

Hi @gadams,

dhclient is not used, wide-dhcp client is started on demand. Also prefixes are properly assigned to interfaces, using this at home for pppoe. Specific prefix size request is implemented as of T2506.

This means you request a specific size from your ISP; sla-id just identifies a prefix (/64) from this assignment. You can refer to the PPPoE documentation for more info: https://docs.vyos.io/en/latest/interfaces/pppoe.html#ipv6

Currently I see no open topics - or I am not aware of any.

gadams added a comment.EditedTue, May 26, 10:21 PM

Thanks for the response.

In my case, I don't see wide-dhcp started. What piece of code is supposed to do this? I'd like to dig into this. Without this, of course, nothing else happens.

I'm glad to hear that the rest is implemented!

A couple suggestions:

  • The docs say that the default sla-len is 16. It would seem more useful to default to whatever size would yield a total prefix length of 64. Also, it seems better to specify the final prefix length, rather than the length of the sla portion, for the reasons I mention in T421#65109.
  • The docs should be moved out of the pppoe section. It seems strange to have DHCPv6-PD docs there, when the two are really completely independent. At first, when I saw it there, I thought it would only work in the context of pppoe, which thankfully is not the case.

Cheers!

I will track and test whether the dhcpv6-pd is functioning properly to see if the previous problems still exist in my environment.

jack9603301 added a comment.EditedWed, May 27, 3:03 AM

@c-po @gadams In 20200523, I found several bugs, although they do not affect my use too much (the router can always be restarted). For example, when PPPoE is restarted many times, the prefix is reassigned, but the old prefix that was previously assigned is not deleted. Although I can keep it working, I still don't know how to solve this. I wonder whether restarting the interface would solve the problem when the prefix allocation request is restarted.

This may and has actually caused problems with ipv6 routing.

(edit:I misunderstood. It seems that sometimes when PPPoE is restarted, or under certain circumstances, the address or route obtained by its delegation will be lost (or not available))

Because I usually choose to restart the router + restart PPPoE (another problem is that if I do not restart PPPoE after restarting the router, it will not be delegated), so I ignore this problem. It's a strange problem. I don't know how to solve it for the moment.

c-po added a comment.EditedWed, May 27, 5:10 AM

@gadams, please describe the use case where wide does not start, and include the config with the expected result and the VyOS version. Sure, the config can be adjusted; luckily it's an open Git repo, so just send a PR.

gadams added a comment.EditedWed, May 27, 5:47 AM

I included all the bits I thought were relevant in T421#65109 (you'll need to click "Show older changes" at this point to see it).

I'd expect a DHCPv6 client to be started, an address delegation to be requested by it from the DHCP server upstream, and then subnet assigned to the relevant other interfaces according to the specified sla-ids. But no dhcp client of any sort is running after applying that config (and fixing the typo in interfaces-ethernet.py), so naturally none of the rest of it happens.

I'd really appreciate your thoughts about it. Thanks, @c-po.

One thought I have is that my local fix to interfaces-ethernet.py might be happening too late, or something. I should just do a fresh build, or wait for the fix to appear in a daily build.

@c-po I couldn't find where the WIDE DHCP client gets called. I intend to analyze why prefix allocation does not start automatically after a restart in the 20200523 image.

jack9603301 added a comment.EditedWed, May 27, 11:36 AM

@c-po Trace the problem that the system cannot automatically perform prefix delegation after restarting:

After the restart, call the following command to get the prefix assignment:

sudo systemctl start dhcp6c@pppoe0

Strangely, after rebooting the system, I specifically queried dhcp6c@pppoe0 and found that, until it was restarted manually (either by using the VyOS command to restart pppoe0 or by using systemctl on dhcp6c@pppoe0), the service was in a failed state, and the Restart field of your service unit had no effect. I can't make it start normally during configuration loading after the system is restarted.

c-po added a comment.Wed, May 27, 4:00 PM

@jack9603301 your assumptions are invalid. I have a fully reboot-safe PPPoE setup. Please stop making wrong assumptions and search the code properly!

https://github.com/vyos/vyos-1x/blob/current/data/templates/pppoe/ipv6-up.script.tmpl#L47

c-po added a comment.Wed, May 27, 4:01 PM

@gadams your mentioned problem is already fixed in the latest rolling image

@c-po Strangely, after rebooting the system, I specifically queried dhcp6c@pppoe0 and found that, until it was restarted manually (either by using the VyOS command to restart pppoe0 or by using systemctl on dhcp6c@pppoe0), the service was in a failed state, and the Restart field of your service unit had no effect. I can't make it start normally during configuration loading after the system is restarted.

@c-po It has to be said that this is really a troublesome problem. I'm still stuck in fault exploration, rather than making patches. Therefore, there may be some assumptions and verification processes, but it's really troublesome. When the system is restarted, the automatic operation of the service will fail, and I have to restart the service manually.

I'm sorry to say that with the current rolling release, still no dhcp6c is started when I add the PD configuration. Here is my test config:

interfaces {
    ethernet eth0 {
        address dhcp
        dhcpv6-options {
            prefix-delegation {
                interface eth0.3 {
                    address 1
                    sla-id 3
                }
            }
        }
        duplex auto
        ipv6 {
            address {
                autoconf
            }
        }
        smp-affinity auto
        speed auto
        vif 3 {
            description TEST-VLAN
            ipv6 {
                address {
                    autoconf
                }
                dup-addr-detect-transmits 1
            }
        }
    }
    loopback lo {
    }
}

And no client is started:

vyos@vyos:~$ sudo ps -afe | grep dhc
root      1170     1  0 05:30 ?        00:00:00 /sbin/dhclient -4 -nw -cf /var/lib/dhcp/dhclient_eth0.conf -pf /var/lib/dhcp/dhclient_eth0.pid -lf /var/lib/dhcp/dhclient_eth0.leases eth0

I notice that you pointed out the code that starts dhcp6c in the pppoe template: https://github.com/vyos/vyos-1x/blob/current/data/templates/pppoe/ipv6-up.script.tmpl#L47. Does this indicate that there are still some pppoe-specific assumptions in the code, or is there a comparable line for other interfaces?

I'll go looking...

@gadams In my environment, oddly enough, after a reboot I queried dhcp6c@pppoe0 and found that, until it was restarted manually (by restarting pppoe0 with the VyOS command or by using systemctl), the service was in a failed state and the Restart field of the service unit had no effect. During configuration loading after a system restart, I cannot make it start normally.

Aha! Thanks, @c-po. As I suspected, there is an assumption in ifconfig/interface.py that the DHCPv6 client need not be started if we're not getting our address via DHCPv6, which of course is not the case, here.

Working on a fix...

jjakob added a comment.EditedThu, May 28, 8:02 AM

Maybe we should add new methods to the Interface or DHCP class to allow starting just DHCPv6-PD without assigning an address to it? The way it's done now is by assigning an address with the value "dhcpv6" to the interface through the add/del_addr methods of Interface class. There needs to be a separate method for DHCPv6-PD without addressing (and generate a dhcpc config that doesn't assign the address, of course).

Also was ISC dhclient replaced with Wide dhcp client for all cases or just dhcpv6?

Edit: in fact thinking about it, there's no harm in starting the daemon through add_/del_addr, just that it must be configured to not assign an address in that case.

tbr added a comment.Thu, May 28, 8:17 AM

Just my input: It seems there is some confusion about what DHCPv6-PD actually is. And this reflects in the endless discussions and questions here in the thread...

To me it is simple. The configuration should reflect the fact that PD is just one of the options of DHCPv6. So, when you want to receive a prefix for delegation, but don't want an address, then just set the no-address and no-temporary options (however they are called in the config). It's still DHCPv6 in the end. Now, in practice it doesn't really matter too much if the router gets an address or not (when using PD). But ok, you are right... an option to tell VyOS not to enable the address request would be useful. In fact, DHCPv6 RFC states that the client can make any (allowed) request and the server can ignore whatever it wants to ignore... so even when an ISP only hands out prefix, requesting an address should not break anything (the router will just never receive an address, but only a prefix).

Anyway :) Glad that at least the PD is now possible in VyOS.

jjakob added a comment.EditedThu, May 28, 8:37 AM

@tbr thanks for clarifying that, I agree. So the way to do that would be to set 'address dhcpv6' and 'dhcpv6-options parameters-only'. That is slightly confusing at first, as the combination of those 2 options shouldn't actually assign an address. I haven't tried it but that's how I expect it should work, I don't use PD currently. If it does work my comments regarding new methods in scripts are entirely unneeded.

I have sent a pull request: https://github.com/vyos/vyos-1x/pull/437

I added a new method to the Interface class to start the DHCPv6 client in these cases. I would like to have had that class be more generic, and check for itself whether the interface's IPv6 addresses are being configured via DHCP, and start itself only when not, but that information isn't available at the right time there. So I left it up to the caller to decide. But that means that there is probably another interface or two out there that will need this treatment.

@tbr: The problem is actually somewhat the opposite, if I'm understanding you correctly (this late at night). It's that the code there to start up the DHCP client predates the addition of DHCPv6-PD, so it only starts up the DHCP client when it's configured to get its addresses via DHCPv6. That may even work if someone has configured their router to get its router-to-router address via DHCPv6 as well, but honestly that seems like a probably rare configuration. In any event, with my change, the dhcp6c daemon actually starts, now. I'll do more testing tomorrow.
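
The difference between the old and new start conditions can be sketched as two predicates. This is an illustration only: the function names and the `addresses` / `pd_interfaces` arguments are hypothetical and do not mirror the actual code in ifconfig/interface.py or the pull request.

```python
def should_start_dhcp6c_old(addresses, pd_interfaces):
    """Assumed pre-PD behaviour: start the client only for 'address dhcpv6'."""
    return 'dhcpv6' in addresses


def should_start_dhcp6c(addresses, pd_interfaces):
    """Start the client for 'address dhcpv6' OR when any prefix delegation is configured."""
    return 'dhcpv6' in addresses or bool(pd_interfaces)


# Shaped like the eth0 test config in this thread: IPv4 via DHCP, PD to eth0.3.
addresses = ['dhcp']
pd_interfaces = {'eth0.3': {'address': '1', 'sla_id': '3'}}

print(should_start_dhcp6c_old(addresses, pd_interfaces))
print(should_start_dhcp6c(addresses, pd_interfaces))
```

With only the first predicate, a PD-only configuration never launches a client, which matches the `ps` output above showing only the IPv4 dhclient.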

danielpo removed a subscriber: danielpo.Thu, May 28, 8:48 AM

@gadams have you tried the above 2 settings: 'address dhcpv6' and 'dhcpv6-options parameters-only' without your patch to see if the client doesn't assign an address in that case?

gadams added a comment.EditedThu, May 28, 9:10 AM

@jjakob Yes, I tried dhcpv6-options parameters-only; it had no effect. I did not try 'address dhcpv6' simply because that doesn't seem like a great configuration. But it would have been worth testing, anyway. And while that would have been good to test, it would be a pretty awkward and unexpected workaround for regular users to think of.

But in both cases, it probably wouldn't have worked, anyway, because there is currently a bug in the configuration parsing code in configdict.py that my pull request addresses.

I see that there are other, subtle issues. For instance, in the unusual case of delegating from a native (trunk) interface to a vlan interface nested within it, dhcp6c will fail to start, because it starts before the vif is created, so its config looks invalid. (I noticed that because that's how I happened to set up my test router just now.) I can fix that with a quick pull request...

Actually, it looks like that change has been automatically included in my previous pull request.

Ah, I think I see what you, @tbr and @jjakob are getting at. If I want to do DHCPv6-PD, then I need to start the daemon. But if I don't want an address using the protocol, then I can explicitly turn it off.

But this does seem confusing to the user. In order to request a delegated prefix, I'd have to tell VyOS that I want to configure the interface address via DHCPv6 ('address dhcp'), but that I don't actually want that ('dhcpv6-options parameters-only') and also that I'd like to request a prefix. A bit awkward, even if that gets the desired outcome from the perspective of configuring the daemon.

However, the user isn't configuring the dhcp6c daemon; they're configuring VyOS, and VyOS is translating that configuration into the appropriate dhcp6c.{intf}.conf file and running state. From that perspective, there are (at least) three orthogonal things that the user might want via DHCPv6:

  • An address (or addresses)
  • A prefix delegation (or delegations)
  • Other parameters (DNS servers, etc.)

It seems best to let the user choose which of those they want, and let VyOS build the configuration that delivers it. To that end, if the user only wants a delegated prefix, they shouldn't have to know the magic of turning on getting an address but then disabling it again.
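
One way to picture that translation is a function from the three independent user choices to what the client should do. This is a sketch under assumptions: `Dhcp6Plan` and `plan_dhcp6` are hypothetical names, and treating "other parameters only" as a stateless, information-only exchange is my reading of DHCPv6, not something the VyOS code is confirmed to do.

```python
from typing import NamedTuple


class Dhcp6Plan(NamedTuple):
    start_client: bool      # does any choice require the daemon at all?
    request_ia_na: bool     # ask for a non-temporary address (IA_NA)
    request_ia_pd: bool     # ask for a delegated prefix (IA_PD)
    information_only: bool  # stateless: only other parameters (DNS servers, etc.)


def plan_dhcp6(want_address=False, want_prefix=False, want_other=False) -> Dhcp6Plan:
    """Map three orthogonal user choices onto a client plan."""
    stateful = want_address or want_prefix
    return Dhcp6Plan(
        start_client=stateful or want_other,
        request_ia_na=want_address,
        request_ia_pd=want_prefix,
        information_only=want_other and not stateful,
    )


# A user who only wants a delegated prefix never has to mention an address.
print(plan_dhcp6(want_prefix=True))
```

The design point is that the daemon start decision falls out of the choices, so the user never has to enable address assignment just to disable it again.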

I'll think about this more overnight.

jjakob added a comment.EditedThu, May 28, 9:52 AM

@gadams I agree it's confusing, to change the syntax isn't hard, we just have to choose the best user-friendly syntax and behavior. It can be even accomplished without changing the syntax, by:
a) if 'dhcpv6-options delegate' is set, do the same as for 'parameters-only', plus start dhclient by add_addr('dhcpv6')
or
b) start dhclient if either 'dhcpv6-parameters' or 'address dhcpv6' is set but only assign an address in the 'address dhcpv6' case, may be the simplest option.

In addition to the address, we have to think about whether to set:

  • routes
  • search domain
  • nameservers (this is already a setting in a different place, not necessary here)
jack9603301 added a comment.EditedThu, May 28, 10:54 AM

@c-po

I try to manually modify the contents of /etc/ppp/ipv6-up.d/1000-vyos-pppoe-pppoe0 , and change the following commands:

systemctl start dhcp6c@pppoe0

Modify to the following command:

systemctl restart dhcp6c@pppoe0

After restarting the system, the expected error output now appears:

May 28 18:40:27 vyos dhcp6c[2222]: cfdebug_print: <3>end of sentence [;] (1)
May 28 18:40:27 vyos dhcp6c[2222]: add_pd_pif: /run/dhcp6c/dhcp6c.pppoe0.conf:20 invalid interface (br1): No such device
May 28 18:40:27 vyos dhcp6c[2222]: clear_poolconf: called
May 28 18:40:27 vyos dhcp6c[2222]: main: failed to parse configuration file

Of course, this is still fault exploration of dhcp6c rather than a patch. Reading the errors: when dhcp6c is started to perform delegation, the interfaces br1 and br2 have not been created yet. How can this be solved? Can anyone come up with a good way?

jack9603301 added a comment.EditedThu, May 28, 2:05 PM

In general, there are several possible solutions:
a) Add a CLI option for automatic daemon repair that relies on cron to run a repair program, so that a failed service is restarted automatically
b) Find a way to solve the problem at its root
I prefer a + b, so that if the service fails on a single start attempt, it can still recover.
But it's just an idea. If you have any other suggestions, please let me know.

In the current situation, an automatic recovery script for this specific service should work around the startup failure.

@jjakob Yes, exactly my thoughts, and what my last pull request starts. I'll try to catch the remaining cases later this evening my time (in 12 hours or so). I can imagine one case that might be a little tricky.

@c-po does it make more sense to open a new task for this? Also, would you mind looking at my pull request so far? I assume you have the power to accept it.

c-po added a comment.Thu, May 28, 4:32 PM

@gadams it makes no sense to use this as a catch-all thread. New requests/bugs should go into dedicated tasks.

Also, from my perspective we could indeed start dhcp6 somehow even if no interface 'address dhcpv6' is defined, combined with some voodoo in the background - I already have an idea.

I have not merged your PR as I had the same locally but not yet pushed - it is all integrated in the latest ISO now. enjoy!

Something else I realized last night: In general, it's not safe to start dhcp6c before all interfaces are configured, as long as PD is specified (whether or not 'address dhcpv6' is specified). That's because the prefix-delegation stanza can refer to any other interface on the system, even ones that haven't been set up yet. That might include vif interfaces (such as I noticed last night) or any other virtual interface, like br or tun.

What we do now is start dhcp6c while configuring an interface that has 'address dhcpv6' or 'prefix-delegation' (with my pull request). This creates the dhcp6c.{intf}.conf file, including references to other interfaces, and then starts dhcp6c immediately. When dhcp6c sees the references to nonexistent interfaces, it fails, and that causes an exception which, on boot, even halts processing of config.boot, which leaves things in a pretty inconsistent-looking state. This is pretty bad.

Is there a way to defer starting dhcp6c until all interfaces are otherwise configured?
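One conceivable way to defer the start, sketched here purely for illustration, is to wait until every interface referenced by the prefix-delegation stanza exists before launching dhcp6c. Nothing in this thread confirms that VyOS does this; the function name, its parameters, and the polling approach are assumptions.

```python
import os
import time


def interface_exists(name):
    """Return True if the kernel currently knows this interface (Linux)."""
    return os.path.exists(f"/sys/class/net/{name}")


def wait_for_interfaces(names, timeout=30.0, poll=0.5,
                        exists=interface_exists, sleep=time.sleep):
    """Poll until all interfaces exist or the timeout expires.

    Returns the set of interfaces still missing; an empty set means
    it is safe to start the client.
    """
    deadline = time.monotonic() + timeout
    pending = set(names)
    while True:
        pending = {name for name in pending if not exists(name)}
        if not pending or time.monotonic() >= deadline:
            return pending
        sleep(poll)
```

A caller would start dhcp6c only when the returned set is empty, and could report the remaining names as a clear error otherwise instead of letting the daemon fail on its own.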

@jack9603301 I think what you're seeing is another case of exactly this, since you're delegating to br1, which doesn't exist yet when pppoe0 is initially configured.

OK, I have found the best recovery approach and will submit a PR immediately. I will modify the systemd service settings so that the unit is restarted automatically on failure. When the dhcp6c service fails to start, it will be restarted according to those settings.

@gadams Yes. I had thought that, since systemctl's automatic restart did not work, I might need to write a script to perform the recovery. That no longer seems necessary; I will modify the service file instead.

jack9603301 added a comment.EditedThu, May 28, 4:37 PM

The following settings fixed the problem in tests in my local environment:

Restart=on-failure

StartLimitIntervalSec=0

RestartSec=10

Please merge this fix.

Recovery from failures does seem generally desirable, but it would also be preferable to discover errors in configuration while in conf code. For this reason, it seems like the best way to handle this would be to defer starting dhcp6c until the very end of configuring all the interfaces, if that's possible. Is there a mechanism already to do this, or should I look into restructuring things slightly?

Also, should this be a new Task? I'll be happy to create one. I'm just not sure of the established best practices here.

I still think a failure recovery mechanism needs to exist, but I agree with you: we should postpone starting dhcp6c until all interfaces have been initialized, and handle all dhcp6c processing in one place at that point.

I think this deserves a new task to study how to defer that processing. Fault recovery is usually desirable, but we should not rely on it to cover every failure caused by ordering (priority) problems.

I needed to recover from the fault first, so I made a patch; I don't yet have a good way to defer the startup.

Sure, a new task would be very welcome so there's less spam in this task.
Why do you want to postpone dhcp6c startup? All the requirements and dependencies are there when the interface scripts start it, and the interface is brought up before it's started. The exception is waiting for a pppoe connection; there, yes, postponing would be worthwhile. Each interface script has a priority, so that the interfaces it depends on are configured before it. The priority is set in the priority tag in the XML definitions and enforced by vyatta-cfg. The scripts are started sequentially by their priority value, not all at once.

Because it is impossible to know in advance which interfaces a user's dhcp6c configuration depends on, the dependency problem already occurs in practice: dhcp6c fails to start after the system reboots. I have made a patch so that dhcp6c is restarted indefinitely on failure, but this is still not the best way. The best solution may be to start dhcp6c later in the boot sequence.

When dhcp6c is used for prefix delegation, it can depend on any interface, so it should be started only after all interfaces have come up.

Of course, if there is a better way to solve this problem, suggestions are welcome.

I haven't looked at how dhcp6c gets started currently. VyOS uses systemd to manage the services, but none of them should be set to enabled, they're all started manually via VyOS scripts. It's possible it's done differently in this case, I'm not going to speculate on something I don't know. I assumed it got started the same, when the interface script starts it.
On the dependency problem, I don't know how dhcp6c behaves when it's started with nonexistent interfaces in its configuration. If that does cause a failure to start, it is an issue that needs to be fixed some other way. I'm not the implementer of this code, so I'm not going to speculate on the best way to do it.

@jjakob I understand what you mean, so let me explain carefully. Over the past two days I have been trying to find out why the first automatic start of dhcp6c fails (the one triggered when VyOS loads the configuration at boot). The root cause has now been determined. dhcp6c is indeed started manually by the VyOS scripts, and I understand that process. The issue is specific to prefix delegation. In my case, once pppoe connects, dhcp6c@pppoe0 is started to perform prefix delegation, and I assign the prefix to br1 and br2. Those interfaces have not been created yet when pppoe0 and dhcp6c@pppoe0 start, so dhcp6c fails to start.

I currently have a patch configuration so that, when the VyOS script starts dhcp6c and it fails, it is restarted at regular intervals until the service runs or the user stops it manually. But this is far from enough, because it depends entirely on systemd's restart-on-failure handling. As @gadams said, because of the nature of dhcp6c's configuration file, its dependencies under prefix delegation vary from one configuration to the next, and it may depend on any interface. Therefore it is a good idea to defer starting dhcp6c until last.

(Please note that the diagnosis is complete, the root cause of this failure has been determined, and the temporary patch configuration has been tested in my local environment. We can therefore merge the patch first, and then discuss how to defer dhcp6c.)

I've been trying to find a definitive solution to this problem, and the best one seems to be to defer starting dhcp6c until all interfaces are up. I have done enough testing in the meantime, and the error log was posted in my previous comments.

c-po added a comment.EditedThu, May 28, 9:40 PM
In T421#65476, @gadams wrote:

Recovery from failures does seem generally desirable, but it would also be preferable to discover errors in configuration while in conf code. For this reason, it seems like the best way to handle this would be to defer starting dhcp6c until the very end of configuring all the interfaces, if that's possible. Is there a mechanism already to do this, or should I look into restructuring things slightly.

Configuration errors can be discovered in config mode in the verify() section of each Python script.

Deferring the start until the end is not possible in the current design with the Vyatta backend. dhcp6c is started individually on a per-interface basis. I currently have no idea how the race against a nonexistent bond or VLAN interface could be fixed.

Something like an async late callback would be nice where function calls could be registered.
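As a rough illustration of that idea, a late-callback registry might look like the sketch below. It is synchronous rather than truly async, and the class name `LateCallbacks` and its methods are hypothetical; nothing like this is confirmed to exist in the Vyatta backend.

```python
class LateCallbacks:
    """Collect function calls now and run them once, later (hypothetical)."""

    def __init__(self):
        self._callbacks = []

    def register(self, func, *args, **kwargs):
        """Remember a call to make after all interfaces are configured."""
        self._callbacks.append((func, args, kwargs))

    def run_all(self):
        """Run every registered call in order and return collected errors.

        A failing callback does not prevent the remaining ones from running.
        """
        callbacks, self._callbacks = self._callbacks, []
        errors = []
        for func, args, kwargs in callbacks:
            try:
                func(*args, **kwargs)
            except Exception as exc:
                errors.append((func, exc))
        return errors
```

An interface script would then call `register(start_dhcp6c, "pppoe0")` (with `start_dhcp6c` being whatever function starts the client) instead of starting the daemon immediately, and the commit machinery would call `run_all()` once at the very end.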

@c-po if the interface dependency system that @jjakob describes works as I might imagine, then perhaps it's just a matter of adding the interfaces that appear in prefix-delegation configs to the dependency lists. (There would be some subtleties dealing with things like vifs within an interface, but that can be sorted out.)
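The dependency idea can be illustrated with a topological sort: treat every interface named in a prefix-delegation config as something that must be configured before the delegating interface. This is only a sketch of the ordering logic, assuming Python 3.9+ for `graphlib`; the function name and the input mapping are hypothetical, and it says nothing about how vyatta-cfg priorities are actually computed.

```python
from graphlib import TopologicalSorter


def configuration_order(pd_targets):
    """Return an interface order in which PD targets come first.

    pd_targets maps a delegating interface to the interfaces that
    receive prefixes from it, e.g. {"pppoe0": ["br1", "br2"]}.
    """
    sorter = TopologicalSorter()
    for upstream, targets in pd_targets.items():
        # The upstream interface depends on its targets, so the
        # targets are emitted before it.
        sorter.add(upstream, *targets)
    return list(sorter.static_order())
```

For `{"pppoe0": ["br1", "br2"]}` both bridges are ordered ahead of pppoe0. A circular reference raises `graphlib.CycleError`, which would be a natural place to reject the configuration.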

So - just to refocus for a minute...

Originally, the reason DHCPv6-PD requests weren't implemented was that dhclient didn't support the -P flag - which has now been corrected - however it looks like the implementation has become something different now. Maybe on the wide client instead?

The problem is that the PD request can only be done when an interface is up - and is verified to be up. This seems to indicate that this action can only be performed as a post-hook type way.

A long time ago, I believe I posted my code for using dhclient on pppoe interfaces - as a workaround for them not having a DUID - along with the example hook I use with NetworkManager to handle this in my non-VyOS install, which is a post hook executed AFTER the interface is up.

There will be race conditions everywhere if this isn't handled as an action *after* the interface is completely up - including a PPPoE connection started, but not authenticated and connected.

There is no error in the configuration itself; the problem is the first configuration load at boot, when some of the interfaces are not yet initialized.

@gadams Is there any way to dynamically add to the dependency list in VyOS's current configuration?

jack9603301 added a comment.EditedFri, May 29, 2:40 AM

@CRCinAU @c-po I have already submitted a PR to fix the failure on the first configuration load after boot, which happens because the interfaces have not been initialized yet. However, it relies entirely on systemd's automatic restart for recovery, and I don't think that is enough. Recovering from the failure makes sense, but if a mechanism exists, it would be better to defer the start until all interfaces have been initialized.

@c-po @gadams I have a remedy in mind. Could I add a delay command to the service file to delay the startup?

If you think it's possible to delay dhcp6c, I'll start making patches.

Please merge the following PR. If there is a problem, let me know and I will fix it.

https://github.com/vyos/vyos-1x/pull/438

c-po added a comment.Sat, May 30, 6:03 PM

Begging for a merge will not speed things up. You have repeatedly refused to adapt the patch to our guidelines, and it is still unclear whether this is the right path to take.