DHCP client sometimes doesn't start
Closed, Resolved (Public)

Description

With 1.2-rolling-201910021249 the DHCP client doesn't automatically start on some interfaces, but it can still be started using the renew command.

I tracked this down to commit 35c7d6616, which now only starts dhclient when the interface is really up. The problem seems to be that some interfaces (in my case a bond VIF) take time to reach the 'up' state, so the interface is still down when the addresses are added and the DHCP client is not started.

For consistency, set_state() should probably wait for the requested state to take effect before it returns. Adding such a check to set_state() fixes the problem on my system.
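
To illustrate the idea of waiting for the requested state, here is a minimal sketch. It is not the actual patch: `wait_for_state`, its parameters and the `read_state` callable are hypothetical names chosen for illustration, and the real set_state() in VyOS may look quite different.

```python
import time


def wait_for_state(read_state, wanted, timeout=5.0, interval=0.1):
    """Block until read_state() returns `wanted`.

    read_state is any zero-argument callable returning the current
    interface state as a string (hypothetical; supplied by the caller).
    Raises TimeoutError if the state is not reached within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while read_state() != wanted:
        if time.monotonic() >= deadline:
            raise TimeoutError(f'state {wanted!r} not reached within {timeout}s')
        time.sleep(interval)
```

A setter could call this right after requesting the new state, so that callers only continue once the state has actually been reached.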

Details

Difficulty level
Unknown (require assessment)
Version
-
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Perfectly compatible

Event Timeline

albeu created this task. Oct 5 2019, 9:52 PM
albeu created this object in space S1 VyOS Public.
albeu added a comment. Edited Oct 5 2019, 9:55 PM

The following patch fixes the issue for me:

c-po claimed this task. Oct 6 2019, 7:29 AM
c-po triaged this task as Normal priority.
c-po added a project: VyOS 1.3 Equuleus.
c-po added a comment. Edited Oct 6 2019, 10:30 AM

Hi @albeu, thank you for this contribution. While reviewing it I found one flaw:

When the link does not come up "in time", you raise an exception and the commit fails. This will always happen on interfaces with no carrier attached. I will therefore pick up your approach and fine-tune it.
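
One possible way to avoid failing the commit is to treat the timeout as non-fatal. The sketch below is only an illustration of that idea, not the change that was eventually committed; `try_wait_for_state` and the `read_state` callable are hypothetical names.

```python
import logging
import time

log = logging.getLogger(__name__)


def try_wait_for_state(read_state, wanted, timeout=5.0, interval=0.1):
    """Poll read_state() until it returns `wanted` or the timeout expires.

    Returns True if the state was reached and False otherwise. A timeout
    is only logged, never raised, so an interface without carrier does
    not abort the caller.
    """
    deadline = time.monotonic() + timeout
    while True:
        if read_state() == wanted:
            return True
        if time.monotonic() >= deadline:
            log.warning('state %r not reached within %ss, continuing',
                        wanted, timeout)
            return False
        time.sleep(interval)
```

The caller can then decide what to do with a False result, for example proceed with the rest of the configuration anyway.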

c-po closed this task as Resolved. Oct 6 2019, 2:19 PM
c-po moved this task from Need Triage to Finished on the VyOS 1.3 Equuleus board. Oct 13 2019, 3:01 PM
c-po added a comment. Oct 16 2019, 6:44 AM

Hi @albeu, the fix works badly in most of our cases and is not a good way to address a single issue. Can you give some hints on how to reproduce the "DHCP won't start on some interfaces" problem?

c-po reopened this task as In progress. Oct 16 2019, 6:44 AM
albeu added a comment. Oct 16 2019, 8:19 AM

The system where I'm seeing this is a VM which uses a BNX2 dual-port network card via PCI passthrough. Both ports (eth0 and eth1) are configured in an LACP bond (bond0) with several VIFs running on top of it (bond0.10, bond0.20, etc.). All VIFs show this problem.

I would again point to commit 35c7d6616, which added an "if self._state == 'up'" condition before starting the DHCP client. Sadly the commit log does not mention this change, so it is hard to tell why it was added. Note that no such test is done for static addresses, so there is an asymmetry here. In my tests, removing this condition also solves the problem, and I really don't understand why it is there. There is no harm in starting the DHCP client a bit too early, and much harm in not starting it at all.

There is also a conceptual problem: set_state() sets the administrative state while get_state() returns the operational state. The two are not the same, and the operational state is not supported by all drivers. See the documentation for more details.
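
To make the difference concrete, here is a small sketch that reads both states from sysfs on Linux. The helper names `admin_up` and `oper_state` and the `sysfs` parameter are made up for illustration (the parameter only exists so the functions can be pointed at a fake directory); they are not part of the VyOS code.

```python
from pathlib import Path

IFF_UP = 0x1  # administrative "up" flag, as defined in <linux/if.h>


def admin_up(ifname, sysfs=Path('/sys/class/net')):
    """Return True if the interface is administratively up (IFF_UP set)."""
    # The sysfs 'flags' file holds the interface flags as a hex string.
    flags = int((sysfs / ifname / 'flags').read_text().strip(), 16)
    return bool(flags & IFF_UP)


def oper_state(ifname, sysfs=Path('/sys/class/net')):
    """Return the operational state string, e.g. 'up', 'down' or 'unknown'."""
    return (sysfs / ifname / 'operstate').read_text().strip()
```

An interface can be administratively up while its operational state is still 'down' or 'lowerlayerdown', and a driver that does not implement operational state reports 'unknown'. A check that compares the operational state against 'up' right after setting the administrative state can therefore fail on such interfaces.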

pasik added a subscriber: pasik. Wed, Oct 16, 9:00 PM
c-po closed this task as Resolved. Fri, Oct 18, 4:13 PM

Root cause identified and fixed. Please test @albeu.

albeu added a comment. Fri, Oct 25, 4:14 PM

Tested on 1.2-rolling-201910250117, the issue is solved.