Page MenuHomeVyOS Platform

DMVPN NHRP spoke stops sending resolution requests upon restart
Open, HighPublicBUG

Description

I’ve encountered a possible bug which renders DMVPN spokes almost unusable in DMVPN phase 2|3 scenario. Using static maps instead of nhrp resolutions eliminates the problem.
The lab to investigate is as follows:

HUB IP: NBMA: 192.168.77.66, tun IP: 10.160.17.131/25
Spoke1 IP: NBMA: 192.168.77.64, tun IP:10.160.17.141
Spoke2 IP: NBMA: 192.168.77.65, tun IP:10.160.17.151

No IPSEC, no NAT.
HUB config:

set interfaces ethernet eth0 address ‘dhcp’
set interfaces loopback lo
set interfaces tunnel tun1 address ‘10.160.17.131/25’
set interfaces tunnel tun1 description ‘DMVPN_HUB_interface’
set interfaces tunnel tun1 encapsulation ‘gre’
set interfaces tunnel tun1 multicast ‘enable’
set interfaces tunnel tun1 parameters ip key ‘17128’
set interfaces tunnel tun1 source-address ‘192.168.77.66’
set protocols nhrp tunnel tun1 cisco-authentication ‘LPDMVPN’
set protocols nhrp tunnel tun1 holding-time ‘300’
set protocols nhrp tunnel tun1 multicast ‘dynamic’

Spoke1 config:

set interfaces ethernet eth0 address ‘dhcp’
set interfaces tunnel tun10 address ‘10.160.17.141/25’
set interfaces tunnel tun10 encapsulation ‘gre’
set interfaces tunnel tun10 multicast ‘enable’
set interfaces tunnel tun10 parameters ip key ‘17128’
set interfaces tunnel tun10 source-address ‘192.168.77.64’
set protocols nhrp tunnel tun10 cisco-authentication ‘LPDMVPN’
set protocols nhrp tunnel tun10 holding-time ‘300’
set protocols nhrp tunnel tun10 map 10.160.17.131/25 nbma-address ‘192.168.77.66’
set protocols nhrp tunnel tun10 map 10.160.17.131/25 register
set protocols nhrp tunnel tun10 multicast ‘nhs’

Spoke2 config:

set interfaces ethernet eth0 address ‘dhcp’
set interfaces tunnel tun10 address ‘10.160.17.151/25’
set interfaces tunnel tun10 encapsulation ‘gre’
set interfaces tunnel tun10 multicast ‘enable’
set interfaces tunnel tun10 parameters ip key ‘17128’
set interfaces tunnel tun10 source-address ‘192.168.77.65’
set protocols nhrp tunnel tun10 cisco-authentication ‘LPDMVPN’
set protocols nhrp tunnel tun10 holding-time ‘300’
set protocols nhrp tunnel tun10 map 10.160.17.131/25 nbma-address ‘192.168.77.66’
set protocols nhrp tunnel tun10 map 10.160.17.131/25 register
set protocols nhrp tunnel tun10 multicast ‘nhs’

At the beginning all the tunnels are nice and fine:
HUB:>

vyos@HUB:~$ show nhrp tu
Status: ok

Interface: tun1
Type: local
Protocol-Address: 10.160.17.255/32
Alias-Address: 10.160.17.131
Flags: up

Interface: tun1
Type: local
Protocol-Address: 10.160.17.131/32
Flags: up

Interface: tun1
Type: dynamic
Protocol-Address: 10.160.17.141/32
NBMA-Address: 192.168.77.64
Flags: up
Expires-In: 4:01

Interface: tun1
Type: dynamic
Protocol-Address: 10.160.17.151/32
NBMA-Address: 192.168.77.65
Flags: up
Expires-In: 3:32

Spoke1:

vyos@S1:~$ show nhrp tu
Status: ok

Interface: tun10
Type: local
Protocol-Address: 10.160.17.255/32
Alias-Address: 10.160.17.141
Flags: up

Interface: tun10
Type: local
Protocol-Address: 10.160.17.141/32
Flags: up

Interface: tun10
Type: cached
Protocol-Address: 10.160.17.151/32
NBMA-Address: 192.168.77.65
Flags: used up
Expires-In: 4:56

Interface: tun10
Type: static
Protocol-Address: 10.160.17.131/25
NBMA-Address: 192.168.77.66
Flags: up

Spoke2:>

vyos@S2:~$ show nhrp tu
Status: ok

Interface: tun10
Type: local
Protocol-Address: 10.160.17.255/32
Alias-Address: 10.160.17.151
Flags: up

Interface: tun10
Type: local
Protocol-Address: 10.160.17.151/32
Flags: up

Interface: tun10
Type: cached
Protocol-Address: 10.160.17.141/32
NBMA-Address: 192.168.77.64
Flags: up
Expires-In: 2:27

Interface: tun10
Type: static
Protocol-Address: 10.160.17.131/25
NBMA-Address: 192.168.77.66
Flags: up

NHRP logs from Spoke1 (when it works:):

Apr 21 17:38:35 opennhrp[1779]: NL-ARP(tun10) who-has 10.160.17.151
Apr 21 17:38:35 opennhrp[1779]: NL-ARP(tun10) 10.160.17.151 is-at 192.168.77.66
Apr 21 17:38:35 opennhrp[1779]: Adding incomplete 10.160.17.151/32 dev tun10
Apr 21 17:38:35 opennhrp[1779]: Sending Resolution Request to 10.160.17.151
Apr 21 17:38:35 opennhrp[1779]: Sending packet 1, from: 10.160.17.141 (nbma 192.168.77.64), to: 10.160.17.151 (nbma 192.168.77.66)
Apr 21 17:38:35 opennhrp[1779]: Received Resolution Request from proto src 10.160.17.151 to 10.160.17.141
Apr 21 17:38:35 opennhrp[1779]: Sending Resolution Reply 10.160.17.141/32 is-at 192.168.77.64 (holdtime 300)
Apr 21 17:38:35 opennhrp[1779]: Sending packet 2, from: 10.160.17.151 (nbma 192.168.77.65), to: 10.160.17.141 (nbma 192.168.77.66)
Apr 21 17:38:35 opennhrp[1779]: Received Resolution Reply 10.160.17.151/32 is at proto 10.160.17.151 nbma 192.168.77.65 (code 0)
Apr 21 17:38:35 opennhrp-script.py[2919]: Running script with arguments: [‘/etc/opennhrp/opennhrp-script.py’, ‘peer-up’], environment: environ({‘NHRP_TYPE’: ‘cached’, ‘NHRP_SRCADDR’: ‘10.160.17.141’, ‘NHRP_SRCNBMA’: ‘192.168.77.64’, ‘NHRP_DESTADDR’: ‘10.160.17.151’, ‘NHRP_DESTPREFIX’: ‘32’, ‘NHRP_DESTNBMA’: ‘192.168.77.65’, ‘NHRP_INTERFACE’: ‘tun10’, ‘NHRP_GRE_KEY’: ‘17128’, ‘LC_CTYPE’: ‘C.UTF-8’})
Apr 21 17:38:35 opennhrp-script.py[2919]: Peer UP event for spoke using IKE profile
Apr 21 17:38:35 opennhrp-script.py[2919]: NBMA addresses: local 192.168.77.64, remote 192.168.77.65
Apr 21 17:38:35 opennhrp[1779]: [10.160.17.151] Peer up script: success
Apr 21 17:38:35 opennhrp[1779]: NL-ARP(tun10) 10.160.17.151 is-at 192.168.77.65
At this point I see all the resolution requests and replies on the spoke side.

Steps to reproduce:
Modify any configuration on spoke-side with commit and optionally revert back.
(for example nhrp hold-time)
tear down connection between S-S|S-H peers or modify HUB config (works sometimes).
The Spoke stops sending resolution requests, while registration requests to hub works well. That can be seen in a traffic dump taken on the spoke/hub.

NHRP logs from Spoke1 (in failed state):

vyos@S1:~$ monitor log nhrp
 – Journal begins at Sat 2023-01-28 14:10:46 UTC. –
 Apr 21 17:45:51 opennhrp[3402]: Received Resolution Request from proto src 10.160.17.151 to 10.160.17.141
 Apr 21 17:45:51 opennhrp[3402]: Sending Resolution Reply 10.160.17.141/32 is-at (unspecified) (holdtime 300)
 Apr 21 17:45:51 opennhrp[3402]: Sending packet 2, from: 10.160.17.151 (nbma 192.168.77.65), to: 10.160.17.141 (nbma (unspecified))
 Apr 21 17:45:51 opennhrp[3402]: Cannot send packet to tun10(9): Invalid argument
 Apr 21 17:45:56 opennhrp[3402]: Sending packet 1, from: 10.160.17.141 (nbma (unspecified)), to: 10.160.17.151 (nbma (unspecified))
 Apr 21 17:45:56 opennhrp[3402]: Cannot send packet to tun10(9): Invalid argument
 Apr 21 17:45:56 opennhrp[3402]: Received Resolution Request from proto src 10.160.17.151 to 10.160.17.141
 Apr 21 17:45:56 opennhrp[3402]: Sending Resolution Reply 10.160.17.141/32 is-at (unspecified) (holdtime 300)
 Apr 21 17:45:56 opennhrp[3402]: Sending packet 2, from: 10.160.17.151 (nbma 192.168.77.65), to: 10.160.17.141 (nbma (unspecified))
 Apr 21 17:45:56 opennhrp[3402]: Cannot send packet to tun10(9): Invalid argument
 Apr 21 17:46:01 opennhrp[3402]: Failed to resolve 10.160.17.151: timeout (65535)
 Apr 21 17:46:01 opennhrp[3402]: Removing incomplete 10.160.17.151/32 dev tun10 used

Spoke 2 in the meanwhile gets response timeouts because of S1 keeps silent about resolution replies:

vyos@S2:~$ monitor log nhrp
  – Journal begins at Sat 2023-01-28 14:10:46 UTC. –
  Apr 21 17:49:40 opennhrp[1613]: NL-ARP(tun10) who-has 10.160.17.141
  Apr 21 17:49:40 opennhrp[1613]: NL-ARP(tun10) 10.160.17.141 is-at 192.168.77.66
  Apr 21 17:49:40 opennhrp[1613]: Adding incomplete 10.160.17.141/32 dev tun10
  Apr 21 17:49:40 opennhrp[1613]: Sending Resolution Request to 10.160.17.141
  Apr 21 17:49:40 opennhrp[1613]: Sending packet 1, from: 10.160.17.151 (nbma 192.168.77.65), to: 10.160.17.141 (nbma 192.168.77.66)
  Apr 21 17:49:45 opennhrp[1613]: Sending packet 1, from: 10.160.17.151 (nbma 192.168.77.65), to: 10.160.17.141 (nbma 192.168.77.66)
  Apr 21 17:49:50 opennhrp[1613]: Sending packet 1, from: 10.160.17.151 (nbma 192.168.77.65), to: 10.160.17.141 (nbma 192.168.77.66)
  Apr 21 17:49:55 opennhrp[1613]: Sending packet 1, from: 10.160.17.151 (nbma 192.168.77.65), to: 10.160.17.141 (nbma 192.168.77.66)
  Apr 21 17:50:00 opennhrp[1613]: Sending packet 1, from: 10.160.17.151 (nbma 192.168.77.65), to: 10.160.17.141 (nbma 192.168.77.66)
  Apr 21 17:50:05 opennhrp[1613]: Sending packet 1, from: 10.160.17.151 (nbma 192.168.77.65), to: 10.160.17.141 (nbma 192.168.77.66)
  Apr 21 17:50:10 opennhrp[1613]: Failed to resolve 10.160.17.141: timeout (65535)
  Apr 21 17:50:10 opennhrp[1613]: Removing incomplete 10.160.17.141/32 dev tun10 used
If we modify any nhrp configuration on S2 it behaves the same as S1 and stops sending NHRP resolution requests:

    Apr 21 17:55:26 opennhrp[3654]: NL-ARP(tun10) who-has 10.160.17.141
    Apr 21 17:55:26 opennhrp[3654]: NL-ARP(tun10) 10.160.17.141 is-at 192.168.77.66
    Apr 21 17:55:26 opennhrp[3654]: Adding incomplete 10.160.17.141/32 dev tun10
    Apr 21 17:55:26 opennhrp[3654]: Sending Resolution Request to 10.160.17.141
    Apr 21 17:55:26 opennhrp[3654]: Sending packet 1, from: 10.160.17.151 (nbma (unspecified)), to: 10.160.17.141 (nbma (unspecified))
No real traffic here on a wire
Apr 21 17:55:26 opennhrp[3654]: Cannot send packet to tun10(9): Invalid argument
Apr 21 17:55:31 opennhrp[3654]: Sending packet 1, from: 10.160.17.151 (nbma (unspecified)), to: 10.160.17.141 (nbma (unspecified))
  Apr 21 17:55:31 opennhrp[3654]: Cannot send packet to tun10(9): Invalid argument

Restarting failed spoke always helps.
I’ve tested this issue on a plenty of 1.4 rolling versions as the issue persists for quite a long time during image upgrades. Here are some versions for example:

1: 1.4-rolling-202304130846 - latest
2: 1.4-rolling-202211290318
3: 1.4-rolling-202301280924

Observations:
There is no matter if we turn on nhrp redirect/shortcut on the hub/spoke. The spoke behaves the same in terms of nhrp resolution packets.
In the production env I use static S-S map entries which solves the problem, but scalability becomes a nightmare.

more info and testing described in the forum topic here:
https://forum.vyos.io/t/dmvpn-nhrp-spoke-stops-sending-resolution-requests-upon-restart/10874

Details

Difficulty level
Unknown (require assessment)
Version
vyos 1.4 rolling release + 1.3 stable
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Bug (incorrect behavior)