
VyOS 1.1.x Unicast ARP to VRRP virtual MAC is ignored (RFC3768 mode)
Closed, Resolved | Public | BUG

Description

Troubleshooting remains a work in progress ... but as others are likely running into this issue, here is the write-up so far:

Symptom:

When VRRP runs in RFC3768 mode (virtual MAC address), broadcast ARP requests are answered correctly, but unicast ARP requests (sent to the virtual MAC instead of the broadcast MAC) are ignored.

This issue is specific to versions 1.1.8 and lower and is not observed in VyOS 1.2 (rolling).

Impact:

While most host implementations will fall back to a broadcast ARP request after a unicast request fails, there are some conditions where this results in broken connectivity. Unicast poll is one of the ARP cache validation mechanisms described in RFC1122 (2.3.2.1, item 2), so any host should always respond to a unicast ARP request.

Example cases where this situation can be disruptive:

  1. When a Cisco router has a static route pointing to a VRRP next-hop on VyOS, the missing response to unicast ARP marks the neighbor as stale. Forwarding is affected until the global timeout is reached and the ARP entry expires from the table (triggering a broadcast ARP). A work-around is to configure a static ARP entry on the Cisco router.
  2. Newer versions of Android (8+) on some phones use ARP unicast poll to verify wifi connectivity and, when no response is seen, attempt to reassociate to a new wireless access point (the result is an almost unusable wireless experience). The behavior seems to be specific to the phone vendor: Samsung phones are not affected, but the Google Pixel 3 is confirmed affected. There is currently no known work-around (aside from not using VRRP on VyOS in RFC3768 compatibility mode).

Conditions:

VRRP is in use on VyOS 1.1.x with RFC3768 compatibility enabled

Steps to Reproduce:

The following test configuration was used on 1.1.8:

Test Node A:

set interfaces ethernet eth0 address 10.0.1.2/24
set interfaces ethernet eth0 vrrp vrrp-group 1 preempt true
set interfaces ethernet eth0 vrrp vrrp-group 1 preempt-delay 300
set interfaces ethernet eth0 vrrp vrrp-group 1 priority 200                 # primary
set interfaces ethernet eth0 vrrp vrrp-group 1 rfc3768-compatibility
set interfaces ethernet eth0 vrrp vrrp-group 1 virtual-address 10.0.1.1/24

Test Node B:

set interfaces ethernet eth0 address 10.0.1.3/24
set interfaces ethernet eth0 vrrp vrrp-group 1 preempt true
set interfaces ethernet eth0 vrrp vrrp-group 1 preempt-delay 300
set interfaces ethernet eth0 vrrp vrrp-group 1 priority 100                 # standby
set interfaces ethernet eth0 vrrp vrrp-group 1 rfc3768-compatibility
set interfaces ethernet eth0 vrrp vrrp-group 1 virtual-address 10.0.1.1/24

Newer versions of the "arping" utility (2.19 was used for this test) support the optional -t flag for specifying a target MAC, which produces a unicast ARP:

arping -c 1 -t 00:00:5e:00:01:01 10.0.1.1

This can be verified by running tcpdump with the -e flag to include the src and dst MAC in the output.

arping -c 1 10.0.1.1 (broadcast ARP) will be responded to
arping -c 1 -t 00:00:5e:00:01:01 10.0.1.1 (unicast ARP to virtual MAC) will not be responded to
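
The two checks above can be wrapped in a small script. This is only a sketch: it assumes the arping 2.x -t flag described above, assumes arping exits non-zero when no reply is received, and assumes the virtual MAC follows the RFC3768 convention of 00:00:5e:00:01:{VRID}. The function names vrrp_vmac and arp_check are made up for illustration.

```shell
#!/bin/bash
# Sketch only: function names are illustrative, not part of VyOS or arping.

# Derive the RFC3768 virtual MAC (00:00:5e:00:01:{VRID}) for a VRRP group ID.
vrrp_vmac() {
    printf '00:00:5e:00:01:%02x\n' "$1"
}

# Send one broadcast and one unicast ARP request and report each result.
# Usage: arp_check <virtual-ip> <vrid>
arp_check() {
    local vip=$1 vmac
    vmac=$(vrrp_vmac "$2")

    # Assumption: arping returns non-zero when no reply was received.
    if arping -c 1 "$vip" >/dev/null 2>&1; then
        echo "broadcast ARP for $vip: answered"
    else
        echo "broadcast ARP for $vip: no reply"
    fi

    if arping -c 1 -t "$vmac" "$vip" >/dev/null 2>&1; then
        echo "unicast ARP for $vip via $vmac: answered"
    else
        echo "unicast ARP for $vip via $vmac: no reply"
    fi
}

# Run only when both arguments are supplied, e.g.: ./arp-check.sh 10.0.1.1 1
if [ "$#" -eq 2 ]; then
    arp_check "$1" "$2"
fi
```

On an affected 1.1.8 system the expectation, per the results above, is that the broadcast check is answered and the unicast check is not.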

Technical Details:

This appears to be a problem with the creation of the macvlan virtual interface by keepalived 1.2.2:

The macvlan interface is created via rtnetlink without a correct mode attribute set. Either keepalived neglects to set it, or it assumes a default that is not applied due to the age of the macvlan kernel driver.

The problem can be observed in the output of:

ip -d link show

The -d flag will show details including the interface type of macvlan and a mode of "unknown". While in a mode of "unknown" the macvlan interface is not completely initialized.
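
To check several interfaces at once, the same output can be filtered down to interface name and macvlan mode. This is a rough sketch, assuming the one-line format produced by ip -o -d link show (name in the second field, followed somewhere by "macvlan mode <mode>"); macvlan_modes is a made-up helper name.

```shell
#!/bin/bash
# Sketch only: macvlan_modes is an illustrative helper name.

# Read `ip -o -d link show` output on stdin and print "<interface> <mode>"
# for every macvlan interface found.
macvlan_modes() {
    awk '
        match($0, /macvlan +mode +[a-z]+/) {
            # Second field looks like "eth0v1@eth0:"; keep the part before "@" or ":".
            name = $2
            sub(/[@:].*/, "", name)
            # The matched text is "macvlan mode <mode>"; the mode is its last word.
            n = split(substr($0, RSTART, RLENGTH), parts, / +/)
            print name, parts[n]
        }
    '
}

# Usage on a live system:
# ip -o -d link show | macvlan_modes
```

Any interface listed with a mode of unknown is in the state described above.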

Upon review of later releases of keepalived, it appears that the mode keepalived intends for the macvlan interface is "private". However, this only applies to later versions of keepalived, which address an issue where private mode filters VRRP messages from the neighbor, causing a split-brain/dual-master state. Alternatively, macvlan mode bridge can be used, which allows keepalived to see peer VRRP advertisements and works as intended. We are still testing, but so far we have not observed any issues as a result of using macvlan mode bridge.

This mode can be applied once the macvlan interface has been created by keepalived using the following:

ip link set dev eth0v1 type macvlan mode bridge

Once the mode attribute of the macvlan interface(s) is set, correct unicast ARP behavior is restored.

Workaround (Partial):

NOTE: This fix is still in testing. Use at your own risk.

Create a script to set macvlan modes correctly:

/config/scripts/vrrp-macvlan-fix.script:

#!/bin/bash

# Set every macvlan interface still in mode "unknown" to bridge mode.
for i in $(ip -o -d link show | grep 'macvlan  mode unknown' | gawk -F " |@" '{print $2}'); do
    ip link set dev "$i" type macvlan mode bridge
done

This script can be called from /config/scripts/vyatta-postconfig-bootup.script to run on boot, but note that it must be re-applied after any keepalived service restart (e.g. configuration change).

The script can also be run as a 1-minute cron job to handle this condition using task-scheduler:

set system task-scheduler task T860 executable path '/config/scripts/vrrp-macvlan-fix.script'
set system task-scheduler task T860 interval '1'

An open caveat with this workaround is that behavior when multiple VRRP groups are attached to the same physical interface has not been tested yet. You should take care to test these workarounds with your configuration in a lab setting before deploying in production.
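
For lab testing, a dry-run variant of the fix script can show which interfaces would be changed before anything is touched. This is only a sketch, not a tested replacement for the script above: the DRY_RUN variable and the fix_macvlan_modes name are made up, and it assumes the same ip -o -d link show output format.

```shell
#!/bin/bash
# Sketch only: DRY_RUN and fix_macvlan_modes are illustrative names.

# Read `ip -o -d link show` output on stdin and set every macvlan interface
# reported in mode "unknown" to bridge mode. With DRY_RUN=1 the commands are
# printed instead of executed.
fix_macvlan_modes() {
    local ifname
    awk '/macvlan +mode unknown/ { n = $2; sub(/[@:].*/, "", n); print n }' |
    while read -r ifname; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ip link set dev $ifname type macvlan mode bridge"
        else
            ip link set dev "$ifname" type macvlan mode bridge
        fi
    done
}

# Example: print what would change without touching any interface.
# ip -o -d link show | DRY_RUN=1 fix_macvlan_modes
```

With several VRRP groups on one physical interface, the dry-run output lists one command per macvlan interface, which makes it easier to apply and test them one at a time.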

Details

Difficulty level
Unknown (require assessment)
Version
1.1.8
Why the issue appeared?
Will be filled on close

Event Timeline

rps created this task. Sep 23 2018, 7:01 PM
rps created this object in space S1 VyOS Public.
rps created this object with visibility "Public (No Login Required)".
rps updated the task description. Sep 24 2018, 1:56 PM
rps added a comment. Sep 24 2018, 7:53 PM

Expanding on this more, I've updated the fix above to suggest a workaround of bridge mode for macvlan interfaces.

From what I can see it looks like the behavior of private mode for macvlan interfaces was dropping VRRP messages coming from the peer, resulting in a split-brain/dual-master scenario. This is due to an oversight in the kernel driver that was only patched later on.

Going through code commits for keepalived, it looks like they did run into this, and their solution was to introduce the vmac_xmit_base option to listen for VRRP messages on the physical interface rather than the virtual one:

http://www.keepalived.org/doc/software_design.html

The section starting at "Note on Using VRRP with Virtual MAC Address" discusses the need for a Linux kernel patch for the default macvlan mode to work correctly.

Unfortunately, vmac_xmit_base appears to have had some bugs that were not fixed until later releases of keepalived as well.

It looks like the use of the bridge mode rather than private mode for a macvlan interface provides a workaround.

Bridge mode differs from private mode only in that traffic from other macvlan interfaces on the same physical interface is not filtered (think of private as a private VLAN implementation).

syncer triaged this task as Normal priority. Sep 25 2018, 1:53 PM
rps added a comment. Sep 25 2018, 10:07 PM

Quick note that the work-around above breaks the local DNS resolver if pointing to a virtual IP. Still keeping an eye out for other issues.

syncer added a subscriber: syncer. Oct 13 2018, 5:56 PM

@rps what about 1.2?

syncer assigned this task to rps. Oct 13 2018, 7:06 PM
syncer changed the subtype of this task from "Task" to "Bug". Oct 20 2018, 7:05 AM
rps added a comment. Oct 23 2018, 1:03 PM

The functionality is fixed in 1.2-rcX for ARP; I haven't verified other services such as DNS.

pasik added a subscriber: pasik. Nov 4 2018, 11:22 AM
syncer closed this task as Resolved. Feb 8 2019, 12:14 AM
syncer edited projects, added VyOS 1.1.x; removed VyOS 1.2 Crux (VyOS 1.2.0-EPA3).