
VRRPv3 support (VRRP for IPv6)
Closed, Duplicate · Public

Details

Difficulty level
Normal (likely a few hours)
afics created this task. Jul 17 2016, 11:12 AM

TODO:

  • check if our keepalived version supports VRRPv3, if not, upgrade to a newer version.
  • implement cli support + config generation

A few notes:

  • We need separate VRRP instances for IPv4 and IPv6; keepalived can only have virtual_addresses of the instance's native address family.
  • VRRPv3 supports IPv4 and IPv6
  • Maybe we should still keep support for VRRPv2 for backwards compatibility?
syncer triaged this task as High priority. Jul 17 2016, 1:25 PM
syncer added a project: VyOS 1.1.x (1.1.8).
jbrown added a subscriber: jbrown. Jul 23 2016, 10:07 AM

We need separate VRRP instances for IPv4 and IPv6; keepalived can only have virtual_addresses of the instance's native address family.

AFAIK, as long as vrrp_strict isn't set, keepalived happily supports mixed v4 and v6 addresses in the same group (and even under VRRPv2). I've run that configuration under keepalived for several years.

I tested with keepalived version 1.2.22 on Fedora and it didn't seem to work. I'll test again.

[root@test ~]# cat /etc/keepalived/keepalived.conf 
vrrp_instance VI_1 {
    state MASTER
    interface ens3
    virtual_router_id 51
    priority 200
    advert_int 1
    vrrp_version 3
    native_ipv6
    authentication {
        auth_type ah
        auth_pass 1111
    }
    virtual_ipaddress {
        3ffa::1/64
        192.168.100.200/24 
    }
}

This configuration only works for the v6 address.


[root@test ~]# cat /etc/keepalived/keepalived.conf 
vrrp_instance VI_1 {
    state MASTER
    interface ens3
    virtual_router_id 51
    priority 200
    advert_int 1
    vrrp_version 3
    #native_ipv6
    authentication {
        auth_type ah
        auth_pass 1111
    }
    virtual_ipaddress {
        3ffa::1/64
        192.168.100.200/24 
    }
}

This configuration only works for the v6 address.

[root@test ~]# cat /etc/keepalived/keepalived.conf 
vrrp_instance VI_1 {
    state MASTER
    interface ens3
    virtual_router_id 51
    priority 200
    advert_int 1
    vrrp_version 3
    #native_ipv6
    authentication {
        auth_type ah
        auth_pass 1111
    }
    virtual_ipaddress {
        #3ffa::1/64
        192.168.100.200/24 
    }
}

Works for v4 only.

[root@test ~]# cat /etc/keepalived/keepalived.conf 
vrrp_instance VI_1 {
    state MASTER
    interface ens3
    virtual_router_id 51
    priority 200
    advert_int 1
    vrrp_version 3
    native_ipv6
    authentication {
        auth_type ah
        auth_pass 1111
    }
    virtual_ipaddress {
        3ffa::1/64
        #192.168.100.200/24 
    }
}

Works for v6 only.


Separate instances for v4 and v6 do work.

@jbrown Did I get something obvious wrong?

Try it with vrrp_version 2; I believe that will work. It seems to default to standards-compliant behavior (that is to say, no mixed virtual addresses) in v3. I'm not sure if there's a way to turn that off or not. Here's what the config at my current employer looks like:

vrrp_instance lb_hi {
    state BACKUP
    interface eth0
    virtual_router_id 186
    priority 50
    advert_int 5
    authentication {
        auth_type PASS
        auth_pass XXX
    }
    virtual_ipaddress {
        ipv6::address::1/128 scope global dev eth1
        ipv6::address::1/128 scope global dev eth1
        ipv4.address.1/32 scope global dev eth1
        ipv4.address.2/32 scope global dev eth1
        ipv4.address.3/32 scope global dev eth0
    }
    virtual_ipaddress_excluded {
    }
}

vrrp_sync_group easypost_lb_hi {
    group {
        lb_hi
    }
    notify /srv/lbng/notify-state.sh
}

That's with 1.2.16 (CentOS).

I can run some more tests when I'm in the office on Monday.

Also, here's the config that I'm running on my VyOS test VMs (with R33:5bccce5948cc44e76f8c9da93f1a1cf8f1212bca ):

global_defs {
	enable_traps
}
vrrp_instance vyatta-eth1-1 {
	state BACKUP
	interface eth1
	virtual_router_id 1
	priority 100
	advert_int 1
	virtual_ipaddress {
		172.16.0.1/16
		fd00:ea51:d000::1/64
	}
}

This is VyOS 1.1.7, so keepalived 1.2.2.

afics added a comment. Edited Jul 24 2016, 7:30 PM

@jbrown This only works for you because your keepalived versions are old enough.
This got "fixed" (well, at least they're standards compliant now ;)) in 1.2.20 I believe.
See https://github.com/acassen/keepalived/issues/375#issuecomment-230148110 for more information.

So I'm afraid we'll have to find another solution. As I see it, we have two options:

  • Either use separate vrrp_instances for v4 and v6 (autogenerated), which is ugly.
  • Script it ourselves.

Oh man, that's going to be a disaster for all the companies out there relying on mixed v4/v6 blocks (which is extremely common in v6 setups).

What about something where, in the short term to support ipv6, we *required* both v4 and v6 addresses on any VRRP group and put all the v6 addresses in virtual_ipaddress_excluded?

If I make a patch to use virtual_ipaddress_excluded, do you prefer an arcanist diff or a github pull request?

afics added a comment. Jul 24 2016, 8:05 PM

Does it work if you use virtual_ipaddress_excluded? Also, I don't really understand how this would solve the problem. Could you please explain?

I believe a github pull request is fine.

virtual_ipaddress_excluded is basically a list of lines that keepalived passes to ip addr add / ip addr del when the main VRRP state changes. Typically, it's used to manage more than 20 addresses with keepalived (only the first 20 go in virtual_ipaddress, which is in the actual VRRP packet; the rest are failed over through this out-of-band configuration). According to that thread (and past experience), it should work for this case.
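For illustration, a minimal sketch of that workaround, reusing the addresses from the test configs above (hypothetical and untested here): the IPv4 address travels in the VRRP advertisement, while the IPv6 address is failed over out-of-band.

```
vrrp_instance VI_1 {
    state MASTER
    interface ens3
    virtual_router_id 51
    priority 200
    advert_int 1
    virtual_ipaddress {
        # carried in the VRRP advertisement (IPv4 transport)
        192.168.100.200/24
    }
    virtual_ipaddress_excluded {
        # added/removed via ip addr on state transitions, not advertised
        3ffa::1/64
    }
}
```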

I changed my test VM to use virtual_ipaddress_excluded, and failover worked correctly.

https://github.com/vyos/vyatta-vrrp/pull/6 is a rough sketch of what an implementation would look like.

This may be getting a little off-track from this ticket, I guess; the subject (VRRPv3) will allow v6-only configurations, which is probably something people want even if we get mixed configs working somehow.

afics added a comment. Jul 24 2016, 8:43 PM

Ah, good to know. So if we add a switch like transport ipv4/ipv6 to the CLI, valid only for VRRPv3 (add a switch for that too), and then exclude either all v4 or all v6 addresses, would that work?

If it does, we would at least have full functionality implemented that way.
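To make the idea concrete, here is a hypothetical sketch of what the generated keepalived config could look like for the transport ipv6 case (names and addresses are illustrative only; this assumes the virtual_ipaddress_excluded behavior described above):

```
vrrp_instance VI_1 {
    state MASTER
    interface ens3
    virtual_router_id 51
    priority 200
    advert_int 1
    vrrp_version 3
    # transport ipv6: advertisements go to the IPv6 group address (FF02::12)
    native_ipv6
    virtual_ipaddress {
        # same family as the transport, carried in the advertisement
        3ffa::1/64
    }
    virtual_ipaddress_excluded {
        # other-family addresses are failed over out-of-band
        192.168.100.200/24
    }
}
```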

rps awarded a token. Sep 15 2016, 10:16 AM
rps added a subscriber: rps.
syncer edited subscribers, added: VyOS 1.2.x; removed: VyOS 1.1.x (1.1.8).
syncer mentioned this in Unknown Object (Ponder Answer). May 12 2017, 3:33 PM
aopdal added a subscriber: aopdal. May 26 2017, 10:09 AM

Is there any progress on this? Are there any design documents in progress?

I assume there will be both VRRPv2 and VRRPv3 in 1.2, to allow upgrading existing infrastructure. So if we want to use IPv6 in VRRP, we need to switch from VRRPv2 to VRRPv3 with some downtime, and when running VRRPv3 with IPv4 it should be possible to add IPv6.

Changing from VRRPv2 only to VRRPv3 only is not an option, because it would break the ability to perform upgrades.

Any news about this?

Any chance to get this fixed in 1.1.8?
We absolutely need IPv6 failover.

Even the virtual_ipaddress_excluded workaround would be great, as long as VyOS supports IPv6 in VRRP ASAP.

oddboy added a subscriber: oddboy. Nov 17 2017, 6:56 AM

Hi all,

I wouldn't mind an update on this too; it would be very useful.

Using two Debian VMs, I have played around with this today.
I have been using Debian 9.2 and keepalived v1.3.2.

This is just my PoC, or a proposal for how we might solve this task.

I'm using just one interface for this test, and the feature must be specified in the router configuration.

  • We must have a solution that supports upgrades
  • The user should be able to select which version of VRRP to run
  • We must provide a solution that can interoperate with other devices

With these configurations it looks nice: the vrrp_instance running version 2 takes care of the upgrade path, and the vrrp_instance running version 3 takes care of IPv6.

If this approach is feasible we could start on the specification details.

Device1

global_defs {
        enable_traps
}
vrrp_sync_group test {
        group {
                ens224-100
                ens224-100-v3
        }
}
vrrp_instance ens224-100 {
        state BACKUP
        interface ens224
        virtual_router_id 100
        priority 150
        nopreempt
        advert_int 1
        virtual_ipaddress {
                10.1.1.20
        }
}
vrrp_instance ens224-100-v3 {
        vrrp_version 3
        state BACKUP
        interface ens224
        virtual_router_id 100
        priority 150
        nopreempt
        advert_int 1
        virtual_ipaddress {
                2001:4642:3a8e:fff0::20
        }
}

Device2

global_defs {
        enable_traps
}
vrrp_sync_group test {
        group {
                ens224-100
                ens224-100-v3
        }
}
vrrp_instance ens224-100 {
        state BACKUP
        interface ens224
        virtual_router_id 100
        priority 50
        nopreempt
        advert_int 1
        virtual_ipaddress {
                10.1.1.20
        }
}
vrrp_instance ens224-100-v3 {
        vrrp_version 3
        state BACKUP
        interface ens224
        virtual_router_id 100
        priority 50
        nopreempt
        advert_int 1
        virtual_ipaddress {
                2001:4642:3a8e:fff0::20
        }
}

Does anyone have any ideas on how to get VRRPv3 into 1.2?
If we could agree on the approach, we could go further by describing CLI commands, working out how the upgrade should be done, creating documentation, and so on.

If anyone would like to implement it, I can also help test it.

syncer added a subscriber: syncer. Dec 21 2017, 1:23 PM

@aopdal try the latest nightly, as we pushed VRRP-related changes

Testing on

vyos@vyos-vrrp1# run sh ver
Version:          VyOS 999.201712220337
Built by:         autobuild@vyos.net
Built on:         Fri 22 Dec 2017 03:37 UTC
Build ID:         92b365a2-b8fc-4f12-9209-bee33323d53f

Architecture:     x86_64
Boot via:         installed image
System type:      VMware guest

Hardware vendor:  VMware, Inc.
Hardware model:   VMware Virtual Platform
Hardware S/N:     VMware-56 4d fc ad 58 82 3c aa-17 8a 66 57 cc 32 59 cc
Hardware UUID:    ADFC4D56-8258-AA3C-178A-6657CC3259CC

Copyright:        VyOS maintainers and contributors

With configuration:

set interfaces ethernet eth0 address 'dhcp'
set interfaces ethernet eth0 hw-id '00:0c:29:32:59:cc'
set interfaces ethernet eth1 address '10.1.1.31/24'
set interfaces ethernet eth1 hw-id '00:0c:29:32:59:d6'
set interfaces ethernet eth1 vrrp vrrp-group 30 preempt 'false'
set interfaces ethernet eth1 vrrp vrrp-group 30 priority '150'
set interfaces ethernet eth1 vrrp vrrp-group 30 virtual-address '10.1.1.30'
set interfaces loopback 'lo'
set system host-name 'vyos-vrrp1'

and

set interfaces ethernet eth0 address 'dhcp'
set interfaces ethernet eth0 hw-id '00:0c:29:32:59:cc'
set interfaces ethernet eth1 address '10.1.1.31/24'
set interfaces ethernet eth1 hw-id '00:0c:29:32:59:d6'
set interfaces ethernet eth1 vrrp vrrp-group 30 preempt 'false'
set interfaces ethernet eth1 vrrp vrrp-group 30 priority '150'
set interfaces ethernet eth1 vrrp vrrp-group 30 virtual-address '10.1.1.30'
set interfaces loopback 'lo'
set system host-name 'vyos-vrrp2'

And now VRRP is totally broken!

Please read the documentation of the protocols; then we can decide how it should work, and design and implement the feature according to that design.

VRRPv2 : https://tools.ietf.org/html/rfc3768
VRRPv3 : https://tools.ietf.org/html/rfc5798

The current implementation is using the VRRPv3 group address but running VRRPv2.
The current implementation will break all upgrades.
The current implementation does not conform to any standard.

The current implementation works with keepalived 1.2.19 (from 2015-07-07). In 1.2.20 (from 2016-04-02) a lot of bugs were fixed, and the possibility to use IPv6 in VRRPv2 is gone.
When implementing IPv6 / VRRPv3 we should probably base the implementation on a newer version of keepalived.

@aopdal can you please provide relevant information and not just a bunch of already-known info?
We need a description of the problem and how to reproduce it, not comments from Captain Obvious.

On two Debian 8 test VMs I compiled keepalived 1.3.9 without any errors. It may be a good idea to use this latest version for our new implementation.

@syncer
Use the configurations I provided and observe the packets the router sends out.
In the nightly build the router sends using the IPv6 group address.
Up to 1.1.8 the router sends using the IPv4 group address.
This makes upgrades impossible.
Using VRRPv2 with both IPv4 and IPv6 virtual addresses in the same VRRP instance is only possible due to a bug in keepalived 1.2.19.

I'm sorry if I'm not expressing myself clearly enough, but I still think this issue needs a specification of how to implement VRRPv3 in addition to VRRPv2, and not with a "dirty trick".

rps added a subscriber: dmbaturin. Mar 7 2018, 2:57 PM

Just checking on this. The nightly build for 1.2 has keepalived 1.2.19 and transition support for virtual IPv6 addresses using VRRPv2 appears to be functional at first glance.

This isn't so much a "dirty trick" as it is a method of supporting IPv6 in absence of VRRPv3, which might be "good enough" for 1.2 assuming that 1.3 will get us to full VRRPv3 support.

There is a concern, however, with how we handle configuration migration if we go that route. Support for IPv6 addresses using VRRPv2 will go away with the transition to keepalived 1.3 mentioned above. There are also a number of questions that need to be asked to determine whether it is both possible and desirable to support keepalived 1.3 today, or whether there are challenges that prevent us from doing so and would push us to stay on 1.2 and deal with configuration migrations in the future when we move to 1.3.

For VyOS 1.1 there were dependency issues, which is why this feature has been delayed so long. Now that VyOS 1.2 is based on Debian 8.10 (Jessie) instead of 6.0.10 (Squeeze), we might have more flexibility. Jessie still ships an older version of keepalived (1.2.13) by default; Debian 9 ships 1.3.2 and Debian 10 (testing) ships 1.3.9.

There are a few questions:

  1. It was mentioned above that 1.3.9 builds cleanly on Jessie, so that might be a good candidate since we will be able to pull patches from the Debian Buster package in the future. Can we confirm that there are no additional dependency requirements that will force us to rebuild other packages?
  2. What configuration changes, if any, are required between keepalived 1.2 and keepalived 1.3 to maintain existing VRRP functionality in VyOS 1.x (pre-IPv6)? E.g. is 1.3.9 usable as a drop-in replacement, or do we need to update configuration scripts?
  3. If there are configuration changes, is any functionality lost or no longer supported?

My hope is that the answer is that 1.3.9 builds cleanly and is a drop-in replacement. If so, I would move to adopt 1.3.9 so we can begin testing it in nightly builds.

If there is an issue that makes us stay on keepalived 1.2 then I would rather keep transition support for IPv6 virtual-addresses on VRRPv2 than not have any method of IPv6 support. The question would then become how do we handle future configuration migration needed.

I'm OK with simply saying that the configuration migration will be that IPv6 virtual-addresses are excluded from the migrated configuration and that they'll have to be rebuilt upon upgrade, as long as we tell users that is what will happen up-front in the release notes for 1.2, and again in the release notes for 1.3.

Because VRRPv3 would use different sync-groups, I don't think there would be any way to migrate IPv6 virtual-addresses into a working VRRPv3 configuration reliably (e.g. what about interfaces that share a sync group that don't have IPv6 addresses defined).

If we think we can pull off keepalived 1.3.9 for VyOS 1.2, I think it would be a good thing, but I also don't think that IPv6 virtual-address support on VRRPv2 is the worst idea.

@aopdal : Since you already did some of the leg work, can you provide some insight on questions 1-3 above? If not, I can spin up a VM to create a build environment to test with but it would be a while before I got around to it.

@syncer , @dmbaturin : Do you think there is time for VRRPv3 support in 1.2 or should we really be looking at 1.3 for this?

I think we should investigate the possibility of v3 in 1.2.
Whether it will be possible to implement in 1.2 or not is another story.

@aibanez @csalcedo your input on this question is also welcome

aopdal added a comment. Edited Mar 8 2018, 10:28 AM

The showstopper for me to upgrade to 1.2 with the current approach is this configuration statement (in the keepalived configuration):

native_ipv6

This forces keepalived to use the IPv6 group address (FF02::12), and after an upgrade both routers become master and I get duplicate IPs. With around 55 pairs of routers this is a no-go for me.

I have only tested on clean Debian, not on VyOS; I just swapped out keepalived.
In VyOS the keepalived configuration is organized as a vrrp_instance per interface and vrrp-group. This is a clean and good approach which makes it possible to keep compatibility. If we upgrade keepalived, a CLI statement such as vrrp-version could be added per interface/vrrp-group to generate separate instances for v2 and v3 (like in my sample configurations).
With an upgraded keepalived some CLI would change, and the script creating the keepalived configuration must be changed. Unfortunately, I can't do it myself, but I can assist with testing and discussions.
I have not found anything removed from the new keepalived that breaks anything other than using both IPv4 and IPv6 virtual addresses in a VRRPv2 instance.
The steps to implement VRRPv3 properly, as I see it, would be:

  1. Upgrade keepalived to a quite new version
  2. Remove the possibility to configure IPv6 addresses in current CLI
  3. Change the script so “native_ipv6” is removed from keepalived configuration.
  4. Test upgrade, failover, transition scripts, etc., so we verify that current configurations will work
  5. Specifications of the VRRPv3 instances should be created
  6. Implementation of VRRPv3
  7. Testing
  8. And document the feature for users.
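The vrrp-version CLI statement suggested above could look something like this, following the existing set interfaces ... vrrp vrrp-group syntax (hypothetical: the vrrp-version keyword is only a proposal, not an implemented command):

```
set interfaces ethernet eth1 vrrp vrrp-group 30 vrrp-version '3'
set interfaces ethernet eth1 vrrp vrrp-group 30 preempt 'false'
set interfaces ethernet eth1 vrrp vrrp-group 30 priority '150'
set interfaces ethernet eth1 vrrp vrrp-group 30 virtual-address 'fd00:ea51:d000::1/64'
```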

In this we also need to investigate Neighbor Discovery and Router Advertisements. RAs should run only on the master, if configured. If the user is to configure this and use transition scripts to control the changes, we need some quite good documentation. If the router is to fix it "under the hood", we must do it exactly right.
I don't think the implementation is very hard to do, but the specification and design to get it right require a bit of effort.

rps added a comment. Mar 8 2018, 4:19 PM

Alright.

It sounds like we can give the upgrade to keepalived 1.3 a try, provided we go back to IPv4-only virtual-address support (like in 1.1) and remove the native_ipv6 statement from the configuration script as step 1.

I think we should try to get this far and wait until the nightly builds are updated so we can verify everything is OK before moving forward.

Provided that looks good, then the next step would be adding support for VRRPv3 alongside VRRPv2 as an additional configuration block.

I agree that while it is useful to support user-defined state transition scripts, we might want to include an option for VRRPv3 to handle what happens to radvd. I have some ideas on that, but do you have any recommendations on how we should handle RA for VRRPv3, @aopdal?

@rps RA and VRRPv3 together are quite a complex "thing", and it's easy to build something that doesn't work. If you have more than one VRRP group running on a network segment, only one of the groups should do RA. The most difficult case to solve may be when you run two routers with two groups on the same interface for some kind of load balancing and also want to do RA. This may require configuring RA on the VRRP group. But if you don't run VRRP, you must configure RA directly on the interface.

So I think there will be ways to configure this feature such that the network behaves strangely, and we must assume the users know, to some degree, what they are doing. The need for stateful firewalling to work makes this difficult, and stateful firewalling in combination with dynamic routing makes it even harder to solve "under the hood" in a predictable way that users understand.

Maybe we should come up with some use cases that we will support and describe in detail. Other use cases must be handled by the users via state transition scripts.

Hi,

I agree with @aopdal's comments. Regarding the approach for handling RAs, probably the most common use case is a single VRRP group per interface. For that case, maybe an interface config option stating that the interface should only send RAs when the VRRP group on that interface is in the master state could do the trick. As already said, other more complex scenarios (like the one with multiple VRRP groups per interface for load balancing) would probably require state transition scripts (or not relying on RAs).

rps added a comment. Mar 12 2018, 2:00 PM

Pretty much agree, but I don't think we need to worry about supporting a use case that is likely to create other problems until someone actually requests it as a feature.

So my thought is that instead of simply starting or stopping radvd we should update the generation of radvd.conf to be VRRPv3 aware and include or exclude VRRPv3 interfaces based on the state of the sync-group as an additional check.

The transition script would be implemented in keepalived.conf at the vrrp_sync_group level under notify_* script hooks (rather than at the vrrp_instance level).

This seems to be the cleanest solution, as it doesn't break RA for non-VRRPv3 interfaces, and I think it's fair to say that if you're enabling VRRPv3 for failover on an interface, RA should only work when the sync-group is in the MASTER role.
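As a sketch of that hook placement (the script path and instance names here are hypothetical; keepalived does support notify_master/notify_backup/notify_fault at the vrrp_sync_group level):

```
vrrp_sync_group SG_eth1 {
    group {
        eth1-v4
        eth1-v6
    }
    # hypothetical helper that regenerates radvd.conf and reloads radvd
    # based on the sync-group state
    notify_master "/opt/vyatta/sbin/vrrp-radvd-transition master eth1"
    notify_backup "/opt/vyatta/sbin/vrrp-radvd-transition backup eth1"
    notify_fault  "/opt/vyatta/sbin/vrrp-radvd-transition fault eth1"
}
```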

We could go down the rabbit hole of trying to do this per-prefix, with some matching of virtual-address to prefix, but I think that would just be confusing for people.

There is a larger question about how to deal with conntrack-sync if more than one sync-group is in use but I think that can be addressed later on as a larger discussion on how we handle failover and clustering in VyOS.

syncer assigned this task to dmbaturin. May 27 2018, 9:08 AM