
Static routes not being applied in 1.2 Release
Closed, Resolved | Public | BUG

Description

Simple config

admin@edge:~$ show configuration commands | grep "protocols static"
set protocols static interface-route6 ::/0 next-hop-interface tun1
set protocols static route 0.0.0.0/0 next-hop 11.11.11.1
set protocols static table 22 interface-route 0.0.0.0/0 next-hop-interface vtun1

No static routes applied:

admin@edge:~$ show ip route static
admin@edge:~$

Gateway is fine

admin@edge:~$ ping 11.11.11.1 count 2
PING 11.11.11.1 (11.11.11.1) 56(84) bytes of data.
64 bytes from 11.11.11.1: icmp_seq=1 ttl=64 time=14.9 ms
64 bytes from 11.11.11.1: icmp_seq=2 ttl=64 time=2.08 ms
--- 11.11.11.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 2.083/8.517/14.952/6.435 ms

Unfortunately this seems dependent on some external factor. Other instances upgraded to the 1.2 version released today have their static routes working fine.

Details

Difficulty level
Unknown (require assessment)
Version
1.2.0
Why the issue appeared?
Will be filled on close
kroy created this task.Jan 30 2019, 10:44 PM
kroy added a comment.Jan 30 2019, 10:58 PM

To add, the routes are present in FRR:

ip route 0.0.0.0/0 vtun1 table 22
ip route 0.0.0.0/0 11.11.11.1
ipv6 route ::/0 tun1
!
syncer triaged this task as High priority.
syncer assigned this task to dmbaturin.
syncer moved this task from Needs Triage to In Progress on the VyOS 1.2 Crux (VyOS 1.2.1) board.
syncer changed the task status from Open to Confirmed.
kroy added a comment.Jan 31 2019, 1:01 AM

I tracked down what is causing this.

Any of the instances I upgraded to the 1.2 LTS today fail to add static routes if they have wireguard interfaces.

kroy added a comment.Jan 31 2019, 1:10 AM

And more info:

It seems to be the upgrade to FRR 7.1 that breaks it. The rolling release from the 25th works as expected and has 6.1. The rolling release from the 26th has 7.1 and does not work with static routes.

primoz added a subscriber: primoz.Feb 1 2019, 9:20 PM

Adding

staticd=yes

to /etc/frr/daemons (+ restarting frr) seems to fix this.

Please have a look at: https://github.com/FRRouting/frr/wiki/Frr-5.0-%E2%86%92-TBD (the bold part).
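
The workaround above can be sketched as a small script. This is only an illustration: it assumes the daemons file consists of plain name=yes|no lines, the helper names ensure_daemon_enabled and patch_daemons_file are hypothetical (not part of VyOS or FRR), and FRR still has to be restarted afterwards.

```python
import re
from pathlib import Path


def ensure_daemon_enabled(text: str, daemon: str = "staticd") -> str:
    """Return daemons-file text in which `<daemon>=yes` is present."""
    # Match an existing, uncommented entry such as `staticd=no`.
    pattern = re.compile(rf"^[ \t]*{re.escape(daemon)}[ \t]*=.*$", re.MULTILINE)
    line = f"{daemon}=yes"
    if pattern.search(text):
        # Rewrite the first existing entry in place.
        return pattern.sub(line, text, count=1)
    # Otherwise append the entry, making sure it starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"


def patch_daemons_file(path: str = "/etc/frr/daemons") -> bool:
    """Patch the file on disk; return True if its content changed."""
    p = Path(path)
    old = p.read_text()
    new = ensure_daemon_enabled(old)
    if new != old:
        p.write_text(new)
    return new != old
```

Working on the text first and writing only when something changed keeps the edit idempotent, so running it again after an upgrade does not add a second entry.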

Weird, I cannot reproduce this on LTS 1.2.0 on either bare-metal or virtual instances.

kroy added a comment.Feb 1 2019, 9:59 PM

@Maltahl
@primoz

There might actually be a bit of a deeper problem here, somewhat conditional on some static interface routing. On a broken system, it does say something about staticd starting.

@dmbaturin has been working on this and has more details.

c-po added a subscriber: c-po.Feb 2 2019, 9:53 AM

I can confirm this. 1.2.0-EPA3 does not have this issue but 1.2.0 has it.

I'm redistributing a static route into OSPF for my L2TP/IPSec road warrior clients.

@primoz Adding staticd to the daemons config fixes the issue reproducibly on affected systems, even after reboot?

It solved it for me yesterday. After some more playing today, this now seems to be an FRR bug.

vtysh

show running-config

...
ip route 10.0.0.0/8 Null0
ip route 777.888.1.0/24 777.888.1.250
ip route 192.168.0.0/16 192.168.100.1
...

show ip route static

Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route

S>* 10.0.0.0/8 [1/0] unreachable (blackhole), 04:23:36
S>* 777.888.1.0/24 [1/0] via 777.888.1.250, eth7.2500, 04:23:36

Without staticd=yes nothing is installed at all. With staticd=yes I get every route except the last one (192.168.0.0/16).
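
The comparison between the running config and the installed routes can be automated. The sketch below is a rough illustration, not a VyOS or FRR tool: the function names are hypothetical, it only looks at IPv4 "ip route" lines, and it ignores routing tables, so the same prefix in two tables is treated as one entry.

```python
import re


def configured_prefixes(running_config: str) -> set:
    """Collect prefixes from `ip route <prefix> ...` lines of a running-config dump."""
    return {
        m.group(1)
        for m in re.finditer(r"^ip route (\S+)", running_config, re.MULTILINE)
    }


def installed_prefixes(route_output: str) -> set:
    """Collect prefixes from lines with route code `S` in `show ip route static` output."""
    return {
        m.group(1)
        for m in re.finditer(r"^S[>* ]*\s*(\S+/\d+)", route_output, re.MULTILINE)
    }


def missing_static_routes(running_config: str, route_output: str) -> list:
    """Return configured prefixes that do not show up as installed static routes."""
    return sorted(configured_prefixes(running_config) - installed_prefixes(route_output))


# Data copied from the output above (addresses are obfuscated in the ticket).
config = """\
ip route 10.0.0.0/8 Null0
ip route 777.888.1.0/24 777.888.1.250
ip route 192.168.0.0/16 192.168.100.1
"""
routes = """\
S>* 10.0.0.0/8 [1/0] unreachable (blackhole), 04:23:36
S>* 777.888.1.0/24 [1/0] via 777.888.1.250, eth7.2500, 04:23:36
"""
print(missing_static_routes(config, routes))  # ['192.168.0.0/16']
```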

primoz added a comment.Feb 2 2019, 1:14 PM

After some more playing with it ... the problem is reproducibly solved by including staticd=yes and NOT having the null route anywhere.

So something strange happens with frr if you have a null route somewhere.

hagbard added a subscriber: hagbard.EditedFeb 2 2019, 7:06 PM

Hmm, I have 7.1-dev-1~debian8+1 on a rolling and 3 blackhole routes and no issues at all.

S>* 0.0.0.0/0 [210/0] via 172.103.x.x, eth0, 01w0d00h
S>* 10.0.0.0/8 [254/0] unreachable (blackhole), 01w0d00h
S>* 172.16.0.0/12 [254/0] unreachable (blackhole), 01w0d00h
C>* 172.103.x.x/26 is directly connected, eth0, 01w0d00h
S>* 192.168.0.0/16 [254/0] unreachable (blackhole), 01w0d00h
C>* 192.168.0.0/24 is directly connected, eth1, 01w0d00h
B>* 192.168.1.0/24 [20/0] via 192.168.100.2, eth2, 01w0d00h
C>* 192.168.100.0/30 is directly connected, eth2, 01w0d00h

Also no issues with ipv6.

pasik added a subscriber: pasik.Feb 4 2019, 7:32 PM
lbv2rus added a subscriber: lbv2rus.EditedFeb 6 2019, 3:04 PM

I can confirm that a freshly installed instance of 1.2 does not add static routes, including the default route.
After deleting the whole "protocols static" section and recreating it manually, only the default route is added.
But after a reboot, no default route is set.

Magic with staticd=yes does not work for me.

static {
    route 0.0.0.0/0 {
        next-hop A.B.C.D {
        }
    }
    route 172.X.Y.Z/24 {
        next-hop AA.BB.CC.DD {
        }
    }
    table 10 {
        interface-route 0.0.0.0/0 {
            next-hop-interface vtun0 {
            }
        }
    }
    table 100 {
        route 0.0.0.0/0 {
            next-hop K.L.M.N. {
            }
        }
    }
}
kroy added a comment.Feb 6 2019, 4:39 PM

@lbv2rus There might actually be a few problems here. We might have hacked out that it's the interface-route with the custom routing table that's causing the problem.

Removing that should bring back static routes.

In T1218#32227, @kroy wrote:

@lbv2rus There might actually be a few problems here. We might have hacked out that it's the interface-route with the custom routing table that's causing the problem.

Removing that should bring back static routes.

@kroy,
I will test it, but this is not a solution for me.
I use routing policies with custom tables in my current environment.
In 1.1.8 everything works fine, and in EPA-3 too.

kroy added a comment.Feb 6 2019, 4:57 PM

@lbv2rus

Yeah, it wasn't really a workable solution for me either and I too had to roll back. But it would be good to confirm that is the problem.

lbv2rus added a comment.EditedFeb 7 2019, 4:15 PM
In T1218#32229, @kroy wrote:

@lbv2rus

Yeah, it wasn't really a workable solution for me either and I too had to roll back. But it would be good to confirm that is the problem.

@kroy, I tested some configuration combinations.
I can confirm that the problem is an interface-route to a vtun interface that is down at boot time.
Without the interface-route, all static routes are applied after a reboot, with or without staticd=yes.
In vtysh all routes are present the whole time.
Once interface vtun0 comes up, it is possible to add an interface-route to any routing table.

I think the root of all these problems is FRR 7.1, which is not a stable release. It was not a good idea to build the LTS version with a dev package without continuous testing.

Merijn added a subscriber: Merijn.Feb 12 2019, 11:13 PM

I am experiencing the same issues with a router I tested with 1.2.0 current.
Can we create a test release going back to FRR 6.0.2?

I tried creating an ISO with the frr package set to frr-6.0.2.
The frr package was created successfully, but during creation of the ISO image I get a dependency error on vyatta-cfg-quagga:
vyos-world : Depends: vyatta-cfg-quagga but it is not going to be installed

kroy added a comment.Feb 13 2019, 12:38 AM

@Merijn make sure you git checkout current on everything.

@kroy everything is at current, except 'frr', because then I get 7.1-dev and I would like 6.0.2 to test whether that solves it.
I used the debian/master branch from FRR.

kroy added a comment.Feb 13 2019, 1:26 PM

Strange. I’ve seen that error a lot. Every time it’s been when I’ve forgotten to checkout current after cloning the repo.

I tried doing complete reinstalls and I can now confirm this bug as well.

Would a permanent fix not be to just let Zebra (FRR) handle the static routes by enabling the staticd module in /etc/frr/daemons, adding staticd=yes at the bottom of the file?
It persists across reboots and fixes the issue entirely.
Routing would then be handled more or less exclusively in FRR, which makes troubleshooting more unified and easier.

This was mentioned earlier by Primoz (thanks by the way)

@Maltahl for me it was not fixed with that addition, and I read above that others had the same experience.

@Maltahl I just re-tested this with staticd=yes added and a reboot done.
When I add two static routes, I would expect the /24 route to work because it is more specific. But it does not, and show ip route shows only the /23 blackhole.

set protocols static route 10.0.0.0/23 blackhole
set protocols static route 10.0.1.0/24 next-hop 192.168.1.1
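
The expectation that the /24 wins is ordinary longest-prefix matching. The following is a minimal sketch of that rule using Python's standard ipaddress module. It models only the lookup, not how FRR or the kernel install routes, and the function name is made up for illustration.

```python
import ipaddress


def longest_prefix_match(routes, destination):
    """Return the (prefix, action) pair with the most specific matching prefix, or None."""
    dest = ipaddress.ip_address(destination)
    matching = []
    for prefix, action in routes:
        net = ipaddress.ip_network(prefix)
        if dest in net:
            matching.append((net, action))
    if not matching:
        return None
    # The route with the longest prefix length is the most specific one.
    net, action = max(matching, key=lambda item: item[0].prefixlen)
    return str(net), action


# The two routes from the commands above.
routes = [
    ("10.0.0.0/23", "blackhole"),
    ("10.0.1.0/24", "next-hop 192.168.1.1"),
]
print(longest_prefix_match(routes, "10.0.1.5"))  # ('10.0.1.0/24', 'next-hop 192.168.1.1')
print(longest_prefix_match(routes, "10.0.0.5"))  # ('10.0.0.0/23', 'blackhole')
```

With both routes installed, only destinations in 10.0.0.0/24 should hit the blackhole, which is why a table showing just the /23 points at a route installation problem rather than a lookup problem.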

@kroy by using chroot and trying to install the vyatta-cfg-quagga package, I found out what is causing my ISO build error:

The following packages have unmet dependencies:
vyatta-cfg-quagga : Depends: frr (>= 6.1) but 6.0.2-2 is to be installed

I tested the current release and the issue still exists.
After adding staticd=yes and rebooting, everything seemed to work.
More specific routes work even when a larger blackhole route is present.
However, adding a new blackhole route while more specific routes exist (and work fine) stops them from working. The new blackhole routes get loaded and suppress the more specific routes.
Removing a blackhole route allows other routes to work, even ones that are not covered by that blackhole route.

So I have

10.10.10.0/24 next-hop 1.1.1.1 
10.10.0.0/16 blackhole
10.20.10.0/24 next-hop 2.2.2.2
10.20.0.0/16 blackhole

Both /24's do not work.
Now I remove the 10.20.0.0/16 blackhole route and after a reboot the 10.10.10.0/24 route starts working, and obviously the 10.20.10.0/24 also works because the overlapping blackhole is removed.
Something to do with the order of routes added to Zebra/Static, or with the number of routes in the table maybe.

Both 10.10.10.0/24 and 10.10.0.0/16 show:

Known via "static", distance 1, metric 0, best

The router is running BGP.
show ip bgp summary

BGP router identifier ID, local AS number AS vrf-id 0
BGP table version 1175577
RIB entries 1348049, using 206 MiB of memory
Peers 18, using 372 KiB of memory
Peer groups 4, using 256 bytes of memory

show ipv6 bgp summary

BGP router identifier ID, local AS number AS vrf-id 0
BGP table version 203449
RIB entries 340246, using 52 MiB of memory
Peers 18, using 372 KiB of memory
Peer groups 4, using 256 bytes of memory

show ip route summary

Route Source Routes FIB (vrf default)
connected 1 1
static 27 27
ebgp 3021 3021
ibgp 734364 734362
Totals 737413 737411

syncer changed the task status from Confirmed to In progress.Feb 28 2019, 4:35 PM
m-asama added a subscriber: m-asama.Mar 4 2019, 5:56 AM

FRR 7.0 was released three days ago.

https://github.com/FRRouting/frr/releases/tag/frr-7.0

I tried VyOS 1.2.0 + FRR 7.0 release. The above bug was not reproduced.

When comparing "frr-7.1-dev" and "frr-7.0", the following patch was not applied to "frr-7.0".

https://github.com/FRRouting/frr/commit/c45fb58dd310ba05ca9e1f2da05b37f79b7aa16c

Next, I tried "frr-7.1-dev" with this patch reverted, and the bug was not reproduced either.

In order to build the FRR 7.0 release I had to raise the version of libyang to 0.16.74, and to build libyang 0.16.74 it was necessary to raise swig to 3.0.12.

Merijn added a comment.Mar 6 2019, 2:07 PM

Do you have an ISO to test? I tried the latest rolling release and also my own ISO built from current, and I continue to see this issue.
It makes transitioning to 1.2.0 impossible at the moment. The routers are still on 1.1.8.

I made it.

http://enog.jp/~masakazu/vyos/1.2.0/vyos-999.201903071414-amd64.iso

If you want to make it yourself, the following patch may be helpful.

https://github.com/m-asama/vyos-build/commit/2b5e2dc1df2c0554a92de0d764ccf2ca42a7a0ad
https://github.com/m-asama/vyatta-cfg/commit/982b8ed72d23dc52e64e24bbe94fabc947bc8b67

There is one point of attention. Before doing make iso you need to copy libyang_0.16.74_amd64.deb to vyos-build/packages.
I think that the following will be helpful for making libyang_0.16.74_amd64.deb.

https://github.com/m-asama/vyos-build/commit/2b5e2dc1df2c0554a92de0d764ccf2ca42a7a0ad#diff-ebacf6f6ae4ee68078bb16454b23247dR165

With 1.2.0-H4 this issue seems to be fixed on my router.

kroy added a comment.Mar 17 2019, 12:27 AM

Yep. Can confirm issue is fixed with the latest hot fix.

syncer closed this task as Resolved.Mar 17 2019, 3:46 AM
syncer added a project: VyOS 1.3 Equuleus.
syncer moved this task from In Progress to Finished on the VyOS 1.2 Crux (VyOS 1.2.1) board.
syncer moved this task from Need Triage to Finished on the VyOS 1.3 Equuleus board.