VyOS Platform

RPKI doesn't boot properly
Open, Requires assessment | Public | BUG

Description

I have an RPKI cache configured like this:

rpki {
    cache routinator {
        address 192.168.100.90
        port 3323
    }
}

and a route-map like:

route-map ebgp-transit-rpki {

    rule 10 {
        action deny
        match {
            rpki invalid
        }
    }
    rule 20 {
        action permit
        match {
            rpki notfound
        }
        set {
            local-preference 20
        }
    }
    rule 30 {
        action permit
        match {
            rpki valid
        }
        set {
            local-preference 100
        }
    }
}
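The route-map above is evaluated top-down and the first matching rule wins. A minimal Python model of that evaluation, assuming each prefix already carries one of the three RPKI states FRR assigns (valid, invalid, notfound). The Route type and the apply_route_map function are hypothetical and only illustrate the policy; this is not how FRR implements it:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: str
    rpki_state: str        # "valid", "invalid" or "notfound", as set by RPKI validation
    local_pref: int = 100  # default local preference before the route-map runs


# Rules 10, 20 and 30 of route-map ebgp-transit-rpki, in evaluation order:
# (matched RPKI state, action, local-preference to set or None)
RULES = [
    ("invalid", "deny", None),
    ("notfound", "permit", 20),
    ("valid", "permit", 100),
]


def apply_route_map(route: Route) -> Optional[Route]:
    """Return the route as modified by the first matching rule, or None if denied."""
    for state, action, local_pref in RULES:
        if route.rpki_state != state:
            continue  # rule does not match, try the next one
        if action == "deny":
            return None
        return replace(route, local_pref=local_pref) if local_pref is not None else route
    # A route-map denies anything that no rule matched (implicit deny).
    return None
```

In this model a prefix is dropped only when it is classified invalid, or when it carries a state that none of the three rules name.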

After a reboot this setup does not work: the BGP session that uses this import route-map comes up with 0 prefixes. It stays that way until I enter vtysh and run "rpki stop", "rpki start" and "clear bgp *".

Details

Difficulty level
Unknown (require assessment)
Version
1.3-rolling-202002161917 (but this has been going on for quite some time)
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)


Event Timeline

primoz created this task. Feb 16 2020, 9:06 PM
pasik added a subscriber: pasik. Feb 17 2020, 8:13 AM

I can also confirm this bug on VyOS 1.2.4.

bgp-rtor-01 should receive the prefixes.
bgp-rtor-02 advertises them.

After rebooting "bgp-rtor-01" there is no connection to the RPKI server:

vyos@bgp-rtor-01:~$ show ip bgp sum
...
Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
192.168.33.2    4     203115       4       3        0    0    0 00:00:14            0

vyos@bgp-rtor-01:~$ sudo vtysh -c "show rpki cache-connection"
No connection to RPKI cache server.
vyos@bgp-rtor-01:~$
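To spot this state from a script rather than by eye, both outputs above can be checked mechanically. A minimal sketch, assuming the text formats shown in this report (they may differ between FRR versions); rpki_connected and prefixes_received are hypothetical helper names:

```python
from typing import Optional


def rpki_connected(cache_connection_output: str) -> bool:
    """True if 'show rpki cache-connection' reports a connected group."""
    # Formats seen in this report: "Connected to group 1" when connected,
    # "No connection to RPKI cache server." when not.
    return any(line.strip().startswith("Connected to group")
               for line in cache_connection_output.splitlines())


def prefixes_received(summary_row: str) -> Optional[int]:
    """Return the State/PfxRcd column of a 'show ip bgp sum' neighbor row.

    The last column is a prefix count once the session is established;
    otherwise it holds a state name (e.g. Active), and None is returned.
    """
    last = summary_row.split()[-1]
    return int(last) if last.isdigit() else None
```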

First, a check that bgp-rtor-02 really is advertising the prefixes:

vyos@bgp-rtor-02:~$ show ip bgp neighbors 192.168.33.1 advertised-routes 
BGP table version is 8, local router ID is 192.168.33.2, vrf id 0
Default local pref 100, local AS 203115
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 100.64.0.0/24    0.0.0.0                  0         32768 i
*> 100.64.1.0/24    0.0.0.0                  0         32768 i
*> 100.64.2.0/24    0.0.0.0                  0         32768 i
*> 100.64.3.0/24    0.0.0.0                  0         32768 i
...

Total number of prefixes 8
vyos@bgp-rtor-02:~$

After rebooting bgp-rtor-01, a capture on the Routinator server side shows no attempts at all to connect to the Routinator port:

root@ponctrl:/home/sever# tcpdump -ntti ens20 port 3323
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens20, link-type EN10MB (Ethernet), capture size 262144 bytes
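Since the capture shows no connection attempts, it helps to rule out basic reachability from the router to the cache. A minimal Python sketch that simply tries a TCP connection to the cache port; it only shows that the port accepts connections, not that the RPKI-RTR session itself works. The function name cache_reachable is hypothetical:

```python
import socket


def cache_reachable(host: str, port: int = 3323, timeout: float = 3.0) -> bool:
    """Attempt a plain TCP connection to the RPKI cache; True if it is accepted."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an
        # OSError subclass) and unreachable networks.
        return False
```

Run on the router with the address and port from the configuration in the description, e.g. cache_reachable("192.168.100.90", 3323). If this returns True while "show rpki cache-connection" still reports no connection, the network path is fine and the problem lies in the router's RPKI session.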

This looks to me like an FRR bug. Has anyone contacted upstream?

maznu added a subscriber: maznu. Apr 17 2020, 8:20 PM

We saw something similar, but FRR seemed to connect to the RTRR server eventually. I think it has a timeout parameter; is that what controls how often (or how slowly) it retries the connection?
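One way to settle whether FRR connects on its own, and how long that takes, is to measure it: poll "show rpki cache-connection" after boot and record when a connection first appears. A minimal sketch, assuming vtysh is on the PATH and the script has enough privileges to run it (on VyOS, via sudo). wait_for_rpki and its parameters are hypothetical; the command runner, clock and sleep are injectable so the logic can be tried without a router:

```python
import subprocess
import time
from typing import Callable, Optional


def vtysh(command: str) -> str:
    """Run a single vtysh command and return its output."""
    result = subprocess.run(["vtysh", "-c", command],
                            capture_output=True, text=True, check=True)
    return result.stdout


def wait_for_rpki(run: Callable[[str], str] = vtysh,
                  deadline: float = 600.0,
                  interval: float = 10.0,
                  clock: Callable[[], float] = time.monotonic,
                  sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Poll until the RPKI cache reports a connection.

    Returns the seconds elapsed until the first "Connected to group" line,
    or None if the deadline passes without a connection.
    """
    start = clock()
    while True:
        output = run("show rpki cache-connection")
        if any(line.strip().startswith("Connected to group")
               for line in output.splitlines()):
            return clock() - start
        if clock() - start >= deadline:
            return None
        sleep(interval)
```

A None result after a generous deadline would support the original report (no reconnection at all), while a number would show how slowly FRR retries.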

maznu added a comment (edited). Apr 18 2020, 7:48 AM

While testing T1874, the procedure we followed was:

  1. install 1.2.4
  2. configure
  3. observe bgpd crash after ~5 minutes
  4. upgrade to 1.2.5
  5. reboot
  6. check RPKI RTRR connection had established
  7. check BGP session had established
  8. observe no crash in bgpd
  9. celebrate

And I can confirm that after boot-up, FRR was indeed connected to an RTRR server:

vyos@test.faelix.net:~$ sudo vtysh -c "show rpki cache-connection"
Connected to group 1
rpki tcp cache 46.227.201.12 3323 pref 1

Checking from the other end, on the RTRR server:

root@slm:~# ss -an | grep 46.227.201.78
tcp   ESTAB   0   0   46.227.201.12:3323   46.227.201.78:44676

So perhaps this bug has been fixed?

I tried this again today with 1.3-rolling-202004180117.

After a reboot:

$ show rpki cache-connection
No connection to RPKI cache server.

Once again, the workaround in vtysh:
vRouter1# rpki stop
vRouter1# rpki start
vRouter1# show rpki cache-connection
Connected to group 1
rpki tcp cache 192.168.100.90 3323 pref 1
vRouter1# clear bgp *

solves everything.
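Until the root cause is fixed, the manual workaround could be scripted and run once after boot. A minimal sketch, assuming vtysh accepts several -c options in one invocation and runs them in order, and that the script is triggered at boot somehow (for example from VyOS's /config/scripts/vyos-postconfig-bootup.script; check that this hook exists on your version). build_vtysh_argv and restart_rpki are hypothetical names, and the 30-second settle time is a guess, not a measured value. Note that "clear bgp *" resets every BGP session on the router:

```python
import subprocess
import time
from typing import List


def build_vtysh_argv(commands: List[str]) -> List[str]:
    """Build one vtysh invocation that runs the given commands in order."""
    argv = ["vtysh"]
    for command in commands:
        argv += ["-c", command]
    return argv


def restart_rpki(settle_seconds: float = 30.0, dry_run: bool = False) -> List[List[str]]:
    """Restart the RPKI session, wait for the cache to sync, then reset BGP.

    Returns the command lines; with dry_run=True nothing is executed.
    """
    restart = build_vtysh_argv(["rpki stop", "rpki start"])
    reset = build_vtysh_argv(["clear bgp *"])
    if not dry_run:
        subprocess.run(restart, check=True)
        # Hypothetical pause so the cache can deliver its data before
        # the BGP sessions are reset and routes are re-evaluated.
        time.sleep(settle_seconds)
        subprocess.run(reset, check=True)
    return [restart, reset]
```

Passing the commands as an argument list (rather than through a shell) keeps the "*" in "clear bgp *" from being expanded as a filename glob.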