RPKI doesn't boot properly
Open, Requires assessment | Public | BUG

Description

I have a config like:

rpki {
    cache routinator {
        address 192.168.100.90
        port 3323
    }
}

and a route-map like:

route-map ebgp-transit-rpki {
    rule 10 {
        action deny
        match {
            rpki invalid
        }
    }
    rule 20 {
        action permit
        match {
            rpki notfound
        }
        set {
            local-preference 20
        }
    }
    rule 30 {
        action permit
        match {
            rpki valid
        }
        set {
            local-preference 100
        }
    }
}

After a reboot this setup doesn't work: the BGP session that has this route-map set as its import route-map comes up with 0 prefixes. It stays that way until I enter vtysh and execute rpki stop, rpki start and clear bgp *

Details

Difficulty level
Unknown (require assessment)
Version
1.3-rolling-202002161917 (but this has been going on for quite some time)
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)

Event Timeline

I can confirm this bug with VyOS 1.2.4 as well.

bgp-rtor-01 should receive the prefixes.
bgp-rtor-02 advertises the prefixes.

After rebooting "bgp-rtor-01" there is no connection to the RPKI server:

vyos@bgp-rtor-01:~$ show ip bgp sum
...
Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
192.168.33.2    4     203115       4       3        0    0    0 00:00:14            0

vyos@bgp-rtor-01:~$ sudo vtysh -c "show rpki cache-connection"
No connection to RPKI cache server.
vyos@bgp-rtor-01:~$
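
For anyone who wants to check this state from a script, here is a minimal Python sketch that classifies the text printed by "show rpki cache-connection". It only matches the two output forms that appear in this thread ("No connection to RPKI cache server." and a line starting with "Connected to group"); the function name is hypothetical and other FRR versions may print something different, in which case the sketch reports "not connected".

```python
def rpki_cache_connected(output: str) -> bool:
    """Return True if the vtysh output reports a connected cache group.

    Only the two message forms quoted in this thread are recognised;
    anything else is treated as "not connected" (an assumption).
    """
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("No connection to RPKI cache server"):
            return False
        if line.startswith("Connected to group"):
            return True
    # Empty or unrecognised output: be conservative.
    return False
```

The output to feed in would come from sudo vtysh -c "show rpki cache-connection", as in the session above.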

Checking that bgp-rtor-02 does export prefixes:

vyos@bgp-rtor-02:~$ show ip bgp neighbors 192.168.33.1 advertised-routes 
BGP table version is 8, local router ID is 192.168.33.2, vrf id 0
Default local pref 100, local AS 203115
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 100.64.0.0/24    0.0.0.0                  0         32768 i
*> 100.64.1.0/24    0.0.0.0                  0         32768 i
*> 100.64.2.0/24    0.0.0.0                  0         32768 i
*> 100.64.3.0/24    0.0.0.0                  0         32768 i
...

Total number of prefixes 8
vyos@bgp-rtor-02:~$

After rebooting bgp-rtor-01, a packet capture on the routinator server does not show any attempts to connect to it:

root@ponctrl:/home/sever# tcpdump -ntti ens20 port 3323
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens20, link-type EN10MB (Ethernet), capture size 262144 bytes
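
An empty capture could in principle also mean the router cannot reach the cache at all. A rough way to rule that out is a plain TCP probe from the router towards the cache. The sketch below assumes Python 3 is available on the router; the function name is hypothetical, and the probe only opens and closes a TCP connection without speaking the RTR protocol. The probe itself will show up in the tcpdump, so it should not be mistaken for an FRR connection attempt.

```python
import socket


def cache_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only tests reachability of the listening port; it does not
    exchange any RTR protocol data with the cache.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

With the config from the description this would be called as cache_reachable("192.168.100.90", 3323). If it returns True while the capture stays empty after boot, the problem is more likely bgpd never starting the connection than the network path.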

This looks to me like an FRR bug. Has anyone contacted upstream?

We saw something similar to this, but it seems like FRR eventually connected to RTRR. I think it has a timeout parameter. Is that how often (slowly) it tries to re-establish?

While testing T1874 the procedure we followed was:

  1. install 1.2.4
  2. configure
  3. observe bgpd crash after ~5 minutes
  4. upgrade to 1.2.5
  5. reboot
  6. check RPKI RTRR connection had established
  7. check BGP session had established
  8. observe no crash in bgpd
  9. celebrate

And I can confirm that after boot-up, FRR was indeed connected to an RTRR server:

vyos@test.faelix.net:~$ sudo vtysh -c "show rpki cache-connection"
Connected to group 1
rpki tcp cache 46.227.201.12 3323 pref 1

Checking from the other end, on the RTRR server:

root@slm:~# ss -an | grep 46.227.201.78
tcp                ESTAB               0                    0                                                                   46.227.201.12:3323                                                      46.227.201.78:44676

I would say that this bug might be fixed?

I tried this today with 1.3-rolling-202004180117 ...

after reboot:

$ show rpki cache-connection
No connection to RPKI cache server.

again:
vRouter1# rpki stop
vRouter1# rpki start
vRouter1# show rpki cache-connection
Connected to group 1
rpki tcp cache 192.168.100.90 3323 pref 1
vRouter1# clear bgp *

solves everything.
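
Until this is fixed, the manual steps above could be scripted. The following is a minimal Python sketch, not a supported fix: it runs the same vtysh commands used in this thread, and only when "show rpki cache-connection" reports no connection. The function names are hypothetical, running it after boot (and how to hook it in) is left open, and it does not wait for the cache to finish syncing before clearing BGP. Note that clear bgp * resets all BGP sessions.

```python
import subprocess
from typing import Callable


def vtysh(command: str) -> str:
    """Run a single vtysh command and return its stdout (needs root)."""
    result = subprocess.run(
        ["vtysh", "-c", command],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def restart_rpki_if_disconnected(run: Callable[[str], str] = vtysh) -> bool:
    """Apply the workaround from this thread if no cache is connected.

    Returns True if the workaround commands were issued, False if the
    cache connection already looked fine and nothing was changed.
    """
    status = run("show rpki cache-connection")
    if "No connection to RPKI cache server" not in status:
        return False
    run("rpki stop")
    run("rpki start")
    # Resets every BGP session so the import route-map is re-evaluated.
    run("clear bgp *")
    return True
```

The run parameter exists only so the logic can be exercised with a fake command runner; on a router, calling restart_rpki_if_disconnected() as root would use vtysh directly.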