Maniphest T2044

RPKI doesn't boot properly
Closed, Resolved | Public | BUG
Assigned To

Authored By

	primoz
	Feb 16 2020, 9:06 PM

Description

Have a config like:

rpki {
     cache routinator {
         address 192.168.100.90
         port 3323
     }
 }

and a route-map like:

route-map ebgp-transit-rpki {

    rule 10 {
        action deny
        match {
            rpki invalid
        }
    }
    rule 20 {
        action permit
        match {
            rpki notfound
        }
        set {
            local-preference 20
        }
    }
    rule 30 {
        action permit
        match {
            rpki valid
        }
        set {
            local-preference 100
        }
    }
}

After a reboot this setup doesn't work (the BGP session that has this import route-map set comes up with 0 prefixes) until I enter vtysh and execute rpki stop, rpki start and clear bgp *.
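For reference, the intent of the route-map above can be sketched in Python. This is only an illustration of first-match rule evaluation, not VyOS or FRR code: the names RULES and evaluate are made up here, and the final deny for a route that matches no rule is an assumption about usual route-map behaviour.

```python
from typing import Optional, Tuple

# Rules from the route-map above, in rule-number order:
# (rule number, RPKI state to match, action, local-preference to set or None)
RULES = [
    (10, "invalid", "deny", None),
    (20, "notfound", "permit", 20),
    (30, "valid", "permit", 100),
]


def evaluate(rpki_state: str) -> Tuple[str, Optional[int]]:
    """Return (action, local_preference) for a route with the given RPKI state."""
    for _number, state, action, local_pref in RULES:
        if state == rpki_state:
            return action, local_pref
    # Assumption: a route that matches no rule is denied.
    return "deny", None


for state in ("invalid", "notfound", "valid"):
    print(state, evaluate(state))
```

So valid routes are preferred over notfound ones, and invalid ones are dropped on import.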

Details

Difficulty level: Unknown (require assessment)
Version: 1.3-rolling-202002161917 (but this has been going on for quite some time)
Why the issue appeared?: Will be filled on close
Is it a breaking change?: Unspecified (possibly destroys the router)
Issue type: Bug (incorrect behavior)

Related Objects

Status	Subtype	Assigned	Task
Open	BUG	dmbaturin	T5938 Migration fail root task for 1.4-rc
Resolved	BUG	c-po	T6004 RPKI is not configured
Resolved	BUG	c-po	T2044 RPKI doesn't boot properly

Event Timeline

pasik added a subscriber: pasik. Feb 17 2020, 8:13 AM

I can confirm this bug on VyOS 1.2.4 as well.

bgp-rtor-01 should receive the prefixes.
bgp-rtor-02 advertises the prefixes.

After rebooting bgp-rtor-01 there is no connection to the RPKI server:

vyos@bgp-rtor-01:~$ show ip bgp sum
...
Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
192.168.33.2    4     203115       4       3        0    0    0 00:00:14            0

vyos@bgp-rtor-01:~$ sudo vtysh -c "show rpki cache-connection"
No connection to RPKI cache server.
vyos@bgp-rtor-01:~$

Check that bgp-rtor-02 is exporting prefixes:

vyos@bgp-rtor-02:~$ show ip bgp neighbors 192.168.33.1 advertised-routes 
BGP table version is 8, local router ID is 192.168.33.2, vrf id 0
Default local pref 100, local AS 203115
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 100.64.0.0/24    0.0.0.0                  0         32768 i
*> 100.64.1.0/24    0.0.0.0                  0         32768 i
*> 100.64.2.0/24    0.0.0.0                  0         32768 i
*> 100.64.3.0/24    0.0.0.0                  0         32768 i
...

Total number of prefixes 8
vyos@bgp-rtor-02:~$

After rebooting bgp-rtor-01, a packet capture on the Routinator server side does not show any attempts to connect to the Routinator server:

root@ponctrl:/home/sever# tcpdump -ntti ens20 port 3323
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens20, link-type EN10MB (Ethernet), capture size 262144 bytes

Looks to me like an FRR bug. Has anyone contacted upstream?

We saw something similar to this, but it seems like FRR eventually connected to RTRR. I think it has a timeout parameter — is that how often (slowly) it tries to re-establish?

While testing T1874 the procedure we followed was:

install 1.2.4
configure
observe bgpd crash after ~5 minutes
upgrade to 1.2.5
reboot
check RPKI RTRR connection had established
check BGP session had established
observe no crash in bgpd
celebrate

And I can confirm that after boot-up, FRR was indeed connected to an RTRR server:

$ sudo vtysh -c "show rpki cache-connection"
Connected to group 1
rpki tcp cache 46.227.201.12 3323 pref 1

Checking from the other end, on the RTRR server:

root@slm:~# ss -an | grep 46.227.201.78
tcp                ESTAB               0                    0                                                                   46.227.201.12:3323                                                      46.227.201.78:44676

I would say that this bug might be fixed?

I tried this today with 1.3-rolling-202004180117 ...

after reboot:

$ show rpki cache-connection
No connection to RPKI cache server.

again:
vRouter1# rpki stop
vRouter1# rpki start
vRouter1# show rpki cache-connection
Connected to group 1
rpki tcp cache 192.168.100.90 3323 pref 1
vRouter1# clear bgp *

solves everything.
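If someone wants to script this check, here is a minimal Python sketch that classifies the output of show rpki cache-connection. It relies only on the two messages quoted in this thread ("No connection to RPKI cache server." and "Connected to group ..."). The function name rpki_connected is made up, and other FRR versions may word the output differently.

```python
def rpki_connected(output: str) -> bool:
    """Return True if the text looks like a connected RPKI cache session."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if any(line.startswith("No connection to RPKI cache server") for line in lines):
        return False
    return any(line.startswith("Connected to group") for line in lines)


# Sample outputs copied from the comments above.
before = "No connection to RPKI cache server."
after = "Connected to group 1\nrpki tcp cache 192.168.100.90 3323 pref 1"
print(rpki_connected(before), rpki_connected(after))
```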

This bug got fixed with: https://phabricator.vyos.net/T3227

@primoz, I have exactly the same issue with "1.4-rolling-202103011828 (sagitta)"

HON added a subscriber: HON. Mar 5 2021, 8:26 AM

mpueschel added a subscriber: mpueschel. Mar 9 2021, 10:55 PM

erkin set Issue type to Bug (incorrect behavior). Aug 31 2021, 5:39 PM

syncer edited projects, added VyOS 1.3 Equuleus (1.3.0); removed VyOS 1.3 Equuleus. Nov 6 2021, 11:25 AM

Still reproducible on VyOS 1.3-beta-202111150443.
After reboot:

No imported routes:

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
192.168.122.11  4     203115         6         5        0    0    0 00:01:52            0        0

re-start rpki

r4-epa2# 
r4-epa2# show rpki cache-connection 
No connection to RPKI cache server.
r4-epa2# 

r4-epa2# rpki stop 
r4-epa2# rpki start
r4-epa2# exit

Reset bgp peer

vyos@r4-epa2:~$ reset ip bgp all


vyos@r4-epa2:~$ show ip bgp sum

IPv4 Unicast Summary:
BGP router identifier 192.168.122.14, local AS number 65001 vrf-id 0
BGP table version 8
RIB entries 15, using 2880 bytes of memory
Peers 1, using 21 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
192.168.122.11  4     203115        10        10        0    0    0 00:00:02            8        0
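The "0 prefixes after reboot" symptom can be read out of the summary table with a small Python sketch like the one below. It is a rough parser written for the two table layouts pasted in this thread (with and without the PfxSnt column), and the function name prefixes_received is made up. Rows whose state contains a space would be skipped, so treat it as a starting point only.

```python
from typing import Dict


def prefixes_received(summary: str) -> Dict[str, str]:
    """Map each neighbor address to its State/PfxRcd column value."""
    result: Dict[str, str] = {}
    columns = None
    for line in summary.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Neighbor":
            # Header row: remember the column names for the rows that follow.
            columns = fields
            continue
        if columns and len(fields) == len(columns):
            row = dict(zip(columns, fields))
            result[row["Neighbor"]] = row["State/PfxRcd"]
    return result


sample = """\
Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt
192.168.122.11  4     203115         6         5        0    0    0 00:01:52            0        0
"""
print(prefixes_received(sample))
```

The value is kept as a string because the same column can also hold a session state instead of a prefix count.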

jmbwell added a subscriber: jmbwell. Feb 10 2022, 4:14 AM

I'm able to reproduce this with 1.4, using the new config structure:

rpki {
    cache 10.3.96.4 {
        port 8082
        preference 1
    }
}

The same procedure restores operation:

$ vtysh
dal-1# rpki stop
dal-1# rpki start
dal-1# clear bgp *
dal-1# exit
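The same three commands can also be issued non-interactively. Below is a minimal Python sketch. The helper names vtysh_argv, apply_workaround and run_vtysh are made up, and it assumes vtysh is on the PATH and that the caller has enough privileges (earlier comments use sudo vtysh). The demo at the bottom only prints the commands; pass run_vtysh instead to actually execute them.

```python
import subprocess
from typing import Callable, List, Sequence

# Commands reported in this thread, in the order they were run.
WORKAROUND = ["rpki stop", "rpki start", "clear bgp *"]


def vtysh_argv(command: str) -> List[str]:
    """Build the argument list for one non-interactive vtysh call."""
    return ["vtysh", "-c", command]


def run_vtysh(argv: Sequence[str]) -> None:
    """Execute one vtysh call for real; raises if vtysh exits non-zero."""
    subprocess.run(list(argv), check=True)


def apply_workaround(run: Callable[[Sequence[str]], object]) -> List[List[str]]:
    """Send each workaround command through `run`; return what was sent."""
    executed = []
    for command in WORKAROUND:
        argv = vtysh_argv(command)
        run(argv)
        executed.append(argv)
    return executed


# Dry run: print the commands instead of executing them.
apply_workaround(lambda argv: print(" ".join(argv)))
```

Passing the arguments as a list (no shell involved) means the * in clear bgp * is not expanded by a shell.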

fernando added a subscriber: fernando. Feb 22 2022, 5:02 PM

Has any progress on this been made? I am still having this issue on 1.4-rolling-202205250217.

Currently the only workaround I have found is to run the following commands:

vtysh -c "rpki stop"
vtysh -c "rpki start"

aalmenar added a subscriber: aalmenar. Jun 15 2022, 12:36 PM

Hi,

Same issue on VyOS 1.4-rolling-202208240217

Also, when you set the RPKI cache address, the completion help shows the wrong description: it says "NTP server" instead of "rpki server ip":

router# set protocols rpki cache ?
Possible completions:
> <x.x.x.x> IP address of NTP server
> <h:h:h:h:h:h:h:h> IPv6 address of NTP server
> <hostname> Fully qualified domain name of NTP server

In T2044#129750, @egoistdream wrote that the completion help for set protocols rpki cache describes an "NTP server" instead of the RPKI server.

I created a separate task for the descriptions: T4654

syncer edited projects, added VyOS 1.3 Equuleus (1.3.3); removed VyOS 1.3 Equuleus (1.3.0). Aug 29 2022, 7:05 AM

Hi,
same issue on VyOS 1.4-rolling-202212090319

After each reboot I get this:

show rpki cache-connection
No connection to RPKI cache server.

To regain a working connection I have to “touch” the rpki configuration (e.g. changing the polling period to a random number). After committing that change everything starts to work as expected:

show rpki cache-connection
Connected to group 1
rpki tcp cache 10.42.0.3 8082 pref 1 (connected)

Viacheslav added a project: VyOS 1.4 Sagitta. Dec 15 2022, 8:31 AM

Chiming in here as a 'me too', on vyos-1.4-rolling-202305300317

syncer edited projects, added VyOS 1.3 Equuleus (1.3.4); removed VyOS 1.3 Equuleus (1.3.3). Jul 12 2023, 9:43 PM

Running into this as well on: 1.4-rolling-202307260317

Our workaround for the moment is just kicking RPKI with: vtysh -c 'rpki reset'
This could potentially be added to /config/scripts/vyos-postconfig-bootup.script but we haven't validated that yet.

[update]
Tested adding rpki reset to the bootup script as detailed above. This is a viable workaround, however, if you are using route-maps to classify routes based on RPKI status, they will need to be re-evaluated as well once RPKI is established. We found that (while ugly) putting sleep before a route refresh works well:

vtysh -c "rpki reset" && sleep 5 && vtysh -c "clear bgp * soft in"
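Instead of a fixed sleep 5, one could poll until the cache connection is reported and only then refresh the routes. The Python sketch below shows just the polling part. The name wait_for_rpki and its parameters are made up, and it assumes that a "Connected to group" line in show rpki cache-connection is a good enough signal, which this thread has not verified. The show callable is injected so the logic can be tried without a router.

```python
import time
from typing import Callable


def wait_for_rpki(show: Callable[[], str], attempts: int = 10, delay: float = 1.0,
                  sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `show()` until it reports a connected cache or attempts run out."""
    for attempt in range(attempts):
        if "Connected to group" in show():
            return True
        if attempt < attempts - 1:
            # Not connected yet: wait before asking again.
            sleep(delay)
    return False


# Demo with canned outputs. On a router, `show` could instead return the stdout
# of: vtysh -c "show rpki cache-connection"
outputs = iter([
    "No connection to RPKI cache server.",
    "No connection to RPKI cache server.",
    "Connected to group 1",
])
print(wait_for_rpki(lambda: next(outputs), sleep=lambda _seconds: None))
```

Once this returns True, the clear bgp * soft in step from the one-liner above would follow.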

Latest rolling uses FRR 9.0 - could you re-test it please?

@c-po Tried with latest rolling 1.4-rolling-202308060317, rpki doesn't start automatically, one must do:

$ vtysh
$ rpki start

Then rpki starts validating prefixes.

c-po changed the task status from Open to In progress. Aug 7 2023, 9:09 PM

c-po claimed this task.

@aalmenar could you test this patch?

diff i/usr/libexec/vyos/conf_mode/protocols_rpki.py w/usr/libexec/vyos/conf_mode/protocols_rpki.py
index 035b7db05..e05103aab 100755
--- i/usr/libexec/vyos/conf_mode/protocols_rpki.py
+++ w/usr/libexec/vyos/conf_mode/protocols_rpki.py
@@ -22,6 +22,7 @@ from vyos.config import Config
 from vyos.configdict import dict_merge
 from vyos.template import render_to_string
 from vyos.utils.dict import dict_search
+from vyos.utils.process import cmd
 from vyos.xml import defaults
 from vyos import ConfigError
 from vyos import frr
@@ -95,6 +96,11 @@ def apply(rpki):
         frr_cfg.add_before(frr.default_add_before, rpki['new_frr_config'])

     frr_cfg.commit_configuration(bgp_daemon)
+
+    start_stop_cmd = 'start'
+    if not rpki: start_stop_cmd = 'stop'
+    cmd(f'vtysh -c "rpki {start_stop_cmd}"')
+
     return None

 if __name__ == '__main__':

aalmenar added a comment. Aug 7 2023, 9:56 PM

This comment was removed by aalmenar.

@c-po

Nope, now I had to do

vtysh
rpki stop
rpki start

for it to work again....

Hi,

I was able to work around it by adding the following to /config/scripts/vyos-postconfig-bootup.script. You can edit and save the file by running:

sudo nano /config/scripts/vyos-postconfig-bootup.script

and add:

#!/bin/vbash
vtysh -c "rpki start"

exit

But I still hope that it will be fixed in the official release.

Regards

syncer triaged this task as Normal priority. Aug 12 2023, 10:09 PM

@egoistdream

interesting, as the above diff actually does the same but a bit earlier in the boot process

syncer edited projects, added VyOS 1.3 Equuleus (1.3.5); removed VyOS 1.3 Equuleus (1.3.4). Aug 25 2023, 9:30 PM

Could https://vyos.dev/T2044 be related to the failed nightly build from last night?

https://github.com/vyos/vyos-rolling-nightly-builds/actions/runs/6179287424/job/16773987622#step:10:28483

DEBUG - test_route_map (__main__.TestPolicy.test_route_map) ... FAIL

https://github.com/vyos/vyos-rolling-nightly-builds/actions/runs/6179287424/job/16773987622#step:10:28642

DEBUG - test_rpki (__main__.TestProtocolsRPKI.test_rpki) ... ERROR

https://github.com/vyos/vyos-1x/pull/2245

Could the error in the latest nightly be because the rpki module isn't loaded for FRR/bgp?

It seems like the commit from the other day, which removed a duplicated configs.chroot regarding frr from vyos-build, perhaps wasn't properly synced to the remaining daemons file in vyos-1x?

https://github.com/vyos/vyos-build/commit/a9a1ca3cbb0951a37de286fffb2554103b561846

Removed config in vyos-build (data/live-build-config/hooks/live/30-frr-configs.chroot):

zebra=yes
bgpd=yes
ospfd=yes
ospf6d=yes
ripd=yes
ripngd=yes
isisd=yes
pimd=no
pim6d=yes
ldpd=yes
nhrpd=no
eigrpd=yes
babeld=yes
sharpd=no
pbrd=no
bfdd=yes
staticd=yes

vtysh_enable=yes

zebra_options="-s 90000000 --daemon -A 127.0.0.1 -M snmp"
bgpd_options="--daemon -A 127.0.0.1 -M snmp -M rpki -M bmp"
ospfd_options="--daemon -A 127.0.0.1 -M snmp"
ospf6d_options="--daemon -A ::1 -M snmp"
ripd_options="--daemon -A 127.0.0.1 -M snmp"
ripngd_options="--daemon -A ::1"
isisd_options="--daemon -A 127.0.0.1 -M snmp"
pimd_options="--daemon -A 127.0.0.1"
pim6d_options=""--daemon -A ::1"
ldpd_options="--daemon -A 127.0.0.1"
nhrpd_options="--daemon -A 127.0.0.1"
mgmtd_options=" --daemon -A 127.0.0.1"
eigrpd_options="--daemon -A 127.0.0.1"
babeld_options="--daemon -A 127.0.0.1"
sharpd_options="--daemon -A 127.0.0.1"
pbrd_options="--daemon -A 127.0.0.1"
staticd_options="--daemon -A 127.0.0.1"
bfdd_options="--daemon -A 127.0.0.1"

watchfrr_enable=no
valgrind_enable=no

Remaining config in vyos-1x (data/templates/frr/daemons.frr.tmpl):

zebra=yes
bgpd=yes
ospfd=yes
ospf6d=yes
ripd=yes
ripngd=yes
isisd=yes
pimd=no
pim6d=yes
ldpd=yes
nhrpd=no
eigrpd=yes
babeld=yes
sharpd=no
pbrd=no
bfdd=yes
staticd=yes

vtysh_enable=yes
zebra_options="  -s 90000000 --daemon -A 127.0.0.1
{%- if irdp is defined %} -M irdp{% endif -%}
{%- if snmp is defined and snmp.zebra is defined %} -M snmp{% endif -%}
"
bgpd_options="   --daemon -A 127.0.0.1
{%- if bmp is defined %} -M bmp{% endif -%}
{%- if snmp is defined and snmp.bgpd is defined %} -M snmp{% endif -%}
"
ospfd_options="  --daemon -A 127.0.0.1
{%- if snmp is defined and snmp.ospfd is defined %} -M snmp{% endif -%}
"
ospf6d_options=" --daemon -A ::1
{%- if snmp is defined and snmp.ospf6d is defined %} -M snmp{% endif -%}
"
ripd_options="   --daemon -A 127.0.0.1
{%- if snmp is defined and snmp.ripd is defined %} -M snmp{% endif -%}
"
ripngd_options=" --daemon -A ::1"
isisd_options="  --daemon -A 127.0.0.1
{%- if snmp is defined and snmp.isisd is defined %} -M snmp{% endif -%}
"
pimd_options="  --daemon -A 127.0.0.1"
pim6d_options=" --daemon -A ::1"
ldpd_options="  --daemon -A 127.0.0.1
{%- if snmp is defined and snmp.ldpd is defined %} -M snmp{% endif -%}
"
mgmtd_options=" --daemon -A 127.0.0.1"
nhrpd_options="  --daemon -A 127.0.0.1"
eigrpd_options="  --daemon -A 127.0.0.1"
babeld_options="  --daemon -A 127.0.0.1"
sharpd_options="  --daemon -A 127.0.0.1"
pbrd_options="  --daemon -A 127.0.0.1"
staticd_options="  --daemon -A 127.0.0.1"
bfdd_options="  --daemon -A 127.0.0.1"

watchfrr_enable=no
valgrind_enable=no

Proposed fix for vyos-1x (data/templates/frr/daemons.frr.tmpl):

zebra=yes
bgpd=yes
ospfd=yes
ospf6d=yes
ripd=yes
ripngd=yes
isisd=yes
pimd=no
pim6d=yes
ldpd=yes
nhrpd=no
eigrpd=yes
babeld=yes
sharpd=no
pbrd=no
bfdd=yes
staticd=yes

vtysh_enable=yes
zebra_options="   --daemon -A 127.0.0.1 -s 90000000
{%- if irdp is defined %} -M irdp{% endif -%}
{%- if snmp is defined and snmp.zebra is defined %} -M snmp{% endif -%}
"
bgpd_options="    --daemon -A 127.0.0.1
{%- if bmp is defined %} -M bmp{% endif -%}
{%- if rpki is defined %} -M rpki{% endif -%}
{%- if snmp is defined and snmp.bgpd is defined %} -M snmp{% endif -%}
"
ospfd_options="   --daemon -A 127.0.0.1
{%- if snmp is defined and snmp.ospfd is defined %} -M snmp{% endif -%}
"
ospf6d_options="  --daemon -A ::1
{%- if snmp is defined and snmp.ospf6d is defined %} -M snmp{% endif -%}
"
ripd_options="    --daemon -A 127.0.0.1
{%- if snmp is defined and snmp.ripd is defined %} -M snmp{% endif -%}
"
ripngd_options="  --daemon -A ::1"
isisd_options="   --daemon -A 127.0.0.1
{%- if snmp is defined and snmp.isisd is defined %} -M snmp{% endif -%}
"
pimd_options="    --daemon -A 127.0.0.1"
pim6d_options="   --daemon -A ::1"
ldpd_options="    --daemon -A 127.0.0.1
{%- if snmp is defined and snmp.ldpd is defined %} -M snmp{% endif -%}
"
mgmtd_options="   --daemon -A 127.0.0.1"
nhrpd_options="   --daemon -A 127.0.0.1"
eigrpd_options="  --daemon -A 127.0.0.1"
babeld_options="  --daemon -A 127.0.0.1"
sharpd_options="  --daemon -A 127.0.0.1"
pbrd_options="    --daemon -A 127.0.0.1"
staticd_options=" --daemon -A 127.0.0.1"
bfdd_options="    --daemon -A 127.0.0.1"

watchfrr_enable=no
valgrind_enable=no

Should probably add "-M rpki" permanently to FRR/bgp.
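To make the difference concrete, here is a small Python sketch that mirrors only the bgpd_options conditionals of the proposed template. It is not the real Jinja2 rendering: the function name bgpd_options is made up, "is defined" is approximated by a dictionary key being present, and the cosmetic leading spaces are dropped.

```python
from typing import Any, Dict


def bgpd_options(config: Dict[str, Any]) -> str:
    """Build a bgpd_options value following the proposed template's conditions."""
    options = "--daemon -A 127.0.0.1"
    if "bmp" in config:
        options += " -M bmp"
    if "rpki" in config:
        # The line added by the proposed fix: load the rpki module.
        options += " -M rpki"
    if "snmp" in config and "bgpd" in config["snmp"]:
        options += " -M snmp"
    return options


print(bgpd_options({}))
print(bgpd_options({"rpki": {}}))
```

With the conditional form, bgpd only gets -M rpki when an rpki key is present in the data handed to the template. Adding it permanently, as suggested above, would drop that condition.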

PR created: https://github.com/vyos/vyos-1x/pull/2264

syncer edited projects, added VyOS 1.3 Equuleus (1.3.6); removed VyOS 1.3 Equuleus (1.3.5). Dec 17 2023, 11:38 PM

c-po added a parent task: T6004: RPKI is not configured. Feb 3 2024, 11:51 AM

https://github.com/vyos/vyos-1x/pull/2935