Page MenuHomeVyOS Platform

VNI not set on VRF after reboot
Closed, ResolvedPublicBUG

Description

Running 1.4-rolling-202403240523-amd64 in new virtual machine, I added minimal vxlan and vrf configurations.

interfaces {
    bridge br1 {
        enable-vlan
        member {
            interface vxlan1 {
            }
        }
    }
    ethernet eth0 {
        address 10.0.0.1/24
        hw-id bc:24:11:d9:75:1f
    }
    loopback lo {
    }
    vxlan vxlan1 {
        parameters {
            external
            nolearning
        }
        source-address 10.0.0.1
    }
}
service {
    # post-install defaults
}
system {
    # post-install defaults
}
vrf {
    name test {
        table "100"
        vni "10000"
    }
}

Upon reboot, the following error is recorded.

Mar 24 14:16:37 sudo[1972]:     root : PWD=/ ; USER=root ; COMMAND=/usr/bin/sh -c '/usr/sbin/vyshim /usr/libexec/vyos/conf_mode/vrf_vni.py test'
Mar 24 14:16:37 sudo[1972]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=0)
Mar 24 14:16:37 vyos-configd[731]: Received message: {"type": "node", "data": "/usr/libexec/vyos/conf_mode/vrf_vni.pytest"}
Mar 24 14:16:37 vyos-configd[731]: [Errno 1] Exiting: failed to connect to any daemons.
Mar 24 14:16:37 vyos-configd[731]: Sending response 4
Mar 24 14:16:37 netplugd[1006]: br1: state INNING pid 1963 exited status 256
Mar 24 14:16:37 python3[1977]: Report time:      2024-03-24 14:16:37
Mar 24 14:16:37 python3[1977]: Image version:    VyOS 1.4-rolling-202403240523
Mar 24 14:16:37 python3[1977]: Release train:    sagitta
Mar 24 14:16:37 python3[1977]: Built by:         root@b92449f03c88
Mar 24 14:16:37 python3[1977]: Built on:         Sun 24 Mar 2024 05:23 UTC
Mar 24 14:16:37 python3[1977]: Build UUID:       167ca0d2-0a25-4d9e-8ce1-0e0971b4c7f7
Mar 24 14:16:37 python3[1977]: Build commit ID:  9560a65ddd5a4e
Mar 24 14:16:37 Architecture[1977]:     x86_64
Mar 24 14:16:37 python3[1977]: Boot via:         installed image
Mar 24 14:16:37 python3[1977]: System type:      KVM guest
Mar 24 14:16:37 python3[1977]: Hardware vendor:  QEMU
Mar 24 14:16:37 python3[1977]: Hardware model:   Standard PC (Q35 + ICH9, 2009)
Mar 24 14:16:37 python3[1977]: Hardware S/N:
Mar 24 14:16:37 python3[1977]: Hardware UUID:    b629a224-45e4-40a2-8077-55d7af9b5e23
Mar 24 14:16:37 python3[1977]: Traceback (most recent call last):
Mar 24 14:16:37 python3[1977]:   File "/usr/libexec/vyos/conf_mode/vrf_vni.py", line 101, in <module>
Mar 24 14:16:37 python3[1977]:     apply(c)
Mar 24 14:16:37 python3[1977]:   File "/usr/libexec/vyos/conf_mode/vrf_vni.py", line 85, in apply
Mar 24 14:16:37 python3[1977]:     frr_cfg.load_configuration(frr_daemon)
Mar 24 14:16:37 python3[1977]:   File "/usr/lib/python3/dist-packages/vyos/frr.py", line 437, in load_configuration
Mar 24 14:16:37 python3[1977]:     self.imported_config = get_configuration(daemon=daemon)
Mar 24 14:16:37 python3[1977]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Mar 24 14:16:37 python3[1977]:   File "/usr/lib/python3/dist-packages/vyos/frr.py", line 151, in get_configuration
Mar 24 14:16:37 python3[1977]:     raise OSError(code, output)
Mar 24 14:16:37 PermissionError[1977]: [Errno 1] Exiting: failed to connect to any daemons.
Mar 24 14:16:37 sudo[1972]: pam_unix(sudo:session): session closed for user root

The resulting configuration drops the vrf vni.

vrf {
    name test {
        table 100
    }
}

I am able to rollback-soft 0 and commit to restore the vni.

See

Details

Difficulty level
Normal (likely a few hours)
Version
1.4-rolling-202403240523-amd64
Why the issue appeared?
Issues in third-party code
Is it a breaking change?
Perfectly compatible
Issue type
Bug (incorrect behavior)

Event Timeline

Viacheslav triaged this task as Normal priority.Mar 25 2024, 8:52 AM
Viacheslav raised the priority of this task from Normal to High.Sat, Mar 30, 6:04 PM

@kevinrausch Thank you for the report, next time it is better to use set of the commands to reproduce

To reproduce

set interfaces bridge br1 enable-vlan
set interfaces bridge br1 member interface vxlan1
set interfaces ethernet eth1 address '10.0.0.1/30'
set interfaces ethernet eth1 description 'WAN'
set interfaces vxlan vxlan1 parameters external
set interfaces vxlan vxlan1 parameters nolearning
set interfaces vxlan vxlan1 source-address '10.0.0.1'
set vrf name test table '100'
set vrf name test vni '10000'
commit

Traceback:

Traceback (most recent call last):
  File "/usr/libexec/vyos/conf_mode/vrf_vni.py", line 101, in <module>
    apply(c)
  File "/usr/libexec/vyos/conf_mode/vrf_vni.py", line 85, in apply
    frr_cfg.load_configuration(frr_daemon)
  File "/usr/lib/python3/dist-packages/vyos/frr.py", line 438, in load_configuration
    self.imported_config = get_configuration(daemon=daemon)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/vyos/frr.py", line 153, in get_configuration
    raise OSError(code, output)
PermissionError: [Errno 1] Exiting: failed to connect to any daemons.



[[vrf name test vni]] failed


vyos@r4# compare 
[vrf name test]
+ vni "10000"

[edit]
vyos@r4# commit
[edit]
vyos@r4#

Tested on VyOS 1.5-rolling-202403310021

Viacheslav changed the subtype of this task from "Task" to "Bug".

The current priorities:

vyos@r4:~$ /usr/libexec/vyos/priority.py | match "vrf|bri|vxlan"
        11  vrf.py                              ['vrf']
       310  interfaces_bridge.py                ['interfaces', 'bridge']
       460  interfaces_vxlan.py                 ['interfaces', 'vxlan']
       481  protocols_static.py                 ['vrf', 'name', 'protocols', 'static']
       611  protocols_isis.py                   ['vrf', 'name', 'protocols', 'isis']
       621  protocols_ospf.py                   ['vrf', 'name', 'protocols', 'ospf']
       621  protocols_ospfv3.py                 ['vrf', 'name', 'protocols', 'ospfv3']
       821  protocols_bgp.py                    ['vrf', 'name', 'protocols', 'bgp']
       821  protocols_eigrp.py                  ['vrf', 'name', 'protocols', 'eigrp']
       822  vrf_vni.py                          ['vrf', 'name', 'vni']
vyos@r4:~$

This actually also happens without a reboot on my test system.

Apr 01 15:20:52 kernel: br1: port 1(vxlan1) entered blocking state
Apr 01 15:20:52 kernel: br1: port 1(vxlan1) entered disabled state
Apr 01 15:20:52 kernel: vxlan1: entered allmulticast mode
Apr 01 15:20:52 kernel: vxlan1: entered promiscuous mode
Apr 01 15:20:52 ZEBRA[1324]: Received signal 11 at 1711977652 (si_addr 0x0, PC 0x55631654b625); aborting...
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: showing active allocations in memory group libfrr
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Buffer                        :     17 *         24
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Host config                   :      6 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Command Tokens                :   4945 *         72
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Command Token Text            :   3524 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Command Token Help            :   3524 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Command Argument Name         :   1180 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  RCU thread                    :     17 *        128
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  RCU sequence barrier          :      1 *         32
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Scripting                     :     15 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  FRR POSIX Thread              :     34 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  POSIX sync primitives         :     34 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Graph                         :     31 *          8
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Graph Node                    :   5776 *         32
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Hash                          :    154 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Hash Bucket                   :   1033 *         32
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Hash Index                    :     77 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Hook entry                    :      1 *         48
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Interface                     :     12 *        280
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Connected                     :      9 *         48
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Link List                     :     94 *         40
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Link Node                     :    169 *         24
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Temporary memory              :      1 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Bitfield memory               :      1 *       2052
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Nexthop                       :     10 *        152
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  NetNS Context                 :      2 * (variably sized)
Apr 01 15:20:52 ZEBRA[1324]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x6f) [0x7f39e083629f]
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  NetNS Name                    :      1 *         18
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Northbound Node               :    693 *       1192
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Northbound Configuration      :      2 *         24
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Northbound Configuration Entry:      2 *       1032
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Prefix                        :      9 *         56
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Privilege information         :      4 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Stream                        :     30 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Stream FIFO                   :     31 *         64
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Route table                   :     33 *         56
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Route node                    :     49 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Thread                        :     67 *        160
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Thread master                 :     72 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Thread Poll Info              :     36 *       8192
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Thread stats                  :     63 *        112
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Typed-hash bucket             :     36 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Typed-heap array              :      1 *        576
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Vector                        :  11615 *         24
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Vector index                  :  11615 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  VRF                           :      2 *        216
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  VRF bit-map                   :     10 *          8
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  VTY                           :      2 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  VTY server                    :      2 *         32
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Work queue                    :      3 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Work queue item               :      1 *         24
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Work queue name string        :      2 * (variably sized)
Apr 01 15:20:52 ZEBRA[1324]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0xf5) [0x7f39e08364a5]
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  YANG module                   :      8 *         48
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  log thread-local buffer       :     18 *      24608
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: showing active allocations in memory group logging subsystem
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  syslog target                 :      1 *         56
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: showing active allocations in memory group Label Manager
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Label Manager Chunk           :      2 *         20
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: showing active allocations in memory group Table Manager
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Table Manager Context         :      1 *         16
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: showing active allocations in memory group SRv6 Manager
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: showing active allocations in memory group zebra
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Zebra Interface Information   :     12 *        608
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Intf desc                     :      1 *          4
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Zebra Netlink buffers         :      5 * (variably sized)
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Router Advertisement Prefix   :      1 *         80
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Zebra DPlane Ctx              :      1 *       2544
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Zebra DPlane Provider         :      1 *        248
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  DPlane NSes                   :      1 *         48
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Nexthop Group Entry           :     10 *        144
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Nexthop Group Connected       :     10 *         40
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Zebra Name Space              :      1 *        480
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  ZAPI Opaque Information       :     12 *         32
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  PTM BFD process reg table     :      6 *         32
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Route Entry                   :     19 *        152
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  RIB destination               :     18 *         88
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Rib update context object     :      1 *          8
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  RIB table info                :      8 *         24
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Zebra VRF table               :      8 *         56
Apr 01 15:20:52 ZEBRA[1324]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0xf65f1) [0x7f39e086b5f1]
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  ZEBRA VRF                     :      2 *       5072
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  MH global info                :      1 *        128
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Zebra neigh table             :      1 *          8
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  Zebra neigh entry             :      3 *         80
Apr 01 15:20:52 frrinit.sh[1324]: core_handler: memstats:  ZClients                      :     15 *       3584
Apr 01 15:20:52 ZEBRA[1324]: /lib/x86_64-linux-gnu/libc.so.6(+0x3c050) [0x7f39e0579050]
Apr 01 15:20:52 ZEBRA[1324]: /usr/lib/frr/zebra(zebra_if_dplane_result+0x1665) [0x55631654b625]
Apr 01 15:20:52 ZEBRA[1324]: /usr/lib/frr/zebra(+0xf5e69) [0x5563165a7e69]
Apr 01 15:20:52 ZEBRA[1324]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(event_call+0x81) [0x7f39e087da11]
Apr 01 15:20:52 ZEBRA[1324]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xc0) [0x7f39e082e010]
Apr 01 15:20:52 ZEBRA[1324]: /usr/lib/frr/zebra(main+0x3be) [0x55631653c6de]
Apr 01 15:20:52 ZEBRA[1324]: /lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7f39e056424a]
Apr 01 15:20:52 ZEBRA[1324]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f39e0564305]
Apr 01 15:20:52 ZEBRA[1324]: /usr/lib/frr/zebra(_start+0x21) [0x55631653d681]
Apr 01 15:20:52 ZEBRA[1324]: in thread rib_process_dplane_results scheduled from ../zebra/zebra_rib.c:4960 rib_dplane_results()
Apr 01 15:20:52 watchfrr[1305]: [HD38Q-0HBRT][EC 268435457] zebra state -> down : read returned EOF
Apr 01 15:20:52 watchfrr[1305]: [QDG3Y-BY5TN] zebra state -> up : connect succeeded
Apr 01 15:20:52 watchfrr[1305]: [HD38Q-0HBRT][EC 268435457] zebra state -> down : unexpected read error: Connection reset by peer

The issue seems to be in zebra/interface.c:

1790x static void
1791x interface_bridge_vxlan_vlan_vni_map_update(struct zebra_dplane_ctx *ctx,
1792x                                            struct interface *ifp)
1793x {
1794x         const struct zebra_vxlan_vni_array *vniarray =
1795x                 dplane_ctx_get_ifp_vxlan_vni_array(ctx);
1796x         struct zebra_vxlan_vni vni_start, vni_end;
1797x         struct hash *vni_table = NULL;
1798x         struct zebra_vxlan_vni vni, *vnip;
1799x         vni_t vni_id;
1800x         vlanid_t vid;
1801x         int i;
1802x
1803x         memset(&vni_start, 0, sizeof(vni_start));
1804x         memset(&vni_end, 0, sizeof(vni_end));
1805x
1806tqqqqqqq> for (i = 0; i < vniarray->count; i++) {
1807x                 uint16_t flags = vniarray->vnis[i].flags;
(gdb) print vniarray
$2 = (const struct zebra_vxlan_vni_array *) 0x0

One can not iterator over a NULLptr

c-po changed Difficulty level from Unknown (require assessment) to Normal (likely a few hours).Mon, Apr 1, 8:13 PM
c-po changed Why the issue appeared? from Will be filled on close to Issues in third-party code.
c-po changed Is it a breaking change? from Unspecified (possibly destroys the router) to Perfectly compatible.
c-po changed Issue type from Unspecified (please specify) to Bug (incorrect behavior).

Upstream fix merged https://github.com/FRRouting/frr/pull/15649

Packages got rebuild and fix will be in next rolling release.

c-po moved this task from Need Triage to Finished on the VyOS 1.4 Sagitta (1.4.0-epa3) board.
c-po moved this task from Need Triage to Finished on the VyOS 1.5 Circinus board.