VRRP sync-group transition script does not persist after reboot
Closed, Resolved | Public | BUG

Description

Issue
VRRP master script does not execute after reboot following initial VRRP configuration.

Steps to Re-Create

  1. Configure VRRP as shown below.
  2. On the machine currently designated as MASTER, run restart vrrp.
  3. On the machine that is now MASTER, run show log vrrp and confirm that the script executed: the log shows "Running the command: <script name>".
  4. Repeat as many times as needed to verify.
  5. Reboot the current MASTER.
  6. On the new MASTER, run restart vrrp.
  7. Go to the resulting MASTER (the machine that was just rebooted): the script does not execute, and nothing is shown in the log file.
  8. Delete the VRRP configuration entirely on both machines, OR remove the transition-script line, add it again, and save. Everything then works again.
  9. Reboot: the script stops working again.

Other

  1. Script has chmod +x permissions
  2. Script can be run manually from bash and works

Version
vyos-1.3-beta-202112060443-amd64.iso

Configurations
Only difference is priority

#Node A
high-availability {
    vrrp {
        group eth0 {
            interface eth0
            no-preempt
            priority 200
            rfc3768-compatibility
            virtual-address <floating IP>
            vrid 10
        }
        group eth1 {
            interface eth1
            no-preempt
            priority 200
            rfc3768-compatibility
            virtual-address <Internal VIP IP>
            vrid 11
        }
        sync-group MAIN {
            member eth0
            member eth1
            transition-script {
                master /config/scripts/<my script name>.sh
            }
        }
    }
}
#Node B
high-availability {
    vrrp {
        group eth0 {
            interface eth0
            no-preempt
            priority 100
            rfc3768-compatibility
            virtual-address <floating IP>
            vrid 10
        }
        group eth1 {
            interface eth1
            no-preempt
            priority 100
            rfc3768-compatibility
            virtual-address <Internal VIP IP>
            vrid 11
        }
        sync-group MAIN {
            member eth0
            member eth1
            transition-script {
                master /config/scripts/<my script name>.sh
            }
        }
    }
}

Master Script

#!/usr/bin/bash

RESERVED_IP_INSTANCE_ID='sanitized'
API_KEY='sanitized'
VM_INSTANCE_ID='sanitized'

# Query the reserved IP to see which instance currently holds it
HAS_FLOATING_IP=$(curl -s "https://api.vultr.com/v2/reserved-ips/${RESERVED_IP_INSTANCE_ID}" -X GET -H "Authorization: Bearer ${API_KEY}")

# Debug: log the raw API response (quoted so it is passed as one message)
logger -s "${HAS_FLOATING_IP}"

# Reassign only if the response does not mention this instance
if [[ "${HAS_FLOATING_IP}" != *"${VM_INSTANCE_ID}"* ]]; then
  logger -s "The floating IP is not assigned to this machine. Assigning it now."

  # Detach from the current holder. No backticks around the call: with
  # backticks the shell would try to execute curl's output as a command.
  curl -s "https://api.vultr.com/v2/reserved-ips/${RESERVED_IP_INSTANCE_ID}/detach" \
    -X POST \
    -H "Authorization: Bearer ${API_KEY}"

  # Attach to this instance
  curl -s "https://api.vultr.com/v2/reserved-ips/${RESERVED_IP_INSTANCE_ID}/attach" \
    -X POST \
    -H "Authorization: Bearer ${API_KEY}" \
    -H "Content-Type: application/json" \
    -d "{\"instance_id\" : \"${VM_INSTANCE_ID}\"}"
fi

logger -s "Completed IP failover"
exit 0

Attaching a screenshot of the script working when keepalived does execute it

Details

Difficulty level
Easy (less than an hour)
Version
vyos-1.3-beta-202112060443
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Bug (incorrect behavior)

Event Timeline

I confirm the bug. After a reboot, the transition script does not run for sync-groups. It is necessary to reload VRRP after the reboot before the script starts running.
More detail:
https://phabricator.vyos.net/T4041

For anyone looking for a workaround until this is patched, here is mine: I add a line that restarts the service right after bootup in "/config/scripts/vyos-postconfig-bootup.script" (more information on this file is in the "Command Scripting" section of the VyOS 1.4.x (sagitta) documentation).
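A minimal sketch of that workaround, under a few assumptions: that VRRP is served by the systemd unit keepalived.service, and that the post-boot hook runs with enough privileges to restart it. The function name restart_vrrp_after_boot and its --dry-run flag are hypothetical, added only so the snippet can be tried without touching the service.

```shell
#!/bin/bash
# Sketch for /config/scripts/vyos-postconfig-bootup.script
# Assumption: VRRP runs as the systemd unit "keepalived.service".
# restart_vrrp_after_boot and --dry-run are illustrative, not part of VyOS.

restart_vrrp_after_boot() {
    unit="keepalived.service"
    if [ "${1:-}" = "--dry-run" ]; then
        # Only show what would be executed
        echo "would run: systemctl restart ${unit}"
        return 0
    fi
    logger "post-boot workaround: restarting ${unit}"
    systemctl restart "${unit}"
}

# Demo only: prints the command instead of running it.
restart_vrrp_after_boot --dry-run
```

On the router itself the function would be called without --dry-run, as the last step of the post-boot script.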

Thank you for sharing your information.

The displayed error:

$ cat /var/log/messages | grep keep
Dec  8 09:38:17 LR1 Keepalived[1834]: Command line: '/usr/sbin/keepalived' '--use-file' '/run/keepalived/keepalived.conf' '--pid'
Dec  8 09:38:17 LR1 Keepalived[1834]:               '/run/keepalived/keepalived.pid' '--dont-fork' '--snmp'
Dec  8 09:38:17 LR1 Keepalived[1834]: Opening file '/run/keepalived/keepalived.conf'.
Dec  8 09:38:17 LR1 Keepalived[1834]: NOTICE: setting config option max_auto_priority should result in better keepalived performance
Dec  8 09:38:17 LR1 Keepalived_vrrp[1847]: Opening file '/run/keepalived/keepalived.conf'.
Dec  8 09:38:18 LR1 keepalived-fifo.py: Starting FIFO pipe for Keepalived
Dec  8 09:38:18 LR1 keepalived-fifo.py: Unable to load configuration:
Dec  8 09:38:18 LR1 keepalived-fifo.py: PIPE already exist: /run/keepalived/keepalived_notify_fifo
Dec  8 09:38:18 LR1 keepalived-fifo.py: INSTANCE foo changed state to FAULT

This indicates that the following code path has an issue: https://github.com/vyos/vyos-1x/blob/7ac6b589e3827ea6ff8e796bb6617c021cf26bca/src/system/keepalived-fifo.py#L66-L76

My initial assumption is that conf = ConfigTreeQuery() does not work on bootup. @jestabro any idea?

c-po changed the task status from Open to Needs testing. Dec 8 2021, 9:05 AM
c-po claimed this task.
c-po triaged this task as Normal priority.
c-po changed Difficulty level from Unknown (require assessment) to Easy (less than an hour).
c-po moved this task from Need Triage to Finished on the VyOS 1.4 Sagitta board.
c-po moved this task from Need Triage to Finished on the VyOS 1.3 Equuleus (1.3.0) board.

Logic has been changed in keepalived-fifo.py to read in the CLI data (and thus the scripts) the first time they are really needed by a transition.
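The change described here is a lazy-loading pattern. The real code is the Python file keepalived-fifo.py; what follows is only a shell illustration of the idea, not the actual patch. Every name in it (load_scripts, on_transition, MASTER_SCRIPT, and the CONFIG_READY flag standing in for "the CLI configuration is readable") is hypothetical, and the script path is an example.

```shell
#!/bin/bash
# Lazy loading: do not read the configuration at startup. Read it when a
# transition first needs it, and retry on later transitions if it was not
# available yet.

MASTER_SCRIPT=""   # empty until the configuration has been loaded
LOADED=0           # becomes 1 once the configuration has been read
CONFIG_READY=0     # stand-in for "the CLI configuration is readable"

load_scripts() {
    # Hypothetical loader: fails while the configuration is unavailable
    [ "$CONFIG_READY" = "1" ] || return 1
    MASTER_SCRIPT="/config/scripts/example-master.sh"
    LOADED=1
}

on_transition() {
    state="$1"
    # Load on demand; a failed load is retried on the next transition
    if [ "$LOADED" = "0" ] && ! load_scripts; then
        echo "config unavailable, no script run for ${state}"
        return 0
    fi
    case "$state" in
        # The sketch only prints the command it would execute
        MASTER) echo "Running the command: ${MASTER_SCRIPT}" ;;
        *)      echo "no script configured for ${state}" ;;
    esac
}

on_transition MASTER   # configuration not readable yet, nothing runs
CONFIG_READY=1
on_transition MASTER   # configuration is loaded now, then the script is used
```

The point of the pattern is that a failed load at boot is not final: the next transition tries again instead of running with an empty script table.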

I've checked this bug on "VyOS 1.3-beta-202112080938"; everything works.
After rebooting, the sync-group scripts work as expected.

"VyOS 1.4-rolling-202112081536" : sync-group scripts work well after rebooting.

Hey, thanks for the fix, it works great. One more thing I noticed; I'm not sure whether it's an edge case or should be considered a bug.

After a restart the script now executes, except that right after a reboot, if the router acquires the MASTER role, it does not execute the script. However, after running "restart vrrp" once, everything functions normally.

The workaround would be to always execute a service restart after boot, to make sure the script runs if the first attempt fails.

As I understand it, after the reboot the scenarios behave as follows:

  1. VRRP SCRIPT - "master" - works only after "restart vrrp"
  2. VRRP SCRIPT - "fault" - works well
  3. VRRP SCRIPT - "backup" - works well

I need a little time to check it.

Looks like the issue persists.
Tested on VyOS 1.3-beta-202112080938.
Reboot both VyOS routers: the first one, wait for it to load, then the other one.
The 'transition-script master' script doesn't start.

@m.korobeinikov - I did not test the backup and fault scripts, but what you said sums it up correctly. Let me know if you need anything for an exact re-creation, etc.

I've checked it, and here is what we have:

  1. Reboot ---> VRRP(master): script does not run ---> VRRP(backup): script does not run ---> VRRP(master): script runs

REBOOT
VRRP(master)

Dec 10 04:14:42 keepalived-fifo.py[1764]: Starting FIFO pipe for Keepalived
Dec 10 04:14:42 keepalived-fifo.py[1764]: PIPE already exist: /run/keepalived/keepalived_notify_fifo
Dec 10 04:14:42 keepalived-fifo.py[1764]: Message reading start
Dec 10 04:14:42 keepalived-fifo.py[1764]: Message processing start
Dec 10 04:14:42 keepalived-fifo.py[1764]: Unable to load configuration:
Dec 10 04:14:42 keepalived-fifo.py[1764]: Received message: GROUP "SYN" MASTER 0
Dec 10 04:14:42 keepalived-fifo.py[1764]: GROUP SYN changed state to MASTER
Dec 10 04:14:42 keepalived-fifo.py[1764]: Unable to load configuration:
VRRP(backup)
Dec 10 04:16:24 keepalived-fifo.py[1764]: Received message: GROUP "SYN" BACKUP 0
Dec 10 04:16:24 keepalived-fifo.py[1764]: GROUP SYN changed state to BACKUP
Dec 10 04:16:25 keepalived-fifo.py[1764]: Loaded configuration: **********************
VRRP(master)-OK
Dec 10 04:18:16 keepalived-fifo.py[1764]: Received message: GROUP "SYN" MASTER 0
Dec 10 04:18:16 keepalived-fifo.py[1764]: GROUP SYN changed state to MASTER
Dec 10 04:18:16 keepalived-fifo.py[1764]: Running the command: /config/scripts/ipsec-restart.sh

  2. Reboot ---> VRRP(master): script does not run ---> VRRP(backup): script does not run ---> VRRP(fault): script runs

REBOOT
VRRP(master)

Dec 10 04:28:37 keepalived-fifo.py[1734]: Starting FIFO pipe for Keepalived
Dec 10 04:28:37 keepalived-fifo.py[1734]: PIPE already exist: /run/keepalived/keepalived_notify_fifo
Dec 10 04:28:37 keepalived-fifo.py[1734]: Message reading start
Dec 10 04:28:37 keepalived-fifo.py[1734]: Message processing start
Dec 10 04:28:37 keepalived-fifo.py[1734]: Received message: GROUP "SYN" MASTER 0
Dec 10 04:28:37 keepalived-fifo.py[1734]: GROUP SYN changed state to MASTER
Dec 10 04:28:37 keepalived-fifo.py[1734]: Unable to load configuration:
VRRP(backup)
Dec 10 04:29:51 keepalived-fifo.py[1734]: Received message: GROUP "SYN" BACKUP 0
Dec 10 04:29:51 keepalived-fifo.py[1734]: GROUP SYN changed state to BACKUP
Dec 10 04:29:52 keepalived-fifo.py[1734]: Loaded configuration: **********************
VRRP(fault)-OK
Dec 10 04:30:10 keepalived-fifo.py[1734]: Received message: GROUP "SYN" FAULT 0
Dec 10 04:30:10 keepalived-fifo.py[1734]: GROUP SYN changed state to FAULT
Dec 10 04:30:10 keepalived-fifo.py[1734]: Running the command: /config/scripts/ipsec-stop.sh

  3. Reboot ---> VRRP(backup): script does not run ---> VRRP(fault): script does not run ---> VRRP(backup): script runs

REBOOT

VRRP(backup)
Dec 10 04:33:23 keepalived-fifo.py[1747]: Starting FIFO pipe for Keepalived
Dec 10 04:33:23 keepalived-fifo.py[1747]: PIPE already exist: /run/keepalived/keepalived_notify_fifo
Dec 10 04:33:23 keepalived-fifo.py[1747]: Message reading start
Dec 10 04:33:23 keepalived-fifo.py[1747]: Message processing start
Dec 10 04:33:23 keepalived-fifo.py[1747]: Unable to load configuration:
Dec 10 04:33:23 keepalived-fifo.py[1747]: Received message: GROUP "SYN" BACKUP 0
Dec 10 04:33:23 keepalived-fifo.py[1747]: GROUP SYN changed state to BACKUP
Dec 10 04:33:23 keepalived-fifo.py[1747]: Unable to load configuration:
VRRP(fault)
Dec 10 04:34:35 keepalived-fifo.py[1747]: Received message: GROUP "SYN" FAULT 0
Dec 10 04:34:35 keepalived-fifo.py[1747]: GROUP SYN changed state to FAULT
Dec 10 04:34:35 keepalived-fifo.py[1747]: Loaded configuration: **********************
VRRP(backup)-OK
Dec 10 04:36:33 keepalived-fifo.py[1747]: Received message: GROUP "SYN" BACKUP 0
Dec 10 04:36:33 keepalived-fifo.py[1747]: GROUP SYN changed state to BACKUP
Dec 10 04:36:33 keepalived-fifo.py[1747]: Running the command: /config/scripts/ipsec-stop.sh

  4. Reboot ---> VRRP(backup): script does not run ---> VRRP(master): script does not run ---> VRRP(backup): script runs

REBOOT

VRRP(backup)
Dec 10 04:41:15 keepalived-fifo.py[1746]: Starting FIFO pipe for Keepalived
Dec 10 04:41:15 keepalived-fifo.py[1746]: PIPE already exist: /run/keepalived/keepalived_notify_fifo
Dec 10 04:41:15 keepalived-fifo.py[1767]: Message reading start
Dec 10 04:41:15 keepalived-fifo.py[1767]: Message processing start
Dec 10 04:41:15 keepalived-fifo.py[1746]: Unable to load configuration:
Dec 10 04:41:15 keepalived-fifo.py[1746]: Received message: GROUP "SYN" BACKUP 0
Dec 10 04:41:15 keepalived-fifo.py[1746]: GROUP SYN changed state to BACKUP
Dec 10 04:41:15 keepalived-fifo.py[1746]: Unable to load configuration:
VRRP(master)
Dec 10 04:41:51 keepalived-fifo.py[1746]: Received message: GROUP "SYN" MASTER 0
Dec 10 04:41:51 keepalived-fifo.py[1746]: GROUP SYN changed state to MASTER
Dec 10 04:41:51 keepalived-fifo.py[1746]: Loaded configuration: **********************
VRRP(backup)-OK
Dec 10 04:47:49 keepalived-fifo.py[1746]: Received message: GROUP "SYN" BACKUP 0
Dec 10 04:47:49 keepalived-fifo.py[1746]: GROUP SYN changed state to BACKUP
Dec 10 04:47:49 keepalived-fifo.py[1746]: Running the command: /config/scripts/ipsec-stop.sh

  5. Reboot ---> VRRP(backup): script does not run ---> VRRP(master): script does not run ---> VRRP(fault): script runs

REBOOT

VRRP(backup)
Dec 10 06:02:00 keepalived-fifo.py[1812]: Starting FIFO pipe for Keepalived
Dec 10 06:02:00 keepalived-fifo.py[1812]: PIPE already exist: /run/keepalived/keepalived_notify_fifo
Dec 10 06:02:00 keepalived-fifo.py[1812]: Message reading start
Dec 10 06:02:00 keepalived-fifo.py[1812]: Message processing start
Dec 10 06:02:00 keepalived-fifo.py[1812]: Unable to load configuration:
Dec 10 06:02:00 keepalived-fifo.py[1812]: Received message: GROUP "SYN" BACKUP 0
Dec 10 06:02:00 keepalived-fifo.py[1812]: GROUP SYN changed state to BACKUP
Dec 10 06:02:00 keepalived-fifo.py[1812]: Unable to load configuration:
VRRP(master)
Dec 10 06:02:48 keepalived-fifo.py[1812]: Received message: GROUP "SYN" MASTER 0
Dec 10 06:02:48 keepalived-fifo.py[1812]: GROUP SYN changed state to MASTER
Dec 10 06:02:49 keepalived-fifo.py[1812]: Loaded configuration: **********************
VRRP(fault)-OK
Dec 10 06:03:01 keepalived-fifo.py[1812]: Received message: GROUP "SYN" FAULT 0
Dec 10 06:03:01 keepalived-fifo.py[1812]: GROUP SYN changed state to FAULT
Dec 10 06:03:01 keepalived-fifo.py[1812]: Running the command: /config/scripts/ipsec-stop.sh

  6. Reboot ---> VRRP(fault): script does not run ---> VRRP(master): script runs
(in this case VRRP was briefly in the "backup" state between "fault" and "master")
REBOOT

VRRP(fault)
Dec 10 06:09:22 keepalived-fifo.py[1739]: Starting FIFO pipe for Keepalived
Dec 10 06:09:22 keepalived-fifo.py[1739]: PIPE already exist: /run/keepalived/keepalived_notify_fifo
Dec 10 06:09:22 keepalived-fifo.py[1739]: Message reading start
Dec 10 06:09:22 keepalived-fifo.py[1739]: Message processing start
Dec 10 06:09:22 keepalived-fifo.py[1739]: Unable to load configuration:
Dec 10 06:09:22 keepalived-fifo.py[1739]: Received message: GROUP "SYN" FAULT 0
Dec 10 06:09:22 keepalived-fifo.py[1739]: GROUP SYN changed state to FAULT
Dec 10 06:09:23 keepalived-fifo.py[1739]: Unable to load configuration:
VRRP(backup)
Dec 10 06:09:54 keepalived-fifo.py[1739]: Received message: GROUP "SYN" BACKUP 0
Dec 10 06:09:54 keepalived-fifo.py[1739]: GROUP SYN changed state to BACKUP
Dec 10 06:09:55 keepalived-fifo.py[1739]: Loaded configuration: **********************
VRRP(master)-OK
Dec 10 06:09:58 keepalived-fifo.py[1739]: Received message: GROUP "SYN" MASTER 0
Dec 10 06:09:58 keepalived-fifo.py[1739]: GROUP SYN changed state to MASTER
Dec 10 06:09:58 keepalived-fifo.py[1739]: Running the command: /config/scripts/ipsec-restart.sh

  7. Reboot ---> VRRP(fault): script does not run ---> VRRP(backup): script does not run ---> VRRP(any state): script runs
REBOOT

VRRP(fault)
Dec 10 06:18:03 keepalived-fifo.py[1741]: Starting FIFO pipe for Keepalived
Dec 10 06:18:03 keepalived-fifo.py[1741]: PIPE already exist: /run/keepalived/keepalived_notify_fifo
Dec 10 06:18:03 keepalived-fifo.py[1741]: Message reading start
Dec 10 06:18:03 keepalived-fifo.py[1741]: Message processing start
Dec 10 06:18:03 keepalived-fifo.py[1741]: Unable to load configuration:
Dec 10 06:18:03 keepalived-fifo.py[1741]: Received message: GROUP "SYN" FAULT 0
Dec 10 06:18:03 keepalived-fifo.py[1741]: GROUP SYN changed state to FAULT
Dec 10 06:18:03 keepalived-fifo.py[1741]: Unable to load configuration:
VRRP(backup)
Dec 10 06:18:35 keepalived-fifo.py[1741]: Received message: GROUP "SYN" BACKUP 0
Dec 10 06:18:35 keepalived-fifo.py[1741]: GROUP SYN changed state to BACKUP
Dec 10 06:18:36 keepalived-fifo.py[1741]: Loaded configuration: **********************
VRRP(fault)-OK
Dec 10 06:19:01 keepalived-fifo.py[1741]: Received message: GROUP "SYN" FAULT 0
Dec 10 06:19:01 keepalived-fifo.py[1741]: GROUP SYN changed state to FAULT
Dec 10 06:19:01 keepalived-fifo.py[1741]: Running the command: /config/scripts/ipsec-stop.sh

As far as I can see, after a reboot the scripts only start running from the second VRRP state change onward: the configuration is loaded during the first change, and a script runs on the next one.
Before that, no script executes.

So after rebooting, sync-group scripts only start working on the second VRRP state change.
Something is wrong with "keepalived-fifo.py".
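For anyone who wants to repeat this check on their own logs, here is a rough sketch that pairs each "changed state to" line with the "Running the command:" line that follows it, if any. It relies only on the two message formats visible in the logs above; the function name check_transitions is hypothetical, and the sample input is copied from the first scenario.

```shell
#!/bin/bash
# Read keepalived-fifo.py log lines on stdin and report, per state change,
# which script was run, or that none was run before the next change.

check_transitions() {
    pending=""   # state that has not been matched with a script run yet
    while IFS= read -r line; do
        case "$line" in
            *"changed state to "*)
                if [ -n "$pending" ]; then
                    echo "${pending}: no script run"
                fi
                # Keep only the state name after the marker text
                pending="${line##*"changed state to "}"
                ;;
            *"Running the command: "*)
                if [ -n "$pending" ]; then
                    echo "${pending}: ${line##*"Running the command: "}"
                    pending=""
                fi
                ;;
        esac
    done
    if [ -n "$pending" ]; then
        echo "${pending}: no script run"
    fi
    return 0
}

check_transitions <<'EOF'
Dec 10 04:14:42 keepalived-fifo.py[1764]: GROUP SYN changed state to MASTER
Dec 10 04:14:42 keepalived-fifo.py[1764]: Unable to load configuration:
Dec 10 04:16:24 keepalived-fifo.py[1764]: GROUP SYN changed state to BACKUP
Dec 10 04:16:25 keepalived-fifo.py[1764]: Loaded configuration: **********************
Dec 10 04:18:16 keepalived-fifo.py[1764]: GROUP SYN changed state to MASTER
Dec 10 04:18:16 keepalived-fifo.py[1764]: Running the command: /config/scripts/ipsec-restart.sh
EOF
# prints:
# MASTER: no script run
# BACKUP: no script run
# MASTER: /config/scripts/ipsec-restart.sh
```

On a router the input could come from the same source used earlier in this task, for example by piping the output of grep keepalived-fifo /var/log/messages into check_transitions.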

m.korobeinikov changed the task status from Needs testing to In progress. Dec 10 2021, 6:23 AM

@m.korobeinikov please close if the current solution works for you!

@c-po Everything works well, thanks. I've checked on this version (VyOS 1.3-beta-202112120443).
I'm going to check it on 1.4.