
Conntrack-sync Internal Cache Growing Uncontrollably
Needs reporter action, High, Public, Bug

Description

For the past few months, conntrack-sync's internal cache on my routers has been growing far beyond the size of the conntrack table itself, which stays relatively small. Once the cache gets large enough, I begin to see intermittent connectivity issues until I restart the service via restart conntrack-sync. That helps for a few days, and then I have to restart it again. Here's what the A-side router looks like when comparing conntrack-sync's internal cache against the actual conntrack table. Note the huge discrepancy (284 active connections in the conntrack table, but roughly 22,000 in the cache), and this is only 20 minutes after restarting the service:

trae@cr01a-vyos:~$ show conntrack table ipv4 | wc -l
286
trae@cr01a-vyos:~$ show conntrack statistics 
                CPU         Found         Invalid          Insert    Insert fail    Drop       Early drop        Errors       Search restart
-----  -------  ----------  ------------  ---------------  --------  -------------  ---------  ----------------  -----------  ----------------
cpu=0  found=0  invalid=74  insert=0      insert_failed=0  drop=0    early_drop=0   error=9    search_restart=0  (null)=13    (null)=0
cpu=1  found=0  invalid=64  insert=7121   insert_failed=0  drop=0    early_drop=0   error=2    search_restart=0  (null)=235   (null)=0
cpu=2  found=0  invalid=83  insert=73041  insert_failed=0  drop=0    early_drop=0   error=495  search_restart=6  (null)=93    (null)=0
cpu=3  found=0  invalid=73  insert=184    insert_failed=0  drop=0    early_drop=0   error=0    search_restart=0  (null)=75    (null)=0
cpu=4  found=0  invalid=74  insert=0      insert_failed=0  drop=0    early_drop=0   error=885  search_restart=0  (null)=1     (null)=0
cpu=5  found=0  invalid=64  insert=1255   insert_failed=0  drop=0    early_drop=0   error=0    search_restart=0  (null)=0     (null)=0
cpu=6  found=2  invalid=68  insert=0      insert_failed=1  drop=1    early_drop=0   error=0    search_restart=0  (null)=1051  (null)=0
cpu=7  found=0  invalid=71  insert=3046   insert_failed=0  drop=0    early_drop=0   error=32   search_restart=0  (null)=88    (null)=0
trae@cr01a-vyos:~$ show conntrack-sync statist
cache internal:
current active connections:            22199
connections created:                   73875    failed:            0
connections updated:                   38217    failed:            0
connections destroyed:                 51676    failed:            0

external inject:
connections created:                   72473    failed:            0
connections updated:                   28742    failed:            0
connections destroyed:                   270    failed:            0

traffic processed:
          1031447727 Bytes                    411616 Pckts

multicast traffic (active device=bond0.110):
             9065520 Bytes sent              8208184 Bytes recv
              120422 Pckts sent               111498 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs                    0 Lost msgs

Main Table Statistics:
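
As a cross-check on the numbers above, the internal cache's own counters are self-consistent: connections created minus connections destroyed (73875 - 51676) equals the 22199 active entries it reports. So the cache appears to be missing destroy events rather than miscounting. The following is a minimal Python sketch of that arithmetic, assuming the statistics output keeps the layout pasted above. The names `parse_cache_internal` and `SAMPLE` are made up for illustration and are not part of VyOS or conntrack-tools.

```python
import re

# Excerpt of the `show conntrack-sync statistics` output pasted above.
SAMPLE = """\
cache internal:
current active connections:            22199
connections created:                   73875    failed:            0
connections updated:                   38217    failed:            0
connections destroyed:                 51676    failed:            0

external inject:
connections created:                   72473    failed:            0
"""


def parse_cache_internal(stats_text):
    """Return the counters listed under the 'cache internal:' section."""
    counters = {}
    in_section = False
    for line in stats_text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):
            # A bare header line such as "cache internal:" starts a new section.
            in_section = stripped == "cache internal:"
            continue
        if in_section:
            m = re.match(
                r"(current active connections|connections \w+):\s+(\d+)", stripped
            )
            if m:
                counters[m.group(1)] = int(m.group(2))
    return counters


counters = parse_cache_internal(SAMPLE)
net = counters["connections created"] - counters["connections destroyed"]
active = counters["current active connections"]
table_entries = 284  # active connections in the kernel table, per the report

print("created - destroyed:", net)
print("reported active:    ", active)
print("cache / table ratio: %.1f" % (active / table_entries))
```

Running the same comparison at intervals after a restart would show whether the gap between the cache and the kernel table grows steadily or in bursts.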

Here are the relevant portions of the configuration:

trae@cr01a-vyos:~$ show conf com | grep -P 'set (system conntrack|service conntrack-sync)'
set service conntrack-sync accept-protocol 'icmp'
set service conntrack-sync accept-protocol 'icmp6'
set service conntrack-sync accept-protocol 'tcp'
set service conntrack-sync accept-protocol 'udp'
set service conntrack-sync disable-external-cache
set service conntrack-sync event-listen-queue-size '100'
set service conntrack-sync failover-mechanism vrrp sync-group 'CR01.INT'
set service conntrack-sync ignore-address 'fe8::/10'
set service conntrack-sync ignore-address 'ff00::/8'
set service conntrack-sync ignore-address '169.254.0.0/16'
set service conntrack-sync ignore-address '224.0.0.0/4'
set service conntrack-sync ignore-address '127.0.0.0/8'
set service conntrack-sync interface bond0.110
set service conntrack-sync sync-queue-size '100'
set system conntrack flow-accounting
set system conntrack modules
set system conntrack table-size '1000000'
set system conntrack timeout icmp '10'
set system conntrack timeout other '60'
set system conntrack timeout tcp close-wait '20'
set system conntrack timeout tcp established '1800'
set system conntrack timeout tcp fin-wait '30'
set system conntrack timeout tcp syn-recv '30'
set system conntrack timeout tcp syn-sent '60'
set system conntrack timeout udp stream '60'
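
One detail in the configuration above that may be worth double-checking is the `ignore-address` list. The sketch below uses Python's standard `ipaddress` module to test whether each prefix is written on its network boundary. It rests on the assumption that `fe8::/10` was meant to cover IPv6 link-local (`fe80::/10`); that is a guess, not something the report states, and it may be unrelated to the cache growth. `check_prefix` is a hypothetical helper name.

```python
import ipaddress

# ignore-address values copied from the configuration above.
IGNORE_ADDRESSES = [
    "fe8::/10",
    "ff00::/8",
    "169.254.0.0/16",
    "224.0.0.0/4",
    "127.0.0.0/8",
]


def check_prefix(prefix):
    """Report whether `prefix` sits on its network boundary."""
    try:
        ipaddress.ip_network(prefix, strict=True)
        return f"{prefix}: ok"
    except ValueError:
        # strict=False masks off the host bits instead of raising.
        normalized = ipaddress.ip_network(prefix, strict=False)
        return f"{prefix}: host bits set, normalizes to {normalized}"


for prefix in IGNORE_ADDRESSES:
    print(check_prefix(prefix))

# Assumption: the intended prefix was IPv6 link-local.
link_local = ipaddress.ip_network("fe80::/10")
configured = ipaddress.ip_network("fe8::/10", strict=False)
print("covers link-local:", link_local.subnet_of(configured))
```

If the last line prints `False`, link-local flows are not excluded by that entry as written.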

The B side shows similar symptoms and statistics. Please let me know if you need anything else; I can get you access to the routers as well.

Details

Difficulty level
Unknown (require assessment)
Version
1.5-rolling-202403120022
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Perfectly compatible
Issue type
Bug (incorrect behavior)

Event Timeline

trae32566 created this task.
trae32566 created this object in space S1 VyOS Public.

Here's the generated configuration from /run/conntrackd/conntrackd.conf:

# Synchronizer settings
Sync {
    Mode FTFW {
        DisableExternalCache on
    }
    Multicast {
        IPv4_address 225.0.0.50
        Group 3780
        IPv4_interface 192.168.15.3
        Interface bond0.110
        SndSocketBuffer 104857600
        RcvSocketBuffer 104857600
        Checksum on
    }
}
Helper {
    Type rpc inet tcp {
        QueueNum 3
        Policy rpc {
            ExpectMax 1
            ExpectTimeout 300
        }
    }
    Type rpc inet udp {
        QueueNum 4
        Policy rpc {
            ExpectMax 1
            ExpectTimeout 300
        }
    }
    Type tns inet tcp {
        QueueNum 5
        Policy tns {
            ExpectMax 1
            ExpectTimeout 300
        }
    }
}

# General settings
General {
    HashSize 262144
    HashLimit 2000000
    LogFile off
    Syslog on
    LockFile /var/lock/conntrack.lock
    UNIX {
        Path /var/run/conntrackd.ctl
    }
    NetlinkBufferSize 2097152
    NetlinkBufferSizeMaxGrowth 104857600
    NetlinkOverrunResync off
    NetlinkEventsReliable on
    Filter From Userspace {
        Address Ignore {
            IPv4_address 169.254.0.0/16
            IPv4_address 224.0.0.0/4
            IPv4_address 127.0.0.0/8
            IPv6_address fe8::/10
            IPv6_address ff00::/8
        }
        Protocol Accept {
            TCP
            UDP
            ICMP
            IPv6-ICMP
        }
    }
}
[edit]

@trae32566 Can you provide the output of the following commands?

sudo conntrackd -C /run/conntrackd/conntrackd.conf -s && echo "conntrack_count: " && sudo conntrack -C
sudo conntrackd -C /run/conntrackd/conntrackd.conf -s network
sudo conntrackd -C /run/conntrackd/conntrackd.conf -s cache
sudo conntrackd -C /run/conntrackd/conntrackd.conf -s runtime
sudo conntrackd -C /run/conntrackd/conntrackd.conf -s link
sudo conntrackd -C /run/conntrackd/conntrackd.conf -s queue
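
To capture all of the requested output in one pass, something like the following Python sketch could be run on the router. It only wraps the exact commands listed above. The helper names `build_commands` and `collect` are made up for illustration, and it assumes `sudo` works non-interactively for the calling user.

```python
import subprocess

CONF = "/run/conntrackd/conntrackd.conf"
# An empty string means plain `-s`; the rest are the subsections requested above.
SECTIONS = ["", "network", "cache", "runtime", "link", "queue"]


def build_commands(conf=CONF):
    """Build the argv lists for every requested statistics command."""
    commands = []
    for section in SECTIONS:
        cmd = ["sudo", "conntrackd", "-C", conf, "-s"]
        if section:
            cmd.append(section)
        commands.append(cmd)
    # Kernel conntrack entry count, for comparison with the cache size.
    commands.append(["sudo", "conntrack", "-C"])
    return commands


def collect(conf=CONF):
    """Run each command and return one text blob suitable for pasting."""
    chunks = []
    for cmd in build_commands(conf):
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
        chunks.append("$ " + " ".join(cmd) + "\n" + result.stdout + result.stderr)
    return "\n".join(chunks)
```

Calling `print(collect())` on each router produces output with every command echoed above its result, which keeps the A-side and B-side pastes easy to compare.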
Viacheslav changed the task status from Open to Needs reporter action. Tue, Apr 9, 4:06 PM
syncer changed the subtype of this task from "Task" to "Bug". Sat, Apr 20, 5:10 PM