Page MenuHomePhabricator

flow-accounting stops
Closed, ResolvedPublicBUG

Description

After migrating from 1.1.8, I've found that my flow accounting has stopped working correctly. Basically, flow accounting starts at boot time, but then stops roughly 10 minutes afterwards. My configuration looks like this:

show system flow-accounting
disable-imt
interface eth5
interface eth4
interface eth2
interface eth1
interface eth0
netflow {
engine-id 0
sampling-rate 64
server 192.168.X.Y {
port 9995
}
timeout {
expiry-interval 60
flow-generic 60
icmp 300
max-active-life 60
tcp-fin 60
tcp-generic 60
tcp-rst 60
udp 60
}
version 5
}
sflow {
agent-address 192.168.A.B
sampling-rate 64
server 192.168.G.H {
port 6343
}
}
syslog-facility daemon

Logs show these etries:

Jun 25 19:04:50 vyos pmacctd[11575]: INFO ( default/core ): Linux NetFilter NFLOG Accounting Daemon, uacctd 1.6.2-git (20170401-00)
Jun 25 19:04:50 vyos pmacctd[11575]: INFO ( default/core ): Reading configuration file ‘/etc/pmacct/uacctd.conf’.
Jun 25 19:04:50 vyos pmacctd[11576]: OK ( default_memory/memory ): waiting for data on: ‘/tmp/uacctd.pipe’

Its definitely crashing, but there are no log entires at all.

It must be crashing somehow, because:

show flow-accounting
flow-accounting is not running

The process table shows that the probes still seem to be running.

$ ps ax | grep uacct
11578 ? S 0:00 uacctd: Netflow Probe Plugin [default_nfprobe]
11580 ? S 0:00 uacctd: sFlow Probe Plugin [default_sfprobe]

executing restart flow-accounting then results in

$ ps ax | grep uacct
11578 ? S 0:00 uacctd: Netflow Probe Plugin [default_nfprobe]
11580 ? S 0:00 uacctd: sFlow Probe Plugin [default_sfprobe]
18538 ? S 0:00 uacctd: Netflow Probe Plugin [default_nfprobe]
18539 ? S 0:00 uacctd: sFlow Probe Plugin [default_sfprobe]

I would have throught that the restart would have killed the old processes, but it seems it didn’t. And indeed, a show flow-acconting immediately afterwards tells me its not running; notice that the uacctd core process and imt plugin proccess are not running, although the probes are.

I can then killall uacct, and restart flow-accounting, and then there is data again, at least for the next +10 minutes or so, when the process repeats itself.

I've had this since migrating, for several weeks now. My last image was installed on 20180704.

Details

Difficulty level
Unknown (require assessment)
Version
VyOS 1.2.0-rolling+201807040338
Why the issue appeared?
Will be filled on close

Related Objects

panachoi created this task.Jul 10 2018, 8:03 AM
panachoi updated the task description. (Show Details)
c-po added a subscriber: c-po.Jul 10 2018, 7:59 PM

Hi @panachoi thanks for the info and sharing your config. I was always interested in Netflow (what is it, what can I do with it). I'm happy to look into this but could you tell me any software which I can use to display (graphically would be the best) the flow result?

Sure. The "best" way to visualize flow data is to install nfdump/nfsen:

nfdump can be found at https://github.com/phaag/nfdump
nfsen is at: http://nfsen.sourceforge.net/

Once you install them and direct your flow data to them, you can look at packet flows, see where traffic is coming from/going to, etc.

There are also several plugins, but right now I'd be happy to have flow data not stop after 10 minutes!

syncer triaged this task as Normal priority.Jul 11 2018, 7:39 PM
c-po added a comment.Aug 14 2018, 6:36 PM

Now using the following config, unfortunately my routers don't have that much traffic:

set system flow-accounting disable-imt
set system flow-accounting interface 'eth0'
set system flow-accounting interface 'eth1'
set system flow-accounting netflow engine-id '2'
set system flow-accounting netflow sampling-rate '64'
set system flow-accounting netflow server 172.16.100.1 port '9994'
set system flow-accounting netflow timeout expiry-interval '60'
set system flow-accounting netflow timeout flow-generic '60'
set system flow-accounting netflow timeout icmp '300'
set system flow-accounting netflow timeout max-active-life '60'
set system flow-accounting netflow timeout tcp-fin '60'
set system flow-accounting netflow timeout tcp-generic '60'
set system flow-accounting netflow timeout tcp-rst '60'
set system flow-accounting netflow timeout udp '60'
set system flow-accounting netflow version '5'
set system flow-accounting syslog-facility 'daemon'
c-po added a comment.Aug 20 2018, 8:24 PM

It stops here, too. Why? I have no idea, yet

panachoi added a comment.EditedAug 22 2018, 5:39 AM

Just updated to build from 20180821, and its still stopping; I'm glad that I'm not the only one seeing this, so it probably is some kind of bug. Again, nothing in the log at all, just the startup:

Aug 22 05:09:35 vyos pmacctd[11720]: INFO ( default/core ): Linux NetFilter NFLOG Accounting Daemon, uacctd 1.6.2-git (20170401-00)
Aug 22 05:09:35 vyos pmacctd[11720]: INFO ( default/core ): Reading configuration file '/etc/pmacct/uacctd.conf'.
Aug 22 05:09:35 vyos pmacctd[11721]: OK ( default_memory/memory ): waiting for data on: '/tmp/uacctd.pipe'

It did run for quite a bit longer for me, about 1.5 hours, before crapping out, whereas before it consistently died within 15 minutes.

also restart flow-accounting is still broken, at least for this (corner??) case. It happily restarts uacctd, but does not kill the orphan Probe Plugins:

restart flow-accounting
Starting flow-accounting

11722 ? S 0:00 uacctd: Netflow Probe Plugin [default_nfprobe]
11723 ? S 0:00 uacctd: sFlow Probe Plugin [default_sfprobe]

13804 ? Ss 0:00 uacctd: Core Process [default]
13805 ? S 0:00 uacctd: IMT Plugin [default_memory]
13806 ? S 0:00 uacctd: Netflow Probe Plugin [default_nfprobe]
13807 ? S 0:00 uacctd: sFlow Probe Plugin [default_sfprobe]

Shouldn't a restart do something like 'killall uacctd' before actually (re)starting flow-accounting ?

c-po added a comment.EditedSep 1 2018, 10:54 AM

@panachoi maybe this upgrade helps

Linux NetFilter NFLOG Accounting Daemon, uacctd 1.7.0-git (20170924-00)

It will be in one of the next rolling releases

c-po added a comment.Sep 1 2018, 1:38 PM

You could also alter the file /etc/default/uacctd and add -d into DAEMON_OPTS and restart flow-acounting

c-po added a comment.EditedSep 1 2018, 1:45 PM

It just died with this log:

Sep  1 15:42:05 CR1 pmacctd[3763]: INFO ( default/core ): Start logging ...
Sep  1 15:42:05 CR1 pmacctd[3763]: INFO ( default/core ): Linux NetFilter NFLOG Accounting Daemon, uacctd 1.7.0-git (20170924-00)
Sep  1 15:42:05 CR1 pmacctd[3763]: INFO ( default/core ):  '--build=x86_64-linux-gnu' '--includedir=${prefix}/include' '--sysconfdir=/etc' '--localstatedir=/var' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--disable-dependency-tracking' '--prefix=/usr' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--enable-pgsql' '--enable-mysql' '--enable-sqlite3' '--enab
Sep  1 15:42:05 CR1 pmacctd[3763]: INFO ( default/core ): Reading configuration file '/etc/pmacct/uacctd.conf'.
Sep  1 15:42:05 CR1 pmacctd[3763]: INFO ( default/core ): [/etc/pmacct/int_map] (re)loading map.
Sep  1 15:42:05 CR1 pmacctd[3764]: OK ( default_memory/memory ): waiting for data on: '/tmp/uacctd.pipe'
Sep  1 15:42:05 CR1 pmacctd[3768]: INFO ( default_nfprobe/nfprobe ): NetFlow probe plugin is originally based on softflowd 0.9.7 software, Copyright 2002 Damien Miller <djm@mindrot.org> All rights reserved.
Sep  1 15:42:05 CR1 pmacctd[3763]: INFO ( default/core ): [/etc/pmacct/int_map] map successfully (re)loaded.
Sep  1 15:42:05 CR1 pmacctd[3763]: INFO ( default/core ): [/etc/pmacct/int_map] (re)loading map.
Sep  1 15:42:05 CR1 pmacctd[3763]: INFO ( default/core ): [/etc/pmacct/int_map] map successfully (re)loaded.
Sep  1 15:42:05 CR1 pmacctd[3763]: INFO ( default/core ): [/etc/pmacct/int_map] (re)loading map.
Sep  1 15:42:05 CR1 pmacctd[3763]: INFO ( default/core ): [/etc/pmacct/int_map] map successfully (re)loaded.
Sep  1 15:42:05 CR1 pmacctd[3763]: INFO ( default/core ): Successfully connected Netlink NFLOG socket
Sep  1 15:42:05 CR1 pmacctd[3768]: INFO ( default_nfprobe/nfprobe ):           TCP timeout: 60s
Sep  1 15:42:05 CR1 pmacctd[3768]: INFO ( default_nfprobe/nfprobe ):  TCP post-RST timeout: 60s
Sep  1 15:42:05 CR1 pmacctd[3768]: INFO ( default_nfprobe/nfprobe ):  TCP post-FIN timeout: 60s
Sep  1 15:42:05 CR1 pmacctd[3768]: INFO ( default_nfprobe/nfprobe ):           UDP timeout: 60s
Sep  1 15:42:05 CR1 pmacctd[3768]: INFO ( default_nfprobe/nfprobe ):          ICMP timeout: 300s
Sep  1 15:42:05 CR1 pmacctd[3768]: INFO ( default_nfprobe/nfprobe ):       General timeout: 60s
Sep  1 15:42:05 CR1 pmacctd[3768]: INFO ( default_nfprobe/nfprobe ):      Maximum lifetime: 60s
Sep  1 15:42:05 CR1 pmacctd[3768]: INFO ( default_nfprobe/nfprobe ):       Expiry interval: 60s
Sep  1 15:42:05 CR1 pmacctd[3768]: INFO ( default_nfprobe/nfprobe ): Exporting flows to [172.16.100.1]:9992
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (26130/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (7306/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (4410/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: message repeated 2 times: [ INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?]
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (1514/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (7306/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (5858/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (4410/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (8754/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: message repeated 2 times: [ INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?]
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (7306/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (1514/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (4410/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (5858/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (2962/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (1514/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (13098/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (7306/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (5858/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (5858/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (10202/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (4410/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (2962/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (2962/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: message repeated 2 times: [ INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?]
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (1514/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (7306/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (5858/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (5858/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (1514/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (4410/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:32 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (7306/1654942926/frags). Snaplen issue ?

... more entries

Sep  1 15:43:33 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (66/1654942926/frags). Snaplen issue ?
Sep  1 15:43:33 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (2962/1654942926/frags). Snaplen issue ?
Sep  1 15:43:33 CR1 pmacctd[3763]: INFO ( default/core ): short IPv4 packet read (2962/1654942926/frags). Snaplen issue ?
Sep  1 15:43:35 CR1 pmacctd[3764]: ERROR ( default_memory/memory ): Core process *seems* gone. Exiting.
[edit]
cpo@CR1#
c-po added a comment.Sep 1 2018, 1:57 PM
snaplen (-L) [GLOBAL, NO_NFACCTD]
Desc
specifies the maximum number of bytes to capture for each packet. This directive has key importance when enabling both classification and connection tracking engines. In fact, some protocols (mostly text-based eg.: RTSP, SIP, etc.) benefit of extra bytes because they give more chances to successfully track data streams spawned by control channel. But it must be also noted that capturing larger packet portion require more resources. The right value need to be traded-off. In case classification is enabled, values under 200 bytes are often meaningless. 500-750 bytes are enough even for text based protocols. Default snaplen values are ok if classification is disabled. For uacctd daemon, this option doesn't apply to packet snapshot length but rather to the Netlink socket read buffer size. This should be reasonably large - at least 4KB, which is the default value. For large uacctd_nl_size values snaplen could be further increased.

VyOS configures a value of 32 KiB (32 * 1024) https://github.com/vyos/vyatta-netflow/blob/current/scripts/vyatta-netflow.pl#L53 and the default is 4kB.

Switching back to 4kB solved my crashes when flow-accounting 1GBit/s traffic generated by iperf3.

c-po moved this task from Need Triage to In Progress on the VyOS 1.2 Crux board.Sep 1 2018, 2:03 PM
c-po updated the task description. (Show Details)Sep 1 2018, 2:26 PM
c-po moved this task from In Progress to Finished on the VyOS 1.2 Crux board.Sep 1 2018, 7:25 PM

https://github.com/pmacct/pmacct/pull/247

This patch fixes in my case crashes that would result from big NFLOG packets (32K in my case)

pasik added a subscriber: pasik.Nov 4 2018, 11:23 AM

@vtsingaras I see it is merged in upstream, try to merge in vyos asap.

@vtsingaras I merged it in to our repo.
@syncer i see we have no crux branch for pmacct i slacked @dmbaturin

syncer moved this task from Needs Triage to Finished on the VyOS 1.2 Crux (VyOS 1.2.0-GA) board.
syncer closed this task as Resolved.
syncer claimed this task.