R8169 driver crash
Open, Normal, Public, BUG

Description

On both 1.3.3 and rolling, the r8169 driver crashes with the Realtek NIC on my motherboard (a Gigabyte A520I AC)
with this message:
rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100)
It then resets itself and the network card resumes until it crashes the next time :)

I installed the r8168-dkms module instead, built from this git tag: https://salsa.debian.org/debian/r8168/-/tree/debian/8.047.05-1 - I used this one because it was the last one that had the 5.4 patch.
Since then there have been no more errors in the kernel log and no more broken connections for the services behind the VyOS install.

Details

Difficulty level
Easy (less than an hour)
Version
1.3.3 and 1.4.0
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)
Issue type
Bug (incorrect behavior)

Event Timeline

If it crashes, it should be reported upstream to kernel.org (and to the maintainer of the r8169 driver), since VyOS uses the latest Linux kernel LTS (6.1.43 as of writing):

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/realtek?h=v6.1.43

Also, Realtek being Realtek, another piece of advice is to get a better NIC. Just read the comments in the source code of the Realtek driver for FreeBSD: http://fxr.watson.org/fxr/source/pci/if_rl.c?v=FREEBSD-9-0

That said, looking at the source code, it looks like this message comes from the rtl_loop_wait function, which is called by rtl_loop_wait_high, which is in turn called by rtl_wait_txrx_fifo_empty:

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/realtek/r8169_main.c?h=v6.1.43#n709

	if (net_ratelimit())
		netdev_err(tp->dev, "%s == %d (loop: %d, delay: %lu).\n",
			   c->msg, !high, n, usecs);
	return false;
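
To make the numbers in the message easier to read, here is a rough userspace model of that polling loop in Python. It is a sketch, not the driver code: only the error path is quoted in this thread, so the loop shape (poll the condition up to n times, sleep usecs microseconds between polls) is an assumption based on the linked source, and the name loop_wait and the injectable sleep parameter are made up for illustration. With loop: 42 and delay: 100, the driver would give up after roughly 42 x 100 us = 4.2 ms of waiting for the FIFOs to drain.

```python
import time
from typing import Callable, Optional, Tuple


def loop_wait(check: Callable[[], bool], msg: str, usecs: int, n: int,
              high: bool,
              sleep: Callable[[float], None] = time.sleep
              ) -> Tuple[bool, Optional[str]]:
    """Poll check() up to n times, sleeping usecs microseconds between polls.

    Returns (True, None) as soon as check() equals `high`, otherwise
    (False, error_text) after n unsuccessful polls.
    """
    for _ in range(n):
        if check() == high:
            return True, None
        sleep(usecs / 1_000_000)
    # Same format as the netdev_err() call quoted above, without the
    # device prefix the kernel adds.
    return False, "%s == %d (loop: %d, delay: %d)." % (msg, not high, n, usecs)


if __name__ == "__main__":
    # Hypothetical condition that never becomes true, i.e. a FIFO that
    # never reports empty.
    ok, err = loop_wait(lambda: False, "rtl_rxtx_empty_cond",
                        usecs=100, n=42, high=True)
    print(ok, err)
```

Read this way, "rtl_rxtx_empty_cond == 0" means the "RX/TX FIFO empty" condition was still false after all 42 polls.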

Googling this error doesn't give more hints than some reports of success after changing to the r8168 driver (as you already concluded).

I found that this is the source for the r8168 driver, however it doesn't seem to have been updated since Linux 5.19 (which has been end-of-life since late 2022), and this version is also newer than the one in the Debian repo:

https://www.realtek.com/en/component/zoo/category/network-interface-controllers-10-100-1000m-gigabit-ethernet-pci-express-software

GBE Ethernet LINUX driver r8168 for kernel up to 5.19
Version: 8.051.02
Update Time: 2022/11/15

In the meantime, using the latest 1.4-rolling, I guess altering the rx/tx ring buffers and enabling/disabling the various offloading options doesn't affect the rate of the driver crashes?

Example:

set interfaces ethernet eth0 offload gro
set interfaces ethernet eth0 offload gso
set interfaces ethernet eth0 offload lro
set interfaces ethernet eth0 offload rfs
set interfaces ethernet eth0 offload rps
set interfaces ethernet eth0 offload sg
set interfaces ethernet eth0 offload tso
set interfaces ethernet eth0 ring-buffer rx '4096'
set interfaces ethernet eth0 ring-buffer tx '4096'

How often do the crashes occur, and is traffic affected or do you only notice this through the logs?
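
One way to put a number on "how often" is to count the timeout messages in saved kernel log text. The following is a minimal sketch: the helper name find_timeouts and the regular expression are assumptions modelled on the single message quoted in the description, and the sample text is that message plus a made-up unrelated line.

```python
import re
from typing import Dict, List

# Matches messages shaped like the one reported in this task, e.g.
# "rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100)".
TIMEOUT_RE = re.compile(
    r"(?P<cond>rtl_\w+_cond) == (?P<value>\d+) "
    r"\(loop: (?P<loop>\d+), delay: (?P<delay>\d+)\)"
)


def find_timeouts(log_text: str) -> List[Dict[str, str]]:
    """Return one dict per timeout message found in log_text."""
    return [m.groupdict() for m in TIMEOUT_RE.finditer(log_text)]


if __name__ == "__main__":
    # Sample input: the message from the description and an unrelated line.
    sample = (
        "rtl_rxtx_empty_cond == 0 (loop: 42, delay: 100)\n"
        "some unrelated kernel message\n"
    )
    hits = find_timeouts(sample)
    print(len(hits), "timeout message(s) found")
```

Feeding it the saved kernel log from before and after a change of offload or ring-buffer settings would show whether those settings make any difference.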

The way I use it is a bit weird. I have ESXi installed on the host, and since ESXi has no driver for this NIC, I pass it through to VyOS and then bridge it with a vmxnet interface so that hosts in the same virtual switch can use that interface instead of the USB one I use for ESXi remote access.

I would get disconnects while transferring large files to the instance that was behind that vSwitch.

This driver is present in the Debian backports; the latest version of it supports 6.1 and even 6.5:
https://salsa.debian.org/debian/r8168/-/tree/main/debian/patches

Later edit: about reporting it upstream, I am pretty sure folks are aware, since I found numerous bug reports on the internet about this.

Don't count on it. The way things work on the internet is that there are a lot of people complaining about stuff but very few who do something about it :-)

So I would still encourage you to file this as a bug report upstream, preferably with the maintainer of the r8169 driver.

Otherwise the maintainer will be like "I haven't heard of any issues with the driver".

Since you have VMware in between, that's one thing that could be to blame. Do you have a spare device with the same hardware setup where you can install VyOS 1.4-rolling natively (aka bare metal), without VMware in between, and see if you get the same result?

I admit my setup isn't really the most common for production, but for the home lab crowd it might be ok. Realtek makes many network chip models, I assume, and their implementation might also vary from one board manufacturer to another.
About not using VMware - that's a tougher ask :)

These two drivers have existed in tandem for over a decade now; I don't think that Realtek would change their behavior about it.

My main reason for the bug report here wasn't to get the fix in the upstream driver, but rather to have the Realtek DKMS driver integrated with VyOS - and I can volunteer the PR if you think it would be mergeable.

My rationale for this is as follows: Realtek boards aren't common outside the home lab crowd, and having this driver as an option could be nice, I believe.

Here's the modinfo on the driver I ended up using with 1.3:

filename:       /lib/modules/5.4.243-amd64-vyos/updates/dkms/r8168.ko
version:        8.047.05-NAPI
license:        GPL
description:    RealTek RTL-8168 Gigabit Ethernet driver
author:         Realtek and the Linux r8168 crew <[email protected]>
srcversion:     3156050721B171C98446600
alias:          pci:v00001186d00004300sv00001186sd00004B10bc*sc*i*
alias:          pci:v000010ECd00002600sv*sd*bc*sc*i*
alias:          pci:v000010ECd00002502sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008161sv*sd*bc*sc*i*
alias:          pci:v000010ECd00008168sv*sd*bc*sc*i*
depends:
retpoline:      Y
name:           r8168
vermagic:       5.4.243-amd64-vyos SMP mod_unload modversions
parm:           speed_mode:force phy operation. Deprecated by ethtool (8). (uint)
parm:           duplex_mode:force phy operation. Deprecated by ethtool (8). (uint)
parm:           autoneg_mode:force phy operation. Deprecated by ethtool (8). (uint)
parm:           advertising_mode:force phy operation. Deprecated by ethtool (8). (uint)
parm:           aspm:Enable ASPM. (int)
parm:           s5wol:Enable Shutdown Wake On Lan. (int)
parm:           s5_keep_curr_mac:Enable Shutdown Keep Current MAC Address. (int)
parm:           rx_copybreak:Copy breakpoint for copy-only-tiny-frames (int)
parm:           use_dac:Enable PCI DAC. Unsafe on 32 bit PCI slot. (int)
parm:           timer_count:Timer Interrupt Interval. (int)
parm:           eee_enable:Enable Energy Efficient Ethernet. (int)
parm:           hwoptimize:Enable HW optimization function. (ulong)
parm:           s0_magic_packet:Enable S0 Magic Packet. (int)
parm:           debug:Debug verbosity level (0=none, ..., 16=all) (int)
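
The alias lines in that modinfo output encode which PCI devices the module claims: the field after "v" is the vendor ID and the field after "d" is the device ID, each zero-padded to eight hex digits. Here is a small sketch for turning them into the short vendor:device form that can be compared with lspci -nn output. The helper name pci_id_from_alias is made up for illustration.

```python
import re
from typing import Optional, Tuple

# pci:v<vendor>d<device>sv<subvendor>sd<subdevice>bc..sc..i..
ALIAS_RE = re.compile(
    r"pci:v(?P<vendor>[0-9A-Fa-f]{8}|\*)d(?P<device>[0-9A-Fa-f]{8}|\*)sv"
)


def pci_id_from_alias(alias: str) -> Optional[Tuple[str, str]]:
    """Return (vendor, device) as 4-digit lowercase hex, or None if no match."""
    m = ALIAS_RE.match(alias.strip())
    if m is None:
        return None

    def short(field: str) -> str:
        # Wildcards stay as-is; IDs are reduced to their last four digits.
        return field if field == "*" else field[-4:].lower()

    return short(m.group("vendor")), short(m.group("device"))


if __name__ == "__main__":
    # Alias strings copied from the modinfo output above.
    for alias in (
        "pci:v00001186d00004300sv00001186sd00004B10bc*sc*i*",
        "pci:v000010ECd00008168sv*sd*bc*sc*i*",
    ):
        print(pci_id_from_alias(alias))
```

Vendor 10ec is Realtek, so the 10ec:8168 alias is the one that would match an RTL8168-family NIC.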

It seems to exist for the current Debian 12.1 (bookworm), so I think it should be a relatively simple task to add it if it doesn't already exist:

https://packages.debian.org/bookworm/r8168-dkms

However, note that enabling this driver will disable the in-kernel r8169 driver, which might call for a big WARNING when enabling this through the VyOS config:

This driver should only be used for devices not yet supported by the in-kernel driver r8169. Please see the README.Debian for instructions how to report bugs against r8169 that made it necessary to use r8168-dkms.

Installation of the r8168-dkms package will disable the in-kernel r8169 module. To re-enable r8169, the r8168-dkms package must be purged.

Also, Debian encourages people facing issues with r8169 to report them.
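
Because installing r8168-dkms replaces the in-kernel module, it can be useful to confirm which driver actually ends up bound to the interface. A minimal sketch, assuming the usual Linux sysfs layout where /sys/class/net/<iface>/device/driver is a symlink to the bound driver. The function name bound_driver and its sysfs_root parameter are illustrative, and eth0 is just the interface name used earlier in this thread.

```python
import os
from typing import Optional


def bound_driver(iface: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Return the name of the kernel driver bound to iface, or None.

    None is returned when the interface does not exist or has no backing
    device (for example a purely virtual interface).
    """
    link = os.path.join(sysfs_root, "class", "net", iface, "device", "driver")
    if not os.path.islink(link):
        return None
    # The symlink target ends in the driver name, e.g. ".../drivers/r8169".
    return os.path.basename(os.readlink(link))


if __name__ == "__main__":
    print(bound_driver("eth0"))
```

On the setup described here one would expect r8169 with the stock image and r8168 once the DKMS module is in use.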

Note that there is https://vyos.dev/T5284 from June 2023, but that has been reverted due to conflicts.

Viacheslav triaged this task as Normal priority. Jan 20 2024, 1:00 PM