
Major Dropping small packets under Xen and AWS
Open, Requires assessment, Public, BUG

Description

Hi, I’m new here. I’m not sure what needs to go in this field, but I found a problem and someone else on the forum confirmed it is real.

When I run netstat -i it shows TX drops on Ethernet interfaces. The same TX drops show up when I run ifconfig.
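For anyone who wants to track the counter over time, the TX drop value that netstat -i and ifconfig report can also be read from /proc/net/dev on Linux. The following is a minimal sketch, not part of the original report; the function name tx_drops is made up for illustration, and it assumes the usual /proc/net/dev layout of two header lines followed by 16 counters per interface.

```python
def tx_drops(proc_net_dev_text):
    """Return {interface: tx_drop_count} parsed from /proc/net/dev content.

    Hypothetical helper for illustration; assumes the standard layout of
    two header lines and 16 numeric counters per interface line.
    """
    drops = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in proc_net_dev_text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, counters = line.split(":", 1)
        fields = counters.split()
        if len(fields) < 16:
            continue
        # Fields 0-7 are receive counters, fields 8-15 are transmit counters.
        # Transmit order is bytes, packets, errs, drop, so drop is index 11.
        drops[name.strip()] = int(fields[11])
    return drops
```

Calling it on the contents of /proc/net/dev before and after a test run, and subtracting the two results per interface, should show how many packets were dropped during that run.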

The drops can be reproduced by sending packets under 214 bytes in size. About 3.75% of small packets appear to be dropped. Packets over 215 bytes in the same test reliably show 0% loss.
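A size-dependent loss like this could be measured with a small UDP echo probe. The sketch below is an assumption about how such a test might look, not the procedure used for the numbers above; make_echo_socket, serve_echo and probe_loss are hypothetical names. The echo side would run on a host behind the router and the probe on a host in front of it, so that traffic is forwarded from Ethernet to Ethernet. Note that payload_size is the UDP payload only; the report does not say whether its 214 byte threshold refers to payload or to the full frame, which is larger by the UDP, IP and Ethernet headers.

```python
import socket


def make_echo_socket(bind_addr):
    """Create a UDP socket bound to bind_addr, e.g. ("0.0.0.0", 9999)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(bind_addr)
    return sock


def serve_echo(sock):
    """Echo every received datagram back to its sender, forever."""
    while True:
        data, peer = sock.recvfrom(65535)
        sock.sendto(data, peer)


def probe_loss(host, port, payload_size, count=200, timeout=0.2):
    """Send `count` datagrams of `payload_size` bytes to an echo server.

    Returns the fraction of probes that were not echoed back in time.
    """
    if payload_size < 4:
        raise ValueError("payload_size must be at least 4 bytes")
    received = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for seq in range(count):
            # The first 4 bytes carry a sequence number, the rest is padding.
            payload = seq.to_bytes(4, "big").ljust(payload_size, b"\x00")
            sock.sendto(payload, (host, port))
            try:
                while True:
                    data, _ = sock.recvfrom(65535)
                    # Ignore stale echoes that belong to earlier probes.
                    if data[:4] == payload[:4]:
                        received += 1
                        break
            except socket.timeout:
                pass
    return 1 - received / count
```

Running probe_loss for a range of payload sizes around the reported threshold should show whether the loss changes with size. Because the echo travels through the router in both directions, the result reflects loss in either direction.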

Tested under XCP-ng 8, XenServer 6.5 and AWS (which runs Xen).

It doesn’t happen on VirtualBox, VMware, or Hyper-V.

I suspect it’s related to the paravirtual I/O drivers (xen_netfront) or something similar.

Also, it only drops packets which are being forwarded from Ethernet to Ethernet. It doesn’t affect traffic that originates or terminates on the VyOS itself. It doesn’t affect traffic from VPN to Ethernet either.

I have a Xen lab available to anyone who wishes to tinker and test.

Details

Difficulty level
Hard (possibly days)
Version
1.3 rolling
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)

Event Timeline

Sonicbx created this task. Sat, May 23, 9:30 PM

Here’s the same thing happening in the past: https://phabricator.vyos.net/T935. I think it was resolved by a kernel update. Can someone update the kernel in the rolling build?

c-po added a subscriber: c-po. Edited Sat, May 23, 9:41 PM

There is no newer kernel than 4.19.124 on the 4.19.x train. Newer kernels do not work, as the out-of-tree Intel drivers for the NICs and QAT won’t compile for kernels >5.3, and that is not an LTS one.

In T2505#64889, @c-po wrote:

There is no newer kernel than 4.19.124 on the 4.19.x train. Newer kernels do not work, as the out-of-tree Intel drivers for the NICs and QAT won’t compile for kernels >5.3, and that is not an LTS one.

So if this is a kernel issue, I should have the same problem with the same kernel under Debian 10, right?

jjakob added a subscriber: jjakob. Sun, May 24, 4:23 AM

If this can be solved by a kernel update: there was talk in the past about maybe having different build "flavors", one with all the hardware NIC drivers and one without. The minimal image could then have the latest (5.x) kernel.
T2085 prevents us from testing any newer kernel ourselves, as the kernel is built by Jenkinsfiles in the CI; we'd need to manually do the steps the CI does to build a kernel. In that task I proposed a shared script solution for these repositories that could be called from both the CI and vyos-build. This would allow anyone to build all packages, including the kernel, through vyos-build, just for cases like this.

@Sonicbx @jjakob I also created https://phabricator.vyos.net/T2504 - I think we duplicated the issue here. You can close whichever issue you want.

c-po added a comment. Sun, May 24, 10:30 AM
In T2505#64896, @jjakob wrote:

If this can be solved by a kernel update: there was talk in the past about maybe having different build "flavors", one with all the hardware NIC drivers and one without. The minimal image could then have the latest (5.x) kernel.
T2085 prevents us from testing any newer kernel ourselves, as the kernel is built by Jenkinsfiles in the CI; we'd need to manually do the steps the CI does to build a kernel. In that task I proposed a shared script solution for these repositories that could be called from both the CI and vyos-build. This would allow anyone to build all packages, including the kernel, through vyos-build, just for cases like this.

vyos-build-kernel has come with dedicated build scripts for some time now, so this should no longer be an issue. I do not support the different flavour idea, as it will be a nightmare to maintain. Just give it some time until Intel decides to update their stuff.

I replaced the distributed guest utilities (vyos-xe-guest-utilities) with the ones that come with xcp-ng, but this changed nothing regarding the packet loss. Though now they get properly recognized by xcp-ng :-)

pasik added a subscriber: pasik. Sun, May 24, 2:47 PM
fetzerms added a comment. Edited Tue, May 26, 3:47 PM

Does anyone have an idea how to test with different kernels? For now this is a deal breaker for using the 1.3.x branch. Though I would really love to keep using bleeding edge in order to help test things :-)