High RAM usage on SSH logins with lots of IPv6 routes in the routing table.
Closed, Resolved | Public | BUG

Description

For a while I've suspected a memory leak somewhere in VyOS, but I was unable to pinpoint exactly where.

It turns out that the more routes you have in your routing table, the more memory each SSH login uses. Log in enough times and VyOS runs out of memory (OOM).

In my case, each login used about 200 MB. I tested on a server with a few copies of full tables and was able to make it take up to 1 GB per SSH login.

In the following screenshot, all I did was SSH in 4 times.

When the other three sessions log out, all the memory is returned instantly.


When I shut down my large BGP feeds, SSH logins go back to taking about 2 MB per login.

Since shutting down BGP feeds is not a real solution, I kept digging and found that the behavior is tied to this option:

set service ssh disable-host-validation

With this option NOT present, with ONE SSH session, this is my memory usage:

admin@edge:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7976         965        6238          82         772        6663
Swap:             0           0           0

After setting that option and logging out and back in again, this is my memory usage with TWENTY ssh sessions connected to VyOS:

admin@edge:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7976         794        6408          81         772        6834
Swap:             0           0           0
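
To compare two readings without eyeballing the table, the `used` column can be pulled out of `free -m` output. This is only a minimal sketch: `mem_used_mib` is a hypothetical helper name, and the sample input is the two `Mem:` rows quoted above rather than a live measurement.

```shell
# Hypothetical helper: print the "used" column (MiB) from `free -m` output on stdin.
mem_used_mib() {
    awk '$1 == "Mem:" { print $3 }'
}

# The two Mem: rows quoted above, fed in as sample input.
before=$(printf 'Mem: 7976 965 6238 82 772 6663\n' | mem_used_mib)
after=$(printf 'Mem: 7976 794 6408 81 772 6834\n' | mem_used_mib)

echo "1 session, option not set: ${before} MiB used"
echo "20 sessions, option set:   ${after} MiB used"
echo "difference: $((before - after)) MiB"
```

On a live system the same helper could be fed directly, for example `free -m | mem_used_mib`, once before and once after opening extra sessions.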

The same fix is achieved by another workaround, mentioned in the link attached in a separate comment: commenting out the myhostname source on the hosts line. I don't know the implications of that, though.

/etc/nsswitch.conf

hosts:          files dns #myhostname
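
The edit above could be scripted roughly as follows. This is a sketch under assumptions, not the change that was eventually merged: `comment_out_myhostname` is a hypothetical helper, it only matches when `myhostname` is the last source on the `hosts:` line, and it is demonstrated on a scratch file instead of the live `/etc/nsswitch.conf`.

```shell
# Hypothetical helper: print the given file with myhostname commented out on
# the hosts: line. Only matches when myhostname is the last source listed.
comment_out_myhostname() {
    sed -E '/^hosts:/ s/[[:space:]]+myhostname[[:space:]]*$/ #myhostname/' "$1"
}

# Demonstrate on a scratch copy rather than the real file.
sample=$(mktemp)
printf 'passwd:         files\nhosts:          files dns myhostname\n' > "$sample"
patched=$(comment_out_myhostname "$sample")
printf '%s\n' "$patched"
rm -f "$sample"
```

Because the pattern requires whitespace directly before `myhostname`, a line that already reads `#myhostname` is left untouched, so running it twice is harmless.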

Details

Difficulty level
Normal (likely a few hours)
Version
1.3
Why the issue appeared?
Issues in third-party code
Is it a breaking change?
Behavior change

Event Timeline

kroy created this task. Jul 3 2020, 9:54 PM
kroy updated the task description. Edited Jul 3 2020, 9:56 PM

To add to this, the following link pointed me at the correct solution:

https://forums.centos.org/viewtopic.php?t=62558

It seems like this is poor systemd behavior, and the only way around it currently is one of the two hacks mentioned there.

I've also confirmed this only occurs when logging in via IPv6. Adding -4 to the ssh command makes each SSH connection take up only a few megabytes of memory.
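
One way to put a number on the IPv4 versus IPv6 difference is to total the resident memory of the sshd processes after each kind of login. The sketch below assumes the growth shows up in sshd's RSS, which this thread does not confirm; `sum_rss_mib` is a hypothetical helper and the sample values are made up for illustration.

```shell
# Hypothetical helper: sum RSS values (KiB, one per line on stdin) and print whole MiB.
sum_rss_mib() {
    awk '{ kib += $1 } END { printf "%d\n", kib / 1024 }'
}

# On the router this could be fed from procps, for example:
#   ps -C sshd -o rss= | sum_rss_mib
# Made-up sample values standing in for three sshd processes:
printf '204800\n2048\n1024\n' | sum_rss_mib
```

Running it once after an `ssh -4` login and once after an `ssh -6` login would give two totals to compare.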

kroy updated the task description. Jul 3 2020, 10:04 PM
kroy triaged this task as High priority. Jul 3 2020, 10:08 PM
kroy updated the task description. Jul 3 2020, 10:10 PM
kroy renamed this task from "High RAM usage on SSH logins with lots of routes in the routing table." to "High RAM usage on SSH logins with lots of IPv6 routes in the routing table." Jul 4 2020, 2:18 AM
kroy updated the task description. Jul 4 2020, 4:42 AM

In a system like VyOS, an SSH memory leak is fairly serious. If there is a solution, it should be given priority.

kroy added a comment. Edited Jul 4 2020, 5:51 AM

This also has a measurable impact on CPU. You can tell exactly when I applied the nsswitch.conf fix.

c-po added a subscriber: c-po. Edited Jul 4 2020, 6:59 AM

I have checked with a v4/v6 full-table router and VyOS 1.2.5: each SSH session consumes about 7 MiB, which seems fine to me.

The same goes for 1.3-rolling-202006150117, so it's a really odd situation. According to the posts I've read, though, this could be a real issue even if it does not happen to me.

@kroy could it be related to existing or missing IPv6 PTR records?

Further information: https://manpages.debian.org/testing/libnss-myhostname/nss-myhostname.8.en.html

This one is also reported via forums: https://forum.vyos.io/t/vyos-rolling-full-bgp-ssh-ram-usage/5528

c-po added a comment. Jul 4 2020, 7:07 AM

I found that I had disable-host-validation configured, and as soon as I removed it, it happened to me too. Changing task priority.

c-po changed the task status from Open to Confirmed. Jul 4 2020, 7:07 AM
c-po claimed this task.
c-po raised the priority of this task from High to Unbreak Now!.
c-po changed Difficulty level from Unknown (require assessment) to Normal (likely a few hours).
c-po changed Why the issue appeared? from Will be filled on close to Issues in third-party code.
c-po changed Is it a breaking change? from Unspecified (possibly destroys the router) to Behavior change.
c-po moved this task from Need Triage to In Progress on the VyOS 1.3 Equuleus board.
rherold added a subscriber: rherold. Jul 4 2020, 7:09 AM

Hi,

To me it looks like a name lookup error. I have read the forum entry mentioned above, and they fixed it by disabling name lookup.

Could it be that it only happens if a hostname (local or client) can't be properly resolved? That could also explain why not everyone has the problem.

pasik added a subscriber: pasik. Jul 4 2020, 8:32 AM
c-po added a comment. Edited Jul 4 2020, 7:41 PM

I'm reluctant to change the overall system behavior by altering nsswitch.conf. I wonder whether we should instead enable "disable-host-validation" by default, since an IP address is in the end more useful than a resolved PTR. A PTR record can be changed later, by the time someone dissects the logfiles, whereas a logged IP address stays meaningful for longer.

kroy added a comment. Edited Jul 4 2020, 9:53 PM

"default as an IP address is in the end more useful than a resolved PTR"

I disagree with this a bunch, especially if you have a number of logins happening.

From my research, I really feel the nsswitch fix is the cleanest. The only thing it might potentially impact is if you did a

ssh vyos@localhost

versus impacting ANY remote login with the default you are suggesting.

c-po added a comment. Jul 5 2020, 8:05 AM

Does DNS static-host-mapping still work with the nsswitch.conf change? I'm just curious about the side effects.

kroy added a comment. Edited Jul 5 2020, 7:27 PM

It should. This should really be a non-breaking change as it's a fallback for something else that already exists in /etc/hosts.

127.0.1.1       edge.lan.kroy.io edge

Since the order is:

files dns #myhostname

The myhostname source ONLY ever answers for the local host, and it shouldn't actually ever be used, since the system's hostname is in /etc/hosts, which is consulted first.

It apparently causes some weird behavior when the IP of the SSH client doesn't have a PTR record.

For example, if I make my nsswitch.conf look like:

hosts:          files dns myhostname

If my IP has a PTR record, either in DNS or in hosts, the memory usage is only a few megabytes. My IPv4 address has a PTR record, but my IPv6 address doesn't, because of privacy extensions. So if I ssh -4, there is no massive memory usage.

admin@edge:~$ who
admin    pts/0        2020-07-05 19:19 (2001:dead:beef:9:fd1f:a68e:76c:44b6)
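
Whether a given client address resolves to a name through the sources on the `hosts:` line can be checked with `getent hosts`, which performs a reverse lookup when it is given an address. A minimal sketch, with hypothetical helper names and the loopback address standing in for the client address reported by `who`:

```shell
# Hypothetical helper: succeed if the address resolves to a name via the
# sources configured on the hosts: line of /etc/nsswitch.conf.
has_reverse_name() {
    getent hosts "$1" > /dev/null
}

# Hypothetical helper: report the result for one address.
check_addr() {
    if has_reverse_name "$1"; then
        echo "$1: name found"
    else
        echo "$1: no name found"
    fi
}

# Loopback is used here only as a stand-in for a real client address.
check_addr 127.0.0.1
```

On the router, passing the client address shown by `who` would indicate whether that login falls into the no-PTR case described above.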

So this is my memory usage over IPv6 before adding a static host mapping:

              total        used        free      shared  buff/cache   available
Mem:           7976         954        6131          86         890        6668
Swap:             0           0           0

After adding my temporary IPv6 address as a static entry with:

set system static-host-mapping host-name test3 inet 2001:dead:beef:9:fd1f:a68e:76c:44b6

and logging out and back in again:

admin@edge:~$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7976         762        6323          86         890        6860
Swap:             0           0           0

And this is the case whether myhostname is commented out or not:

admin@edge:~$ who
admin    pts/0        2020-07-05 19:23 (test3)
kroy claimed this task. Jul 5 2020, 8:33 PM
kroy added a comment. Jul 5 2020, 8:48 PM

This PR should correct the issue.

kroy changed the task status from Confirmed to Needs testing. Jul 5 2020, 8:48 PM
c-po closed this task as Resolved. Thu, Jul 30, 5:06 PM
c-po moved this task from In Progress to Finished on the VyOS 1.3 Equuleus board. Tue, Aug 4, 6:05 AM