
Upstream Kernel Patches from Semper Victus Linux Hardened Tree
Open, Requires assessment | Public | FEATURE REQUEST

Description

Our company maintains several trees of hardened Linux kernels with extended functionality for scheduling, networking, and storage (we pretty much have our own Arch distro). While our primary efforts focus on development atop Grsecurity kernels, we maintain a parallel tree using the GrapheneOS kernel patches (linux-hardened) for distribution to clients who do not acquire Grsecurity patches from Open Source Security Inc. During our internal work on VyOS we've incorporated all of the out-of-tree kmods and patches directly into the kernel source - ixgbe, qat, stackable FS changes, etc. We have extracted these efforts along with applicable other patches to use in VyOS which provide:

  1. xtables-addons in-tree: iptables modules for geoip, tarpit, etc
  2. PF_RING in-tree: this can underpin NIDS traffic collection and other direct-access network functions
  3. LVS patches: http://ja.ssi.bg/
  4. IPoE in-tree: VyOS patch adapted for 5.4 tree
  5. VLANmon in-tree: VyOS patch adapted for 5.4 tree
  6. WireGuard official in-tree backport
  7. AUFS5 to permit complex overlays without the efficiency issues of overlayfs
  8. EoIP protocol driver in-tree: support MikroTik L2 tunneling protocol
  9. UKSM in-tree for userspace memory dedup: this will be very handy once Docker containers are being run in VyOS (we use systemd-nspawn today to do things like run Duo's authproxy, works great)
  10. Inotify VyOS patch adapted for 5.4
  11. RT6 link filter VyOS patch adapted for 5.4
  12. Linux-Hardened: ASLR patches and some community backports from grsec; reduces privileged access vectors, and hardens and fixes type issues on certain structures and data types, etc. Basically makes the OS tier harder to attack: it's no grsec, but it IS better than upstream. Can also theoretically work with LTO+CFI with Clang, though we use GCC in-house to take advantage of RANDSTRUCT and such (custom kernels which aren't distributed gain decent standoff from this; distro kernels benefit too, though less).
  13. If desired, we can also push up our linux-hardened + RSBAC patch to provide a full-fledged RBAC implementation on vanilla/hardened kernels.

Once this is done, we will upstream implementation of Hardened Malloc from https://github.com/GrapheneOS/hardened_malloc to help tighten up userspace memory defenses (we run it on our images, for a while now, no issues there).
None of this will make your system impervious, but memory corruption attacks will be harder (a lot harder remotely), as will privesc, and XTables adds actual network defensive capabilities (geoip firewall rules are handy).
We tend to track LTS releases, so we can support this work on an ongoing basis with minimal overhead if VyOS sticks to those kernels; if we have to support yet another tree, we can work with the VyOS team to transfer knowledge and share maintenance costs for the effort.

Details

Difficulty level
Hard (possibly days)
Version
VyOS 1.3, Linux 5.4
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Perfectly compatible
Issue type
Feature (new functionality)

Related Objects

Status   | Subtype         | Assigned     | Task
Open     | FEATURE REQUEST | sempervictus |
Resolved | FEATURE REQUEST | c-po         |

Event Timeline

It’s best to provide links to related descriptions instead of asking everyone to search for the related details and patch implementations you describe

While i appreciate that you have an opinion of what's "best," i'm not re-summarizing 10+y of Linux out-of-tree history to spoon feed someone data they can, and should (like good engineers do), acquire on their own. Several of those patches are simply in-tree integrations for things currently built and packaged as kmods by VyOS on an LTS tree, the rest are well documented long running projects of their own which one must research and review the source code for anyway to properly understand their function and benefit.

sempervictus updated the task description. (Show Details)
sempervictus changed Difficulty level from Unknown (require assessment) to Hard (possibly days).
sempervictus changed Version from - to VyOS 1.3, Linux 5.4.

Created a GitHub PR against 5.4.78 with the core functions listed above, with ixgbe and QAT in-tree as well as WireGuard (avoids the convoluted module builds and permits LTO/CFI passes).

This comment was removed by debiansid.

The build stops at:

AR      crypto/built-in.a
  LD [M]  crypto/crypto_simd.o
make[2]: *** [debian/rules:6: build] Error 2
dpkg-buildpackage: error: debian/rules build subprocess returned exit status 2
make[1]: *** [scripts/Makefile.package:83: bindeb-pkg] Error 2
make: *** [Makefile:1464: bindeb-pkg] Error 2
[email protected]:/vyos/vyos-build-5.4.78/packages/linux-kernel$

Thank you sir. Worked through a clean build, updated patches, rebased, and pushed.

Added an inert patch (disabled in Kconfig) for https://www.rsbac.org/ on 5.4. This can be used to significantly harden the restrictions intended by the CLI to limit users to specifically defined roles, same goes for applications/containers.
If adding container support to VyOS is still on the roadmap, we're going to want to take extra care to enforce the boundaries between them and the host since real world use cases are pretty much guaranteed to leave old vulnerable containers running on long-running network appliances making for a variable and worsening attack surface over time.
This isn't quite as integrated and doesn't provide nearly the coverage you get with grsec+pax, but a rough approximation of "role-based FS restrictions and runtime hardening" is now in the pull request along with the other stuff which seemed pertinent for upstream.

Important note on this PR: in order to build the GCC plugins which perform most of the self-protection work, the Docker container needs gcc-8-plugin-dev installed. Otherwise it still builds, but silently downgrades the config, dropping RANDSTRUCT/STACKLEAK.
Pulled RSBAC out for now (issues with building the rest while it's in there but disabled), validated builds with and without the plugins package for GCC 8.
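
Since the downgrade is silent, one way to catch it is to inspect the final .config after the build. Below is a minimal sketch of such a check, not part of the PR. It assumes the option names used by mainline 5.4 Kconfig (CONFIG_GCC_PLUGINS, CONFIG_GCC_PLUGIN_RANDSTRUCT, CONFIG_GCC_PLUGIN_STACKLEAK); the function names are hypothetical.

```python
import re

# Assumption: option names as in mainline 5.4 Kconfig; adjust if the
# patched tree names them differently.
REQUIRED = (
    "CONFIG_GCC_PLUGINS",
    "CONFIG_GCC_PLUGIN_RANDSTRUCT",
    "CONFIG_GCC_PLUGIN_STACKLEAK",
)


def enabled_options(config_text):
    """Return the set of options set to y or m in a kernel .config."""
    enabled = set()
    for line in config_text.splitlines():
        match = re.match(r"^(CONFIG_\w+)=(y|m)$", line.strip())
        if match:
            enabled.add(match.group(1))
    return enabled


def missing_hardening(config_text, required=REQUIRED):
    """List required options that the final .config does not enable."""
    enabled = enabled_options(config_text)
    return [option for option in required if option not in enabled]
```

Feeding it the text of the generated .config and failing the build when the returned list is non-empty would turn the silent downgrade into a visible error.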

How do I build an arbitrary Linux kernel version using build-kernel.sh and then make the ISO?

So how are userspace packages for this sort of stuff handled? I assume we need to itemize out individual Phabricator tickets?
Off the top of my head, relevant things to add to userspace would be:

  1. eoip binary
  2. eoip CLI wrapper
  3. Xtables userspace with GeoIP table data and updater script (we would need to figure out how to deal with rule placement for persistence)
  4. Xtables-related CLI for firewall matching on GeoIP, DNS, etc
  5. Xtables-related CLI for firewall actions to TARPIT or DELUDE
  6. UKSM userspace (or just wrappers for the sysfs interface in CLI)
  7. Hardened Malloc with system-wide LD_PRELOAD or maintain a vyos-specific libc package with it built-in
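
As a rough illustration of what the firewall CLI items above would have to emit, here is a sketch that builds iptables argument lists using the xtables-addons geoip match and the TARPIT/DELUDE targets. The function name and its parameters are hypothetical, nothing here is existing VyOS code, and rule placement for persistence is left open as noted above.

```python
import re

# TARPIT and DELUDE from xtables-addons only apply to TCP traffic.
TCP_ONLY_TARGETS = {"TARPIT", "DELUDE"}


def geoip_rule(chain, countries, target, dport=None):
    """Build an iptables argv matching source country codes via xt_geoip."""
    codes = [code.upper() for code in countries]
    for code in codes:
        if not re.fullmatch(r"[A-Z0-9]{2}", code):
            raise ValueError(f"not a two-character country code: {code!r}")
    argv = ["iptables", "-A", chain]
    if target in TCP_ONLY_TARGETS or dport is not None:
        argv += ["-p", "tcp"]
    if dport is not None:
        argv += ["--dport", str(dport)]
    argv += ["-m", "geoip", "--src-cc", ",".join(codes), "-j", target]
    return argv
```

Returning an argument list rather than a shell string keeps user-supplied country codes out of shell parsing; the caller can pass the list directly to a process runner.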

I've added the two binary defense components outstanding:

  1. Active kernel integrity checking and some level of exploit detection/prevention via the Linux Kernel Runtime Guard from OpenWall
  2. Userspace memory defenses by way of Hardened Malloc from GrapheneOS using the Whonix debian packaging repo
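
For the system-wide preload route mentioned earlier, a quick sanity check is whether the allocator is actually listed in the dynamic loader's preload file. The sketch below parses text in the /etc/ld.so.preload format (names separated by whitespace or colons, with # comments). The helper names are hypothetical, and the library file name libhardened_malloc.so is an assumption about how the package installs it.

```python
import posixpath
import re


def preloaded_libraries(preload_text):
    """Return library paths listed in ld.so.preload-style text."""
    names = []
    for line in preload_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        names.extend(token for token in re.split(r"[\s:]+", line) if token)
    return names


def is_preloaded(preload_text, soname="libhardened_malloc.so"):
    """True if a library with the given file name is in the preload list."""
    return any(
        posixpath.basename(path) == soname
        for path in preloaded_libraries(preload_text)
    )
```

This only confirms the configuration, not that a given running process has the library mapped.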

With basic coverage for kernel and userspace memory, the last major piece toward a coherent defense strategy is mandatory access control (MAC) via LSM or something like RSBAC.
This is by far the hardest part of "security implementation" as it requires granular understanding and definition of permissible functions (syscalls & VFS path access) in order to write the policies involved.
Setups like Grsecurity's RBAC and RSBAC have (to varying degrees) a "learning mode" which profiles the runtime of a system and catalogs the observed accesses by UID/GID/role to define an initial policy. They have some magic to perform reductions in the policy for optimization, but even these things require that human engineers qualify/validate/update the resulting policy definitions.
Android handles this through SELinux, which they also use to enforce W^X IIRC (i was planning to pull SARA or RSBAC in to handle that since we have it working well with PaX in our grsec builds), but unless you have a basement full of NSA/Google staffers to handle that, i'm kind of loath to go down that dark road.
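
To make the "learning mode" idea concrete, here is a toy sketch of cataloging observed accesses per role and then reducing them. It is not how Grsecurity RBAC or RSBAC implement it; the data shapes, function names, and the sibling-count threshold are all assumptions chosen for illustration.

```python
import posixpath
from collections import defaultdict


def learn_policy(observations):
    """Group observed (role, path, mode) accesses into role -> path -> modes."""
    policy = defaultdict(lambda: defaultdict(set))
    for role, path, mode in observations:
        policy[role][path].add(mode)
    return {role: dict(paths) for role, paths in policy.items()}


def reduce_policy(paths, min_children=3):
    """Collapse sibling files with identical modes into one directory rule."""
    by_dir = defaultdict(list)
    for path, modes in paths.items():
        by_dir[(posixpath.dirname(path), frozenset(modes))].append(path)
    reduced = {}
    for (directory, modes), members in by_dir.items():
        if len(members) >= min_children:
            # Wildcard rule: shorter policy, but it now also permits
            # siblings that were never observed.
            reduced[posixpath.join(directory, "*")] = set(modes)
        else:
            for path in members:
                reduced[path] = set(modes)
    return reduced
```

The reduction step is where human review comes in: every collapsed rule grants more than was actually observed, so someone has to decide whether that widening is acceptable.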

At the current state of patching, this is one of the "harder setups" you'll find in the public domain - binary exploitation of this mess will almost always require an attacker to specifically target the build, do extra work to infoleak addresses in order to find viable targets in memory, and deal with both the deterministic (slab quarantines in hardened malloc, LKRG operations, etc) and probabilistic (randomized structure layouts, improved ASLR, etc) measures applied. Exploitation of userspace services is drastically hampered by Hardened Malloc - kind of hard to mark an overflown page executable when its metadata isn't in it.
With W^X, syscall, and VFS restrictions applied down the line, this will become truly unpleasant, even for "initiated attackers" regardless of whether their posture is remote or they "come in" by way of a poisoned container image.

I've been refreshing the stack against current branch to keep testers building, and have added the FSGSBASE backport to 5.4 as a technical argument for keeping to a properly mature LTS even when users have a good case for needing newer functionality.
What is the plan of action for this effort, and is there a written policy on which kernels are selected and how they're selected for the OS? I can keep doing the rebase & push dance once a week or so, but is anyone on the VyOS team actually testing this stuff, and has anyone upstream discussed the functional security benefits to users of GeoIP firewall filters or TARPIT/DELUDE/etc response actions separately from the system hardening functions in here?

EDIT: some notes from Phoronix on the FSGSBASE tests they ran -

If taking the geometric mean of all the tests carried out, the overall kernel performance on this Intel Xeon Cascade Lake Refresh server was up by 4%. But if dropping all of the results without a statistically significant difference to come to the 31 benchmarks with a measurable difference, the FSGSBASE patches for those impacted workloads yields a 13% improvement on average. Those wanting to dig through all the numbers can do so via this OpenBenchmarking.org result file.
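
The gap between the two figures in that quote comes from how a geometric mean behaves when most results are unchanged. A small sketch with made-up speedup ratios (hypothetical numbers, not the Phoronix data) shows the effect:

```python
from statistics import geometric_mean

# Hypothetical per-benchmark speedup ratios (patched / baseline).
# These are illustrative values, not the published results.
unchanged = [1.0] * 7          # tests with no measurable difference
impacted = [1.13, 1.13, 1.13]  # tests that did show a difference

overall = geometric_mean(unchanged + impacted)
subset = geometric_mean(impacted)

print(f"all tests:      {overall:.3f}x")
print(f"impacted tests: {subset:.3f}x")
```

Unchanged results pull the overall mean toward 1.0, so the all-tests figure is always smaller than the figure for the impacted subset alone.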

Since 5.10 appears to be holding solid, and grsecurity is using 5.10 for their beta branch, i've completed the forward port of these core functions to the same kernel revision being used in the current branch (at the time of commit).
What's the intent with Intel drivers there? If we want to pull in from Intel, i think we ought to do the same in-tree patch process to build and sign the modules at build-time (and enforce module signing validation to load at runtime).

To round out the effort, i've added an optional patch to the series which provides granular AAA/RBAC from ring0 and can also deliver the W^X functionality for userspace along with those functions.

erkin set Issue type to Feature (new functionality). Aug 29 2021, 1:06 PM
erkin removed a subscriber: Active contributors.