
NAT Problem with VRF
In progress, High, Public, BUG

Description

Hi,

I started using VRF and stumbled over a nasty NAT bug:

Device:

eth0 192.168.0.100/24 gw 192.168.0.1 VRF OOBM
eth1 192.168.0.1/24 VRF default
eth2 no IP VRF default
pppoe0 dynamic public IP from ISP VRF default

eth0 and eth1 are connected to the same switch and can ping each other.

NAT RULE:

set nat source rule 100 outbound-interface 'pppoe0'
set nat source rule 100 protocol 'all'
set nat source rule 100 translation address 'masquerade'

NAT works for all other devices in 192.168.0.0/24, but all packets from 192.168.0.100 go out of pppoe0 without being masqueraded.

Details

Difficulty level
Unknown (require assessment)
Version
1.3.0-rc4
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)

Event Timeline

rherold created this object in space S1 VyOS Public.
Viacheslav changed the subtype of this task from "Task" to "Bug". Jun 28 2021, 5:56 PM
Viacheslav added a project: VyOS 1.3 Equuleus.

Hi Ruben,

I ran several tests in our lab environment to isolate the issue. It can be reproduced with a simple NAT rule (on vrf XX). The topology used:

 nat 
 |
RT-VYOS---- vrf-OOBM 
 |                   \
 |                    \
default  -------     switch

I found some parameters that may be limiting the nat translation:

By default the scope of the port bindings for unbound sockets is
 limited to the default VRF. That is, it will not be matched by packets
 arriving on interfaces enslaved to an l3mdev and processes may bind to
 the same port if they bind to an l3mdev.

TCP & UDP services running in the default VRF context (ie., not bound
to any VRF device) can work across all VRF domains by enabling the
tcp_l3mdev_accept and udp_l3mdev_accept sysctl options:

  sysctl -w net.ipv4.tcp_l3mdev_accept=1
  sysctl -w net.ipv4.udp_l3mdev_accept=1

These options are disabled by default so that a socket in a VRF is only
selected for packets in that VRF.
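
To see what these two options are currently set to before changing anything, a minimal sketch like the following could be used. It assumes the standard /proc/sys layout for sysctls; the function name read_l3mdev_accept and its proc_sys parameter are made up for this example and are not part of VyOS.

```python
from pathlib import Path

# The two sysctls named in the kernel VRF documentation quoted above.
L3MDEV_SYSCTLS = ("tcp_l3mdev_accept", "udp_l3mdev_accept")


def read_l3mdev_accept(proc_sys="/proc/sys"):
    """Return {sysctl name: int value, or None if the file is missing}.

    proc_sys is a parameter only so the function can be pointed at a
    different directory; on a router it stays at /proc/sys.
    """
    ipv4_dir = Path(proc_sys) / "net" / "ipv4"
    values = {}
    for name in L3MDEV_SYSCTLS:
        path = ipv4_dir / name
        values[name] = int(path.read_text().strip()) if path.is_file() else None
    return values


if __name__ == "__main__":
    for name, value in read_l3mdev_accept().items():
        print(f"net.ipv4.{name} = {value}")
```

A value of None only means the file was not found, for example on a kernel built without l3mdev support.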

However, I can't test this behavior well, so we'll try a different topology that uses only VRFs (without default).

Hi,

As I wrote on Slack, from my point of view it is a kernel problem. It seems that conntrack in the kernel recognizes the packets even when they come in on an input interface in the default VRF, so the NAT code won't match, because for conntrack the outgoing interface is still eth0, which is in VRF OOBM, instead of pppoe0.

I would expect to have conntrack entries for each vrf for this flow.

It seems that what I thought is true:

[email protected]:~$ sudo ip vrf exec OOBM telnet 62.104.56.93

At the same time:

[email protected]:/home/vyos# conntrack -L |grep 62.104.56.93

conntrack v1.4.6 (conntrack-tools): 72 flow entries have been shown.
tcp 6 119 SYN_SENT src=192.168.0.100 dst=62.104.56.93 sport=47704 dport=23 [UNREPLIED] src=62.104.56.93 dst=192.168.0.100 sport=23 dport=47704 mark=0 use=1

I would expect to see two entries: one for VRF OOBM and one for VRF default.
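
The same check can be done from a script rather than by eye. This is a minimal sketch that assumes the `conntrack -L` line layout shown above (original tuple first, reply tuple second); the helper names are made up for this example. With source NAT or masquerade applied, the reply tuple is addressed to the translated address, so its dst differs from the original src.

```python
import re


def parse_conntrack_line(line):
    """Split one `conntrack -L` line into its original and reply tuples."""
    srcs = re.findall(r"\bsrc=(\S+)", line)
    dsts = re.findall(r"\bdst=(\S+)", line)
    if len(srcs) < 2 or len(dsts) < 2:
        raise ValueError("expected an original and a reply tuple")
    return {
        "orig_src": srcs[0],
        "orig_dst": dsts[0],
        "reply_src": srcs[1],
        "reply_dst": dsts[1],
        "unreplied": "[UNREPLIED]" in line,
    }


def source_nat_applied(entry):
    """True if the reply is addressed to something other than the original source."""
    return entry["reply_dst"] != entry["orig_src"]


# The entry from the output above.
line = (
    "tcp 6 119 SYN_SENT src=192.168.0.100 dst=62.104.56.93 sport=47704 dport=23 "
    "[UNREPLIED] src=62.104.56.93 dst=192.168.0.100 sport=23 dport=47704 mark=0 use=1"
)
entry = parse_conntrack_line(line)
print(source_nat_applied(entry), entry["unreplied"])  # prints: False True
```

For the entry above, the reply is still addressed to 192.168.0.100, which matches the packets leaving pppoe0 without masquerade.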

Hi ruben

Could you configure the following:

set protocols vrf OOBM static route 0.0.0.0/0 next-hop 192.168.0.1 next-hop-vrf 'OOBM'

After that, please confirm what behavior you see. In my environment it gives the following result:

[email protected]:~$ ping 8.8.8.8 vrf OOBM interface 192.168.0.100
PING 8.8.8.8 (8.8.8.8) from 192.168.0.100 : 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=114 time=17.8 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=114 time=18.9 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=114 time=16.6 ms
64 bytes from 8.8.8.8: icmp_seq=4 ttl=114 time=19.2 ms
64 bytes from 8.8.8.8: icmp_seq=5 ttl=114 time=16.8 ms
^C
--- 8.8.8.8 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 12ms
rtt min/avg/max/mdev = 16.627/17.860/19.206/1.072 ms
[email protected]:~$ conntrack -L

tcp      6 230 ESTABLISHED src=192.168.125.1 dst=192.168.125.61 sport=60444 dport=22 src=192.168.0.100 dst=192.168.125.1 sport=22 dport=60444 [ASSURED] mark=0 use=1  ////////////////// dnat

tcp      6 230 ESTABLISHED src=192.168.125.1 dst=192.168.0.100 sport=60444 dport=22 src=192.168.0.100 dst=192.168.125.1 sport=22 dport=29075 [ASSURED] mark=0 use=1

icmp     1 10 src=192.168.0.100 dst=8.8.8.8 type=8 code=0 id=3431 src=8.8.8.8 dst=192.168.0.100 type=0 code=0 id=3431 mark=0 use=1 ////// ICMP

It seems 1.4-rolling has this bug too.
I set up a VRF wg containing all WireGuard clients (with private IPs)
and set up a VRF leak to VRF default.
NAT doesn't work on it;
un-NATed packets are sent out of eth0.

@zsdc
Please take a look at this.
Might it be an issue similar to the one in this patch?
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=0fb4d21956f4a9af225594a46857ccf29bd747bc

because PREROUTING will be called twice.

Hi @tj2852847

Thanks for your comment. We are testing with @rherold first. I understand that your case is similar but not the same (you have explicit route leaking between the default VRF and VRF X), so we also need to test it and make sure the version solves it.

Please take a look at the commit 9213ce6672582bc12f02c1530726fe97030d2cfe for kernel 5.13.

Hello Everyone,
I am testing a 1.4 VRF leak from VRF X to default with NAT, and it is not working as expected. Outbound traffic is forwarded to the gateway with NAT applied, but the reply is never forwarded to the originator.

Version

[email protected]:~$ show vers

Version:          VyOS 1.4-rolling-202109191513
Release train:    sagitta

Built by:         [email protected]
Built on:         Sun 19 Sep 2021 15:13 UTC
Build UUID:       6837bfa3-73ca-4621-abca-522358e9eec3
Build commit ID:  07555c06452524

Architecture:     x86_64
Boot via:         installed image
System type:      KVM guest

Hardware vendor:  QEMU
Hardware model:   Standard PC (i440FX + PIIX, 1996)
Hardware S/N:     
Hardware UUID:    43015507-c1f4-4857-9139-f3cb2e0d3597

Copyright:        VyOS maintainers and contributors

Traffic

10.41.100.139 is the IP of the outbound interface toward the default gateway.

15:50:55.850337 br10.255 In  IP (tos 0x0, ttl 62, id 39811, offset 0, flags [none], proto ICMP (1), length 84)
    72.15.151.138 > 8.8.8.8: ICMP echo request, id 42240, seq 4, length 64
15:50:55.850372 eth2  Out IP (tos 0x0, ttl 61, id 39811, offset 0, flags [none], proto ICMP (1), length 84)
    **10.41.100.139** > 8.8.8.8: ICMP echo request, id 42240, seq 4, length 64
15:50:55.863155 eth2  In  IP (tos 0x0, ttl 115, id 0, offset 0, flags [none], proto ICMP (1), length 84)
    8.8.8.8 > 10.41.100.139: ICMP echo reply, id 42240, seq 4, length 64

Route table default

[email protected]:~$ show ip route 
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup

S>* 0.0.0.0/0 [1/0] via 10.41.100.1, eth2, weight 1, 19:53:50
S>r 9.9.9.1/32 [1/0] via 192.0.2.1 (recursive), weight 1, 01:17:03
  r                    via 192.0.2.1, br10.255 onlink, weight 1, 01:17:03
C>* 10.41.100.0/24 is directly connected, eth2, 01w4d13h
S>* 72.15.151.136/29 [1/0] is directly connected, br10.255 (vrf OVERLAY), weight 1, 01:17:03
S>* 192.0.2.0/24 [1/0] is directly connected, br10.255 (vrf OVERLAY), weight 1, 01:17:03
[email protected]:~$

Route table vrf X

[email protected]:~$ show ip route vrf OVERLAY 
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup

VRF OVERLAY:
S>* 0.0.0.0/0 [1/0] via 10.41.100.1, eth2 (vrf default), weight 1, 01:12:13
K * 0.0.0.0/0 [255/8192] unreachable (ICMP unreachable), 01w1d22h
S>* 10.41.100.1/32 [1/0] is directly connected, eth2 (vrf default), weight 1, 01:12:13
C>* 192.0.2.0/24 is directly connected, br10.255, 01w1d14h
[email protected]:~$

Yes, it is an issue related to conntrack + NAT with a VRF leak. Here is a report where the problem is described more clearly:

https://serverfault.com/questions/1073012/conntrack-failed-to-nat-its-own-tcp-packets-from-another-vrf

The question is how to disable connection tracking.

Is there any workaround for this scenario?

Not yet. We have been trying different CT (conntrack) setups, but that does not solve the main problem. I understand that disabling conntrack is not possible because it is used for NAT.

syncer changed the task status from Open to In progress. Oct 17 2021, 2:56 PM
syncer triaged this task as High priority.

Today I tested VRF route leaking and NAT. It works on 1.3.1-S1. Simple configuration:

set vrf name red table '1200'

set interfaces ethernet eth0 address 'dhcp'
set interfaces ethernet eth1 address '100.64.0.1/24'
set interfaces ethernet eth1 vrf 'red'

set nat source rule 10 outbound-interface 'eth0'
set nat source rule 10 source address '100.64.0.0/24'
set nat source rule 10 translation address 'masquerade'

set protocols static interface-route 100.64.0.0/24 next-hop-interface eth1 next-hop-vrf 'red'
set protocols vrf red static route 0.0.0.0/0 next-hop 192.168.255.1 next-hop-vrf 'default'

I've re-tested this issue with the initial configuration (NAT source / masquerade / destination), and it seems to work as @Dmitry said. Conntrack doesn't show the connection as [UNREPLIED]; it is established:

#configuration 

set interfaces ethernet eth1 address '192.168.0.100/24'
set interfaces ethernet eth1 hw-id '50:00:00:09:00:01'
set interfaces ethernet eth1 vrf 'OOBM'
set interfaces ethernet eth3 address '192.168.0.1/24'
set interfaces loopback lo
set nat destination rule 110 description 'NAT test- INSIDE'
set nat destination rule 110 destination port '2022'
set nat destination rule 110 inbound-interface 'eth0'
set nat destination rule 110 protocol 'tcp'
set nat destination rule 110 translation address '192.168.0.40'
set nat source rule 100 outbound-interface 'eth0'
set nat source rule 100 protocol 'all'
set nat source rule 100 source address '192.168.0.0/24'
set nat source rule 100 translation address 'masquerade'
set protocols vrf OOBM static route 0.0.0.0/0 next-hop 192.168.122.1 next-hop-vrf 'default'

egress traffic :

[email protected]:~$ traceroute 8.8.8.8
traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
 1  192.168.0.100 (192.168.0.100)  7.225 ms  6.194 ms  6.321 ms
 2  192.168.122.1 (192.168.122.1)  13.563 ms  12.694 ms  11.442 ms
 3  * * *
 4  200.51.241.1 (200.51.241.1)  49.754 ms  48.548 ms  44.612 ms
 5  74.125.32.151 (74.125.32.151)  44.074 ms 72.14.208.91 (72.14.208.91)  43.949 ms  43.823 ms
 6  74.125.52.126 (74.125.52.126)  43.683 ms 74.125.51.138 (74.125.51.138)  20.603 ms 74.125.52.126 (74.125.52.126)  20.230 ms
 7  74.125.242.193 (74.125.242.193)  28.352 ms 172.253.53.33 (172.253.53.33)  27.843 ms 74.125.242.193 (74.125.242.193)  27.035 ms
 8  142.251.239.165 (142.251.239.165)  26.905 ms 142.251.79.143 (142.251.79.143)  26.180 ms 142.250.46.111 (142.250.46.111)  25.734 ms
 9  8.8.8.8 (8.8.8.8)  30.609 ms  28.951 ms  24.783 ms
[email protected]:~$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=117 time=12.0 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=117 time=10.7 ms

conntrack reply ingress connection behind nat/vrf:

[email protected]:~$  conntrack -L
tcp      6 431988 ESTABLISHED src=192.168.122.49 dst=192.168.122.151 sport=44230 dport=2022 src=192.168.0.40 dst=192.168.122.49 sport=2022 dport=44230 [ASSURED] mark=0 use=1



[email protected]:~$ conntrack -E
    [NEW] tcp      6 120 SYN_SENT src=192.168.122.49 dst=192.168.122.151 sport=46156 dport=2022 [UNREPLIED] src=192.168.0.40 dst=192.168.122.49 sport=2022 dport=46156
 [UPDATE] tcp      6 60 SYN_RECV src=192.168.122.49 dst=192.168.122.151 sport=46156 dport=2022 src=192.168.0.40 dst=192.168.122.49 sport=2022 dport=46156
 [UPDATE] tcp      6 299 ESTABLISHED src=192.168.122.49 dst=192.168.122.151 sport=46156 dport=2022 src=192.168.0.40 dst=192.168.122.49 sport=2022 dport=46156 [ASSURED]
^Cconntrack v1.4.6 (conntrack-to.

@rherold, could you try it? It should work on 1.3.1-S1.

Hi, one more thing related to NAT and VRF in 1.4-rolling. As you know, it uses an nftables map to isolate the conntrack tables, so we need to come up with a design to fix this. Maybe by adding some mark.

I have NAT working with vrf in VyOS 1.4-rolling-202208290458 + custom nat offload

set interfaces ethernet eth0 address '192.168.122.14/24'
set interfaces ethernet eth1 address '192.0.2.1/24'
set interfaces ethernet eth1 vrf 'foo'
set protocols static route 192.0.2.0/24 interface eth1 vrf 'foo'
set system conntrack
set vrf name foo protocols static route 0.0.0.0/0 next-hop 192.168.122.1 interface 'eth0'
set vrf name foo protocols static route 0.0.0.0/0 next-hop 192.168.122.1 vrf 'default'
set vrf name foo table '1010'

Nftables

[email protected]:/home/vyos# cat nat.nft 
flush ruleset

table ip filter {
	flowtable fastnat {
		hook ingress priority filter
		devices = { eth0, eth1 }
	}

	chain forward {
		type filter hook forward priority filter; policy accept;
		ip protocol { tcp, udp } flow add @fastnat
	}
}
table ip nat {
	chain POSTROUTING {
		type nat hook postrouting priority srcnat; policy accept;
		ip saddr 192.0.2.0/24 oif "eth0" snat to 192.168.122.14 persistent
	}

	chain PREROUTING {
		type nat hook prerouting priority dstnat; policy accept;
	}
}

Conntrack table

[email protected]:~$ sudo conntrack -F
conntrack v1.4.6 (conntrack-tools): connection tracking table has been emptied.
[email protected]:~$ 
[email protected]:~$ sudo conntrack -L
tcp      6 431999 ESTABLISHED src=192.168.122.14 dst=192.168.122.1 sport=22 dport=44462 src=192.168.122.1 dst=192.168.122.14 sport=44462 dport=22 [ASSURED] mark=0 use=1
udp      17 src=192.0.2.2 dst=1.1.1.1 sport=33018 dport=53 src=1.1.1.1 dst=192.168.122.14 sport=53 dport=33018 [OFFLOAD] mark=0 use=2
udp      17 src=192.0.2.2 dst=1.1.1.1 sport=37517 dport=53 src=1.1.1.1 dst=192.168.122.14 sport=53 dport=37517 [OFFLOAD] mark=0 use=2
udp      17 src=192.0.2.2 dst=1.1.1.1 sport=59794 dport=53 src=1.1.1.1 dst=192.168.122.14 sport=53 dport=59794 [OFFLOAD] mark=0 use=2
udp      17 src=192.0.2.2 dst=1.1.1.1 sport=39288 dport=53 src=1.1.1.1 dst=192.168.122.14 sport=53 dport=39288 [OFFLOAD] mark=0 use=2
udp      17 src=192.0.2.2 dst=1.1.1.1 sport=39616 dport=53 src=1.1.1.1 dst=192.168.122.14 sport=53 dport=39616 [OFFLOAD] mark=0 use=2
icmp     1 29 src=192.0.2.2 dst=1.1.1.1 type=8 code=0 id=12387 src=1.1.1.1 dst=192.168.122.14 type=0 code=0 id=12387 mark=0 use=1
udp      17 src=192.0.2.2 dst=1.1.1.1 sport=41155 dport=53 src=1.1.1.1 dst=192.168.122.14 sport=53 dport=41155 [OFFLOAD] mark=0 use=2
udp      17 src=192.0.2.2 dst=1.1.1.1 sport=39829 dport=53 src=1.1.1.1 dst=192.168.122.14 sport=53 dport=39829 [OFFLOAD] mark=0 use=2
udp      17 src=192.0.2.2 dst=1.1.1.1 sport=33655 dport=53 src=1.1.1.1 dst=192.168.122.14 sport=53 dport=33655 [OFFLOAD] mark=0 use=2
udp      17 src=192.0.2.2 dst=1.1.1.1 sport=44835 dport=53 src=1.1.1.1 dst=192.168.122.14 sport=53 dport=44835 [OFFLOAD] mark=0 use=2
udp      17 src=192.0.2.2 dst=1.1.1.1 sport=40213 dport=53 src=1.1.1.1 dst=192.168.122.14 sport=53 dport=40213 [OFFLOAD] mark=0 use=2
udp      17 src=192.0.2.2 dst=1.1.1.1 sport=33729 dport=53 src=1.1.1.1 dst=192.168.122.14 sport=53 dport=33729 [OFFLOAD] mark=0 use=2
udp      17 src=192.0.2.2 dst=1.1.1.1 sport=48344 dport=53 src=1.1.1.1 dst=192.168.122.14 sport=53 dport=48344 [OFFLOAD] mark=0 use=2
conntrack v1.4.6 (conntrack-tools): 14 flow entries have been shown.
[email protected]:~$
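
With this many entries it is easier to let a script confirm that every flow from the inside network was rewritten. A minimal sketch, assuming the `conntrack -L` layout shown above and the addresses from the nftables rule (192.0.2.0/24 translated to 192.168.122.14); flows_missing_snat is a made-up helper name, not part of conntrack-tools.

```python
import ipaddress
import re

# Matches each "src=A dst=B" pair; a flow line has one for the original
# direction and one for the reply direction.
TUPLE_RE = re.compile(r"\bsrc=(\S+)\s+dst=(\S+)")


def flows_missing_snat(lines, inside_net, nat_addr):
    """Return lines sourced from inside_net whose reply is not addressed to nat_addr."""
    network = ipaddress.ip_network(inside_net)
    missing = []
    for line in lines:
        tuples = TUPLE_RE.findall(line)
        if len(tuples) != 2:
            continue  # footer line or an unexpected layout
        (orig_src, _orig_dst), (_reply_src, reply_dst) = tuples
        if ipaddress.ip_address(orig_src) not in network:
            continue  # not a flow from the inside network
        if reply_dst != nat_addr:
            missing.append(line)
    return missing


# Three entries copied from the table above, plus its footer line.
sample = [
    "tcp      6 431999 ESTABLISHED src=192.168.122.14 dst=192.168.122.1 sport=22 dport=44462 src=192.168.122.1 dst=192.168.122.14 sport=44462 dport=22 [ASSURED] mark=0 use=1",
    "udp      17 src=192.0.2.2 dst=1.1.1.1 sport=33018 dport=53 src=1.1.1.1 dst=192.168.122.14 sport=53 dport=33018 [OFFLOAD] mark=0 use=2",
    "icmp     1 29 src=192.0.2.2 dst=1.1.1.1 type=8 code=0 id=12387 src=1.1.1.1 dst=192.168.122.14 type=0 code=0 id=12387 mark=0 use=1",
    "conntrack v1.4.6 (conntrack-tools): 14 flow entries have been shown.",
]
print(flows_missing_snat(sample, "192.0.2.0/24", "192.168.122.14"))  # prints: []
```

An empty list means no flow from 192.0.2.0/24 left untranslated, which agrees with the table above.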

As I remember, it has been working since this PR:

https://phabricator.vyos.net/rVYOSONEX22791e26f444766dc9f9e1729b72893208f58079

But I am not sure whether a kernel update fixed it, because as I understood it was a well-known issue in conntrackd.

This comment was removed by Viacheslav.

Is there a way to isolate a NAT rule to operate within a VRF?

For example, let's say I have the following configuration:

set vrf name red table '101'
set vrf name blue table '102'

set interfaces ethernet eth0 vif 101 vrf red
set interfaces ethernet eth0 vif 101 address 100.64.0.2/30

set interfaces ethernet eth0 vif 102 vrf blue
set interfaces ethernet eth0 vif 102 address 100.64.0.6/30

set interfaces ethernet eth1 vif 101 vrf red
set interfaces ethernet eth1 vif 101 address 192.168.0.1/24

set interfaces ethernet eth1 vif 102 vrf blue
set interfaces ethernet eth1 vif 102 address 192.168.0.1/24

How can I NAT vrf red source traffic from 192.168.0.0/24 to 100.64.0.2 and vrf blue source traffic from 192.168.0.0/24 to 100.64.0.6?

I'm looking for a command such as

set nat source rule 100 vrf red

The problem here is that the two vrf tables share the client source address of 192.168.0.0/24, but they should have different translated addresses. I realize this is slightly different from the discussion about NATing between VRFs.

At least in my lab, with one of the latest 1.4 builds, this works for me:

set interfaces ethernet eth0 vif 101 address '100.64.0.2/30'
set interfaces ethernet eth0 vif 101 vrf 'red'
set interfaces ethernet eth0 vif 102 address '100.64.0.6/30'
set interfaces ethernet eth0 vif 102 vrf 'blue'

set interfaces ethernet eth1 vif 101 address '192.168.0.1/24'
set interfaces ethernet eth1 vif 101 vrf 'red'
set interfaces ethernet eth1 vif 102 address '192.168.0.1/24'
set interfaces ethernet eth1 vif 102 vrf 'blue'

set vrf name blue protocols static route 0.0.0.0/0 next-hop 100.64.0.5
set vrf name blue table '102'
set vrf name red protocols static route 0.0.0.0/0 next-hop 100.64.0.1
set vrf name red table '101'

set nat source rule 10 outbound-interface 'eth0.101'
set nat source rule 10 translation address 'masquerade'
set nat source rule 20 outbound-interface 'eth0.102'
set nat source rule 20 translation address 'masquerade'

Then, pinging from two hosts, one in each VRF (both with IP 192.168.0.X), I can see the correct translation address on the remote router. The host in VRF red pings 1.1.1.1, and the host in VRF blue pings 8.8.8.8:

tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
14:17:22.675855 IP 100.64.0.2 > 1.1.1.1: ICMP echo request, id 62190, seq 313, length 64
14:17:22.700520 IP 1.1.1.1 > 100.64.0.2: ICMP echo reply, id 62190, seq 313, length 64
14:17:23.294866 IP 100.64.0.6 > 8.8.8.8: ICMP echo request, id 62446, seq 312, length 64
14:17:23.317557 IP 8.8.8.8 > 100.64.0.6: ICMP echo reply, id 62446, seq 312, length 64
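
With more than a couple of VRFs, the one-rule-per-uplink pattern above can be generated rather than typed. A minimal sketch, assuming the `set nat source rule` syntax used in this thread (newer builds may spell outbound-interface differently); the function name and the rule numbering scheme are made up for this example.

```python
def per_vrf_masquerade_rules(uplinks, first_rule=10, step=10):
    """Build one masquerade rule per VRF, keyed on that VRF's uplink interface.

    uplinks maps a VRF name to the outbound interface that belongs to it.
    """
    seen = {}
    commands = []
    for index, (vrf, interface) in enumerate(uplinks.items()):
        if interface in seen:
            # Two VRFs sharing one uplink could not be told apart by interface.
            raise ValueError(f"{interface} is used by both {seen[interface]} and {vrf}")
        seen[interface] = vrf
        rule = first_rule + index * step
        commands.append(f"set nat source rule {rule} outbound-interface '{interface}'")
        commands.append(f"set nat source rule {rule} translation address 'masquerade'")
    return commands


commands = per_vrf_masquerade_rules({"red": "eth0.101", "blue": "eth0.102"})
print("\n".join(commands))
```

For the red/blue example this yields the same four NAT lines as the configuration above. The separation comes from each uplink belonging to exactly one VRF, not from a VRF match in the rule itself.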