packets leak un-natted
Closed, Wontfix
Public

Description

I use Vyatta as a NAT between two different networks. Here is an excerpt of my configuration:

interfaces {

ethernet eth0 {
    address 192.168.11.11/24
    address 192.168.11.9/24
    address 192.168.11.10/24
    address 192.168.11.12/24
    address 192.168.11.13/24
    address 192.168.11.14/24
    description "vlan verso asa"
    duplex auto
    hw-id 00:50:56:9a:7d:ce
    smp_affinity auto
    speed auto
}
ethernet eth1 {
    address 192.168.3.1/24
    duplex auto
    hw-id 00:50:56:9a:2e:6a
    smp_affinity auto
    speed auto
}

...
}
nat {

source {
    rule 10 {
        outbound-interface eth0
        source {
            address 192.168.3.0/24
        }
        translation {
            address 192.168.11.11
        }
    }

...
}
protocols {

static {
    route 0.0.0.0/0 {
        next-hop 192.168.11.1 {
        }
    }

...
service {

dhcp-server {
    disabled false
    shared-network-name ospitito {
        authoritative disable
        subnet 192.168.3.0/24 {
            default-router 192.168.3.1
            dns-server 194.153.187.20
            dns-server 8.8.8.8
            domain-name jacobacci.com
            lease 86400
            start 192.168.3.16 {
                stop 192.168.3.254
            }
        }
    }
}

...

eth0 is connected to a firewall (192.168.11.1), and no other device is connected to that VLAN, while a bunch of PCs reside on eth1, with IP addresses assigned via DHCP.

normally, connections from 192.168.3.x undergo nat, but sometimes a packet is forwarded to 192.168.11.1 WITHOUT being natted.

issuing a simultaneous
tcpdump -v -i eth0 src net 192.168.3.0/24
tcpdump -v -i eth1 src net 192.168.3.0/24

displays that only a small percentage of packets leak un-NATed. They are refused by the firewall, so the client retries the connection a little later, and this time the packets are NATed normally. The packets belong to many clients (not one in particular), and the leaked ones I noticed were for TCP ports 80 and 443 (while the mix arriving on eth1 also often includes packets for UDP port 53).
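
To put a number on "a small percentage", the packet counts from the two interfaces can be compared. The following is only a sketch: leak_pct is a hypothetical helper name, and the capture file names in the commented usage are assumptions (any pair of captures taken at the same time on eth0 and eth1 would do).

```shell
# leak_pct LEAKED TOTAL
# Prints the share of un-NATed packets as a percentage with one decimal place.
leak_pct() {
    awk -v leaked="$1" -v total="$2" 'BEGIN {
        if (total == 0) { print "0.0"; exit }   # avoid dividing by zero
        printf "%.1f\n", leaked * 100 / total
    }'
}

# Hypothetical usage against saved captures (file names are assumptions):
#   leaked=$(tcpdump -n -r eth0.cap 'src net 192.168.3.0/24' 2>/dev/null | wc -l)
#   total=$(tcpdump -n -r eth1.cap 'src net 192.168.3.0/24' 2>/dev/null | wc -l)
#   leak_pct "$leaked" "$total"
```

Packets seen on eth0 with a 192.168.3.x source are the leaked ones; packets seen on eth1 with the same filter are everything the clients sent.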

cat /proc/net/ip_conntrack displays that I'm well below any reasonable limit of connections
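
A quicker way to check the table against its limit, instead of reading the whole connection list, might look like the sketch below. It assumes the kernel exposes the nf_conntrack sysctl files under /proc/sys/net/netfilter (the usual location when the conntrack module is loaded); conntrack_usage is a hypothetical helper name.

```shell
# conntrack_usage [COUNT_FILE] [MAX_FILE]
# Prints how full the connection tracking table is.
# The default paths are an assumption about where the kernel exposes the counters.
conntrack_usage() {
    count_file="${1:-/proc/sys/net/netfilter/nf_conntrack_count}"
    max_file="${2:-/proc/sys/net/netfilter/nf_conntrack_max}"
    count=$(cat "$count_file") || return 1
    max=$(cat "$max_file") || return 1
    [ "$max" -gt 0 ] || return 1          # refuse to divide by zero
    echo "$count of $max tracked connections ($((count * 100 / max))%)"
}
```

Called without arguments it reads the live counters; the optional arguments only exist so the function can be pointed at other files.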

a tcpdump -e shows that this is not an L2 issue, since the packets leaking from eth0 carry the MAC address of the VyOS box...

a reboot and even a vyos upgrade did not help.

I did not notice the problem in the past, but I cannot be sure; I discovered it by chance while reviewing the log files of the external firewall.

I'm becoming crazy...

Details

Difficulty level
Normal (likely a few hours)
Version
First noticed on 1.1.5; after upgrading to 1.1.7 the issue is still there. VMware VM, 64-bit.
rpiola created this task. Jun 6 2016, 10:50 AM
syncer triaged this task as Normal priority. Jun 8 2016, 4:52 PM
syncer added a subscriber: syncer.

Hello,
@rpiola can you provide a little more info about your environment?
Do you use vmxnet3 adapters or e1000?
We will definitely need to have tcpdumps to take a look.

syncer added a subscriber: VyOS 1.1.x.

@rpiola, I would recommend testing the latest beta too, so that we can investigate a fix further if the problem persists.
Thanks!

We use vmxnet3 adapters

I made two captures with
tcpdump -v -w eth1.cap net 192.168.3.0/24 -i eth1
tcpdump -v -w eth0.cap net 192.168.3.0/24 -i eth0
(simultaneously, in two separate windows)
how can I attach them?

I'm going to try the beta as well.

rpiola added a comment. Jun 9 2016, 1:19 PM

I installed 1.2.0beta1 ... it shows the same problem.

I ran tcpdump immediately after the reboot. In the first 30 seconds the percentage of leaked packets was much higher than before (30-50%); then traffic became more regular (after a few minutes I observed some 1-2% of leaked packets). Maybe this is related to packets belonging to TCP connections that were opened before the reboot and were still retrying (since it is a VM, the reboot is quite fast).

@rpiola thanks for input!
You can attach files from the comment toolbar (the cloud-with-arrow icon).

@dmbaturin @UnicronNL @EwaldvanGeffen
Any idea what our next step should be?

rpiola added a comment. Jun 9 2016, 1:28 PM

I uploaded the two captures.

afics added a subscriber: afics. Jun 9 2016, 1:45 PM

This is normal behaviour. You need to add a firewall rule to only allow established and related connections and another to drop invalid packets.

syncer added a comment. Jun 9 2016, 3:11 PM

Hey @afics, thanks for clarifying!
@rpiola can you retest with the proposed configuration changes?

Actually, that router is not supposed to do any filtering.
Anyway, it is unclear to me where you want me to configure the firewall rules: usually the "allow established and related" rule is configured inbound on the outside interface of the firewall (in my case, eth0), while my problem is with packets EXITING eth0 and coming from eth1.
I should allow ANY packet coming from eth1 to exit from eth0 with its address translated (otherwise, how can a client PC connected to eth1 start a NEW connection to the outside world?).

afics added a comment. Jun 10 2016, 7:44 AM

The important part is to discard any packets with conntrack state invalid on the internal interface. What you are seeing occurs because netfilter forwards, rather than NATs, packets that it does not know about. See https://bugzilla.netfilter.org/show_bug.cgi?id=693#c11.
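
One way to check whether conntrack is classifying packets as invalid at all is to watch its statistics counters. The sketch below assumes the conntrack command-line tool is installed and that "conntrack -S" prints one line per CPU containing an invalid=N field; sum_invalid is a hypothetical helper name.

```shell
# sum_invalid
# Reads "conntrack -S" style output on stdin and prints the total of the
# per-CPU invalid= counters (0 if no such field is found).
sum_invalid() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^invalid=[0-9]+$/) {
                split($i, kv, "=")
                total += kv[2]
            }
    } END { print total + 0 }'
}

# Hypothetical usage on the router:
#   sudo conntrack -S | sum_invalid
```

If the total keeps growing while un-NATed packets show up on eth0, that would be consistent with the explanation above, although the counter alone does not prove that the same packets are involved.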

so, the workaround should be adding
name dropinvalid {
    default-action accept
    rule 10 {
        action drop
        state {
            invalid enable
        }
    }
}

as an input on interface eth1 with
set interfaces ethernet eth1 firewall in name dropinvalid
?

afics added a comment. Jun 10 2016, 8:24 AM

Yes, that should do the trick.
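
For reference, the same workaround written as configuration-mode commands might look like the sketch below. It assumes the 1.1.x-style firewall syntax used elsewhere in this thread and reuses the names dropinvalid and eth1 from the comments above; the log line is an optional addition that was not part of the original proposal.

```shell
# Rule set that accepts everything except packets conntrack marks as invalid
set firewall name dropinvalid default-action accept
set firewall name dropinvalid rule 10 action drop
set firewall name dropinvalid rule 10 state invalid enable

# Optional (not in the original thread): log what the rule drops
set firewall name dropinvalid rule 10 log enable

# Apply it to traffic entering the router from the inside network
set interfaces ethernet eth1 firewall in name dropinvalid
```

The commands only take effect after a commit, and a save is needed for them to survive a reboot.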

I enabled the firewall, and it seems that everything is ok... I no longer see untranslated packets on the outside interface...

afics added a comment. Jun 10 2016, 8:33 AM

I guess we should add this to the user guide.

mdsmds added a subscriber: mdsmds. Jun 10 2016, 12:32 PM

Very interesting reading. Yes, I agree with @rpiola, you should add a note to the NAT wiki.

@afics: or, instead of that rule 10, we can set globally:
set firewall state-policy invalid action drop

Right?

afics added a comment. Jun 10 2016, 1:13 PM

@mdsmds Yes, that should work, but if you do that, you force all traffic to be tracked by conntrack, which might not be what you want. If you apply it only to "in" on your internal NIC, you don't have to track all traffic, assuming you have multiple (internal) interfaces and you don't NAT all of them.

@afics OK+Clear+Thanks

rps added a subscriber: rps. Sep 15 2016, 10:29 AM

Can we move this to "wontfix"? This is the normal behavior of Linux, and doing any sort of global drop of invalid-state traffic by default is not a realistic change.

syncer closed this task as Wontfix. Sep 15 2016, 11:45 AM
syncer claimed this task.

as per @rps request
marking this as solved

Side note: could we add this behaviour to the wiki, so that it saves others some hours?

Thanks

rps added a comment. Sep 16 2016, 11:22 AM

I've added a quick note in the SNAT section of the Wiki to explain this. Feel free to edit if it seems unclear or could be worded better.