Hello! First of all, I'd like to describe my topology.
I have a router-on-a-stick configuration in my company. There is a bond between the router (VyOS 1.1.6) and a switch (Cisco 3750), and on the router I have some vifs.
I also have 2 ISPs and have configured load balancing (and it works fine):
set protocols static route 184.108.40.206/32 next-hop 'ISP1'
set protocols static route 220.127.116.11/32 next-hop 'ISP2'
set load-balancing wan interface-health bond0.82 failure-count '4'
set load-balancing wan interface-health bond0.82 success-count 5
set load-balancing wan interface-health bond0.82 nexthop 'ISP1'
set load-balancing wan interface-health bond0.82 test 10 target '18.104.22.168'
set load-balancing wan interface-health bond0.82 test 10 type 'ping'
set load-balancing wan interface-health bond0.80 failure-count '4'
set load-balancing wan interface-health bond0.80 success-count 5
set load-balancing wan interface-health bond0.80 nexthop 'ISP2'
set load-balancing wan interface-health bond0.80 test 10 target '22.214.171.124'
set load-balancing wan interface-health bond0.80 test 10 type 'ping'
set load-balancing wan 'flush-connections'
set load-balancing wan 'enable-local-traffic'
set load-balancing wan rule 10 destination address '192.168.0.0/16'
set load-balancing wan rule 10 'exclude'
set load-balancing wan rule 10 inbound-interface 'bond0+'
set load-balancing wan rule 11 destination address '172.16.0.0/12'
set load-balancing wan rule 11 'exclude'
set load-balancing wan rule 11 inbound-interface 'bond0+'
set load-balancing wan rule 12 destination address '10.0.0.0/8'
set load-balancing wan rule 12 'exclude'
set load-balancing wan rule 12 inbound-interface 'bond0+'
set load-balancing wan rule 71 destination address '0.0.0.0/0'
set load-balancing wan rule 71 'failover'
set load-balancing wan rule 71 inbound-interface 'bond0.71'
set load-balancing wan rule 71 interface bond0.80 weight '2'
set load-balancing wan rule 71 interface bond0.82 weight '1'
set load-balancing wan rule 71 protocol 'all'
set load-balancing wan rule 72 destination address '0.0.0.0/0'
set load-balancing wan rule 72 'failover'
set load-balancing wan rule 72 inbound-interface 'bond0.72'
set load-balancing wan rule 72 interface bond0.80 weight '1'
set load-balancing wan rule 72 interface bond0.82 weight '2'
set load-balancing wan rule 72 protocol 'all'
But we also get IP addresses from these 2 ISPs and run some services on them.
If I add
set protocols static route 0.0.0.0/0 next-hop ISP1 distance '10'
set protocols static route 0.0.0.0/0 next-hop ISP2 distance '10'
I have a problem with the IP addresses from ISP2.
For example: if I ping even the IP address on the router itself, some packets are lost.
It seems like one packet goes back through ISP1 and the next one goes through ISP2.
QUESTION: how can I make sure that a packet that comes in to the ISP1 IP address (interface) goes back out through ISP1?
- 2 ISPs
ip rule add from "ISP1_network" table 1 prio 210
ip rule add from "ISP2_network" table 2 prio 220
and in the case of a router-on-a-stick configuration
(without these commands I am not able to ping the ISP networks from my internal network)
ip rule add from all to 192.168.0.0/16 table main prio 110
ip rule add from all to 172.16.0.0/12 table main prio 120
ip rule add from all to 10.0.0.0/8 table main prio 130
- set load-balancing wan 'enable-local-traffic'
with this command, traceroute from my router goes only via ISP2
without this command, traceroute works fine
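The `ip rule` lines above only select a routing table; for the replies to leave through the right ISP, tables 1 and 2 presumably also need a default route pointing at that ISP's gateway. A minimal sketch, assuming bond0.82 faces ISP1 and bond0.80 faces ISP2 (as in the interface-health config), with documentation-range placeholder subnets and gateways. The helper name `print_isp_table` is hypothetical, and the script only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Print (do not run) the commands that give one ISP its own routing table.
# Arguments: table id, ISP subnet, ISP gateway, WAN interface, rule priority.
# All addresses used below are placeholders, not the real ISP networks.
print_isp_table() {
    table=$1; subnet=$2; gateway=$3; iface=$4; prio=$5
    # Default route that is only consulted through this table.
    echo "ip route replace default via $gateway dev $iface table $table"
    # Traffic sourced from the ISP's subnet looks up that table.
    echo "ip rule add from $subnet table $table prio $prio"
}

print_isp_table 1 192.0.2.0/24 192.0.2.1 bond0.82 210
print_isp_table 2 198.51.100.0/24 198.51.100.1 bond0.80 220
```

The priorities 210 and 220 are the ones from the rules above, so the three `to ... table main` rules (110 to 130) are still matched first for internal destinations.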
P.S. Thanks everyone, and especially Pkilla
It's asymmetric routing.
On VyOS before v1.1.0 you can use this:
set load-balancing wan sticky-connections inbound
But in post-3.6 kernels the route cache has been removed, so this no longer works.
Read this thread http://marc.info/?l=netfilter&m=139376577524375&w=4 and create your own script to enable sticky connections.
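One common way to write such a script is to mark each connection by the WAN interface it arrived on and route the replies by that mark. The fragment below is a sketch of that general technique, not necessarily what the linked thread describes. The mark values, rule priorities and table numbers are assumptions, tables 1 and 2 are assumed to already hold a default route for ISP1 and ISP2, and the marks may collide with the ones VyOS WAN load balancing installs itself, so check `iptables -t mangle -L -nv` first.

```shell
# Mark new connections by the WAN vif they came in on (1 = ISP1, 2 = ISP2).
iptables -t mangle -A PREROUTING -i bond0.82 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
iptables -t mangle -A PREROUTING -i bond0.80 -m conntrack --ctstate NEW -j CONNMARK --set-mark 2

# Replies generated by the router itself: copy the connection mark to the packet.
iptables -t mangle -A OUTPUT -j CONNMARK --restore-mark

# Replies forwarded from an internal vif (bond0.71 used here as an example).
iptables -t mangle -A PREROUTING -i bond0.71 -j CONNMARK --restore-mark

# Send marked packets to the routing table of the ISP they arrived through.
ip rule add fwmark 1 table 1 prio 200
ip rule add fwmark 2 table 2 prio 201
```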
I added this command
set load-balancing wan sticky-connections inbound
and I still have the same problem
- from this web site, http://lg.telia.net/, I sent pings to my router's ISP1 and ISP2 interfaces
- to ISP1 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max/stddev = 414.052/420.581/426.462/5.356 ms
- to ISP2 ping statistics ---
5 packets transmitted, 3 packets received, 40% packet loss
round-trip min/avg/max/stddev = 416.939/417.873/419.690/1.285 ms
with or without this command I get the same result (20-40% loss)
- one of the services (VoIP) on ISP1 stopped working properly
I cannot ping 126.96.36.199
and cannot trace 188.8.131.52; it only shows **** after the first hop
(on the VoIP server the default gateway is the VyOS ISP1 IP address)
so I deleted this command and the server is up again)))
What more can I do?
We need to summon @EwaldvanGeffen (the author of that command). Let me draw a circle and light some candles...
I assume your upstream routers next-hop-isp1 and next-hop-isp2 are not the same physical router; otherwise it might have issues with its own sticky routing back towards you.
It reads to me like your box is not very sticky right now; you should, however, keep in mind that stickiness is a tricky thing. This is my explanation for pre-1.2, I think, when it became a kernel option in 3.6 (see daniil). I sadly haven't had time to play with the new stickiness options yet (I run 116 in production).
When you are processing TCP it's easy, since conntrack will point you in the right direction; this is what per-connection used to do. UDP traffic is slightly more difficult because of its connectionless nature, and ICMP is where it gets difficult. So always test with all three protocols to get an accurate picture: SSH into your box for TCP, use NTP for UDP, and ping, obviously, for ICMP. Results will vary.
Second issue when testing: your start and end points influence how stickiness behaves. You have your upstream routers, what lies beyond them, VyOS itself (two modes: with enable-local and without) and your LAN. I recommend testing between beyond-upstream-routers and the LAN, because this will demonstrate the best stickiness you can attain (and what the bulk of your traffic will experience).
Post your route -n, iptables -t mangle -L -nv, iptables -t nat -L -nv (if relevant), and can someone fill me in on the command for the multiple routing tables + fwmark thingies, just to be sure it's clean? Also, which version are you running? There have been very relevant changes recently, as @dmbaturin mentioned earlier. I'll try to lab your setup this weekend.
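For convenience, the outputs requested above could be collected roughly as follows. The last two commands are a guess at the "multiple routing tables + fwmark" listing: they are standard iproute2 commands rather than anything confirmed in this thread.

```shell
# Run as root from the VyOS shell (operational mode, not configure mode).
# 'show version' in VyOS operational mode reports the running version.
route -n
iptables -t mangle -L -nv
iptables -t nat -L -nv

# Policy rules (including any fwmark matches) and the contents of every table.
ip rule show
ip route show table all
```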
It might take a little time but I assure you we'll get there.
What @pkilla suggested could work, but it needs to be the right table, and I think there are versions where we've added an offset so as not to interfere with PBR, around 116 iirc.
ps. new job, new hardware and i don't know a proper irc client for mac os x yet :(
I haven't been able to reproduce this so far. The only difference is that I've used two separate interfaces instead of 'on-a-stick'; I don't think this should matter, but I will run on-a-stick tomorrow. Is there any NAT involved in your setup?
When you send traffic to your wan-ip-a from the internet, are you sure it goes outbound over wan-ip-b towards the internet? Did you tcpdump this? If not, can you run tcpdumps on both WANs and show your test results?
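A minimal way to capture this, assuming bond0.82 and bond0.80 are the two WAN vifs from the config earlier in the thread and using 203.0.113.10 as a placeholder for the external host that sends the pings:

```shell
# Run each capture in its own terminal while the external host is pinging.
# 203.0.113.10 is a placeholder; substitute the real source of the test pings.
tcpdump -n -i bond0.82 icmp and host 203.0.113.10
tcpdump -n -i bond0.80 icmp and host 203.0.113.10
```

If the echo requests show up on one interface and the echo replies on the other, the return path is asymmetric.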