Even with customer routes redistributed into OSPF instead of iBGP, it has just crashed again:
Aug 31 2020
I tried unit-cache earlier, but it seems to have issues too: I've seen duplicate routes when the same client (all clients have static IPs assigned by RADIUS based on username) connects to a different PPPoE server and the old route is not removed, as if FRR never saw the cached (not destroyed) PPPoE interfaces as removed. I haven't investigated this in more detail, though, as it's a production setup and I can't experiment too much on live customers.
I'm considering going back to redistributing PPPoE customers' /32 routes in OSPF instead of iBGP. It was set up that way for a few years (using MikroTik, before moving to VyOS), but I recently changed it following "BGP Best Current Practices" (http://www.bgp4all.com.au/pfs/_media/workshops/05-bgp-bcp.pdf), which recommends using OSPF only for infrastructure, not customers. That seems logical to me, as BGP was designed for much larger routing tables (the whole Internet), but perhaps OSPF is still good enough for just a few hundred customers.
Hello @marekm, I think [ppp]unit-cache=n might help in this case, but the main issue is in FRR. Do you want a test package with these improvements?
unit-cache=n is disabled by default (unit-cache=0).
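To check which unit-cache value a PPPoE server actually runs with, a minimal Python sketch can read the [ppp] section of the accel-ppp configuration and fall back to the default of 0 (cache disabled) when the option is absent. It assumes the config is plain INI-style text; `read_unit_cache` is a hypothetical helper name, and where the file lives on a given VyOS release is left to the caller.

```python
import configparser

# Default stated above: 0 means no interfaces are kept in the cache.
UNIT_CACHE_DEFAULT = 0


def read_unit_cache(conf_text: str) -> int:
    """Return the [ppp] unit-cache value from accel-ppp config text.

    accel-ppp configs contain bare entries (e.g. module names under
    [modules]) and repeated keys (e.g. several interface= lines), so
    allow_no_value and strict=False are needed for configparser to accept
    them. Interpolation is disabled so '%' in values is taken literally.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(conf_text)
    # fallback also covers a missing [ppp] section.
    return parser.getint("ppp", "unit-cache", fallback=UNIT_CACHE_DEFAULT)
```

Usage would be along the lines of `read_unit_cache(open(path).read())` with `path` pointing at the generated accel-ppp config.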
Aug 30 2020
Resolved in PR536.
Multiple fixes have gone into the 4.19-series kernel. Could you please try upgrading to VyOS 1.2.5 or 1.2.6-epa1?
In T563#74302, @c-po wrote: "Please use the new get_config_dict() API calls."
Squid will be used for authentication and for controlling name resolution (pointing to a special DNS server or similar?); SquidGuard and caching will no longer be used. It also ran in transparent mode by default, which requires an iptables rule set. I think that feature can be removed, since a transparent proxy has no authentication options anyway.
I've just had two different routers (one bare metal, one VM) crash at roughly the same time, triggered by many PPPoE sessions disconnecting simultaneously due to a short power failure (the routers themselves had power the whole time, but a switch on the network between the routers and the PPPoE clients lost power for about a minute). The stack traces are very similar (absolute addresses differ, but the functions and offsets are the same). And again, each time watchfrr restarted bgpd, but BGP did not work until a reboot. So far there have been no problems with two other BGP routers running a similar configuration but without any dynamic interfaces (only OSPF and BGP, no PPPoE servers).
I tested this in the lab and it seems to work properly. I swapped the interface names for eth1 and eth2:
vyos@vyos# delete interfaces ethernet eth1 hw-id
[edit]
vyos@vyos# delete interfaces ethernet eth2 hw-id
[edit]
vyos@vyos# set interfaces ethernet eth1 hw-id 50:01:00:02:00:02
[edit]
vyos@vyos# set interfaces ethernet eth2 hw-id 50:01:00:02:00:01
[edit]
vyos@vyos# commit
[edit]
vyos@vyos# save
Saving configuration to '/config/config.boot'...
Done
[edit]
vyos@vyos# run reboot now
After the reboot:
vyos@vyos:~$ sudo ethtool -P eth1
Permanent address: 50:01:00:02:00:02
vyos@vyos:~$ sudo ethtool -P eth2
Permanent address: 50:01:00:02:00:01
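When repeating this check on many interfaces, the comparison can be scripted. The Python sketch below parses the "Permanent address:" line that `ethtool -P` prints and compares it with the configured hw-id; the helper names are hypothetical, and capturing the output (for example via `subprocess.run(["sudo", "ethtool", "-P", "eth1"], capture_output=True, text=True).stdout`) is left to the caller.

```python
import re

# `ethtool -P <ifname>` prints e.g. "Permanent address: 50:01:00:02:00:02".
_PERM_ADDR = re.compile(
    r"Permanent address:\s*([0-9A-Fa-f]{2}(?::[0-9A-Fa-f]{2}){5})"
)


def permanent_address(ethtool_output: str) -> str:
    """Extract the permanent MAC from `ethtool -P` output, lower-cased."""
    match = _PERM_ADDR.search(ethtool_output)
    if match is None:
        raise ValueError("no permanent address in ethtool output")
    return match.group(1).lower()


def hw_id_matches(expected_hw_id: str, ethtool_output: str) -> bool:
    """True if the interface's permanent MAC equals the configured hw-id."""
    return permanent_address(ethtool_output) == expected_hw_id.lower()
```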
@maznu, can you provide the output of the following:
show configuration commands | match hw-id
sudo cat /run/udev/log/vyatta-net-name.coldplug
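To compare the requested output against what the interfaces actually report, the `show configuration commands | match hw-id` lines can be turned into an interface-to-MAC map. This Python sketch assumes lines of the form `set interfaces ethernet eth1 hw-id '50:01:00:02:00:02'` and accepts the value with or without single quotes; `parse_hw_ids` is a hypothetical helper.

```python
import re
from typing import Dict

# Matches e.g.: set interfaces ethernet eth1 hw-id '50:01:00:02:00:02'
# The surrounding quotes are optional so both forms parse.
_HW_ID_LINE = re.compile(
    r"^set interfaces ethernet (\S+) hw-id '?([0-9A-Fa-f:]{17})'?$"
)


def parse_hw_ids(commands_output: str) -> Dict[str, str]:
    """Map interface name -> configured hw-id (lower-cased)."""
    hw_ids = {}
    for line in commands_output.splitlines():
        match = _HW_ID_LINE.match(line.strip())
        if match:
            ifname, mac = match.groups()
            hw_ids[ifname] = mac.lower()
    return hw_ids
```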
@c-po https://github.com/vyos/vyos-build/pull/121 will fix it, but I used .142 while the config file was from .136, so please review it first. I tested it and the system speaker is fully functional again.
You can test it quickly with `echo -ne "\a"`, which should make a sound. beep seems to be broken; it looks like it can't be used via sudo, which I may look into later.
cheers
Aug 29 2020
echo -ne "\a" should also produce a beep on the system speaker if you just want a quick test. I tested it with a Debian 10 (deb10) minimal install, and it works under QEMU too.
For example: qemu-system-x86_64 -smp cpus=3 -soundhw pcspk -m 1024 -enable-kvm -drive file=os.img,media=disk (the OS disk is a deb10 netinstall).
By capabilities I meant the ones listed under the input device link in sysfs:
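The same sound capabilities are also summarised in /proc/bus/input/devices, one blank-line-separated block per device. A minimal Python sketch, assuming the speaker registers as an input device named "PC Speaker" (the name the pcspkr driver uses) and that the `B: SND=` bitmap is printed in hex; a working speaker should advertise both the bell and tone bits.

```python
from typing import Optional

# Bit positions of the kernel's SND_* event codes (linux/input-event-codes.h).
SND_CLICK, SND_BELL, SND_TONE = 0, 1, 2


def speaker_snd_caps(devices_text: str, name: str = "PC Speaker") -> Optional[int]:
    """Return the SND capability bitmask of the named input device.

    Returns None if no device with that name is listed, and 0 if the
    device exists but has no B: SND= line.
    """
    for block in devices_text.strip().split("\n\n"):
        lines = [line.strip() for line in block.splitlines()]
        if f'N: Name="{name}"' not in lines:
            continue
        for line in lines:
            if line.startswith("B: SND="):
                return int(line.split("=", 1)[1], 16)
        return 0
    return None
```

On a live system the input would be `open("/proc/bus/input/devices").read()`.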
Any news on this one? I have posted some of the pain I've been having in T291, where VyOS neither behaves as documented (matching on hw-id) nor consistently across reboots.
Yup, same situation: it's there on LTS but not on nightly. The functionality also doesn't exist in FRR, so it can be removed from the completions too (although the CLI files are still in the vyatta-op-quagga repo).
However, it still seems to be present under reset on nightly:
$ reset ip bgp foo rsclient
% Unknown command: clear ip bgp foo rsclient
We look forward to a solution to this problem.
VyOS neither has predictable interface names nor behaves as described in its documentation.
Has this problem been solved yet?
According to the documentation (https://wiki.vyos.net/wiki/Troubleshooting), specifying the hw-id of an interface should tell udev (or similar) to ensure that the interface with the specified MAC address gets the configured name, e.g. eth0.
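The documented behaviour implies a simple invariant: every interface configured with a hw-id should carry that MAC after boot. A Python sketch for spotting violations, assuming the configured map comes from the hw-id lines in the configuration and the current map from /sys/class/net/<ifname>/address. Note that sysfs reports the current MAC, which differs from the permanent one if a mac override is configured; both helper names are hypothetical.

```python
import os
from typing import Dict, List


def read_current_macs(sysfs_root: str = "/sys/class/net") -> Dict[str, str]:
    """Map interface name -> current MAC as reported by sysfs."""
    macs = {}
    for ifname in os.listdir(sysfs_root):
        addr_file = os.path.join(sysfs_root, ifname, "address")
        try:
            with open(addr_file) as fh:
                macs[ifname] = fh.read().strip().lower()
        except OSError:
            continue  # entries without a readable address are skipped
    return macs


def hw_id_mismatches(configured: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return interfaces whose current MAC differs from the configured hw-id.

    An interface missing from `current` also counts as a mismatch.
    """
    return sorted(
        ifname
        for ifname, mac in configured.items()
        if current.get(ifname) != mac.lower()
    )
```

Running `hw_id_mismatches(configured, read_current_macs())` after each reboot would show whether the naming is stable across reboots.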
I have removed show ip bgp scan for the upcoming 1.2.6 release.
Viacheslav seems to have migrated a good portion of Quagga show commands to the XML template format a while ago and the remaining completion file for ip bgp scan also got deleted then. I can see that the useless command completion for it is still there on 1.2.5 LTS but it's gone from the nightly builds.
As far as I recall, it no longer initializes correctly; you can test this with beep. The system beep that can be set via the CLI has been broken since then.
Aug 28 2020
@Viacheslav --verbose is not the issue, it's only used to output the actual "error".
This is no longer compiled as a module but rather statically into the kernel (https://github.com/vyos/vyos-build/blob/current/packages/linux-kernel/x86_64_vyos_defconfig#L188)
I do not have that hardware available, but a possible solution could be the following snippet which could be run on system boot:
We no longer make use of git submodules. Closing as wontfix. Build from source is possible using e.g. our Jenkins CI/CD pipeline library (https://github.com/vyos/vyos-build/tree/current/vars)
@Viacheslav what do you think?
This is not a "bug": we pass any argument to show ip bgp straight down to FRR.
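The pass-through behaviour can be pictured with a small Python sketch that forwards whatever follows `show ip bgp` to FRR via `vtysh -c` without validating it, so FRR alone decides whether the result is a valid command. This is an illustration under that assumption, not the actual VyOS op-mode code; `build_show_ip_bgp` is hypothetical.

```python
import shlex
from typing import List


def build_show_ip_bgp(args: str) -> List[str]:
    """Build a vtysh invocation that forwards the user's arguments as-is.

    No validation happens here: an argument such as "scan" is handed to
    FRR, which rejects it if the command does not exist there.
    """
    command = "show ip bgp"
    if args.strip():
        # Re-join the tokens to normalise whitespace only.
        command += " " + " ".join(shlex.split(args))
    return ["vtysh", "-c", command]
```

The resulting list could then be passed to `subprocess.run(..., capture_output=True, text=True)` on a router running FRR.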
Could not reproduce this on a clean install. That error is printed only if the IPsec process is not running. When I set up IPsec and ran the command, it simply printed the table. Maybe there was a hiccup with the nightly build you used, if that error appeared while the IPsec process was running.
I'm fixing up the code, but it will suffer from the same issue as in T2835. That build file should be generated as the last step of the build process; otherwise there is no way to find out which packages were installed during the build.
This command was removed from Quagga five years ago and never made it to FRR.
It looks like the build process messed it up: it created the version file at the beginning of the build, not at the end. After the file usr/share/vyos/version.json was created, package installations took place a few minutes later; that's why everything in the image is newer than the version file, so the command output is actually correct. I'll check whether I can find out what went wrong during the build, since only 1.2.6 appears to be affected.
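The ordering problem can be checked directly on an unpacked image by listing files modified after version.json. A Python sketch, assuming the image root is available as a directory; pointing `root` at var/lib/dpkg/info is likely the more telling choice, since dpkg writes its per-package .list files at install time whereas installed package files keep the mtimes from the .deb archive. `files_newer_than` is a hypothetical helper.

```python
import os
from typing import List


def files_newer_than(reference: str, root: str) -> List[str]:
    """List files under root whose mtime is later than the reference file's.

    If the version file is written at the very end of the build, this list
    should be empty; a long list suggests it was generated too early.
    """
    ref_mtime = os.stat(reference).st_mtime
    newer = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            try:
                # lstat so dangling symlinks do not raise.
                if os.lstat(path).st_mtime > ref_mtime:
                    newer.append(path)
            except OSError:
                continue  # vanished entries are ignored
    return sorted(newer)
```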
In T2820#74102, @Viacheslav wrote:
@marekm
Can you check whether "router-id" is declared in your BGP configuration?
Also, what's going on with the interface names?
ppp-lot29 ppp-jmg22 ppp-rol81 ppp-rod8
Do you use renaming scripts? How can it be reproduced?
/usr/libexec/vyos/op_mode/version.py:
Built on: Thu 13 Aug 2020 11:57 UTC
This also happens when just running the booted live image without installing it. Investigating.
Aug 27 2020
Dear friend,
any syntax could be proposed; the problem to be solved is letting the administrator open a port as they wish, be it TCP, UDP, or both, and IPv4 only or IPv6 only.
If this can be left to the administrator's discretion, it would be a great solution.
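Whatever syntax is chosen, one administrator request has to expand into concrete per-family, per-protocol rules. The Python sketch below models that expansion; the option names (`tcp_udp`, `both`) and the helper `expand_port_rule` are hypothetical, since no syntax has been agreed on.

```python
from itertools import product
from typing import List, Tuple

# Hypothetical option values; the real CLI syntax is still open.
PROTOCOLS = {"tcp": ("tcp",), "udp": ("udp",), "tcp_udp": ("tcp", "udp")}
FAMILIES = {"ipv4": ("ipv4",), "ipv6": ("ipv6",), "both": ("ipv4", "ipv6")}


def expand_port_rule(port: int, protocol: str, family: str) -> List[Tuple[str, str, int]]:
    """Expand one request into concrete (family, protocol, port) rules."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if protocol not in PROTOCOLS or family not in FAMILIES:
        raise ValueError(f"unsupported protocol/family: {protocol}/{family}")
    return [
        (fam, proto, port)
        for fam, proto in product(FAMILIES[family], PROTOCOLS[protocol])
    ]
```

Keeping the expansion in one place means each backend (IPv4 and IPv6 rule sets) only ever sees a single family and protocol per rule.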
It crashed again after 5 days on 1.2.6-epa1, in the same function, again when a dynamic PPPoE interface was deleted.
It happens less frequently now that the former customers who repeatedly failed authentication have been physically disconnected.
Again, BGP no longer works after watchfrr has restarted the bgpd process. All works again after reboot.
It is not possible to remove a filter on an interface.
Configuration