It’s not DNS.
There’s no way it’s DNS.
…
It was DNS.
Took a nasty hit yesterday from a change made in UniFi 5.11.50:
If you have any ‘service dns forwarding options’ configuration defined in config.gateway.json, it will overwrite the provisioning of statically defined name servers, leaving you with no DNS. Either remove the ‘service dns forwarding options’ portion of config.gateway.json, or add additional ‘options’ lines defining name servers, such as ‘server=1.1.1.1’, ‘server=8.8.8.8’, etc.
I’ve long used some config.gateway.json tweaks to add conditional forwarders for my Active Directory zones because Ubiquiti still hasn’t seen fit to put that functionality in the GUI.
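For anyone who hasn't done this before, here is a rough Python sketch of what such a tweak might look like, with catch-all servers included alongside the conditional forwarder. The domain ad.example.com and the address 10.0.0.10 are placeholders, not my real zones, and the helper name build_gateway_config is made up for illustration. The nesting mirrors the ‘service dns forwarding options’ path, and the option strings use dnsmasq's server= syntax.

```python
import json


def build_gateway_config(conditional_forwarders, catch_all_servers):
    """Return a config.gateway.json-style dict holding dnsmasq option lines."""
    options = []
    # Conditional forwarders: server=/<domain>/<ip> sends queries for that
    # domain (and its subdomains) to the given server.
    for domain, server_ip in conditional_forwarders:
        options.append(f"server=/{domain}/{server_ip}")
    # Catch-all forwarders: plain server=<ip> lines handle everything else.
    for server_ip in catch_all_servers:
        options.append(f"server={server_ip}")
    return {"service": {"dns": {"forwarding": {"options": options}}}}


# Placeholder values for illustration only.
config = build_gateway_config(
    conditional_forwarders=[("ad.example.com", "10.0.0.10")],
    catch_all_servers=["1.1.1.1", "8.8.8.8"],
)
print(json.dumps(config, indent=2))
```

The printed JSON is the fragment that would be merged into config.gateway.json. Treat it as a starting point and check it against your own controller version before provisioning.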
So after upgrading to 5.11.50, the first full re-provision cycle of my USG Pro ended up with no catchall forwarder. The interesting bit is how this manifested itself. My client devices are pointed directly at a local PiHole so that its metrics don’t show everything as coming from the router, so they all kept humming.
What broke was failover load-balancing. By default, the USG’s WAN health check pings a target… by DNS name. So the USG thought both my WANs had failed, and due to the vagaries of how the routing tables are managed, that ends up sending all traffic out WAN2. Not a desirable condition when using WAN2 costs me money (LTE) and is significantly slower than WAN1.
This led to a couple hours of cursing at how badly failover load-balancing is messed up on Ubiquiti’s routers, but in the end, as it is so often, the problem was DNS.
More interesting is that this problem was brought about by some behind-the-scenes changes in how UniFi manages DNS for the gateway device. Prior to this version of UniFi, the USG defaulted to putting the WAN DNS servers in /etc/resolv.conf. This is not ideal for two reasons:
- No DNS caching takes place for things running on the router itself. If you have IPS or load-balancing enabled this can result in an excessive number of DNS queries originating from the router, and creates a ton of noise if you have DNS metrics through PiHole, OpenDNS, etc.
- DNS servers are always queried one-by-one in the order they appear. This may or may not be desirable, but when the first DNS server is unavailable it will result in slow DNS responses.
I’ve worked around this on my USGs by configuring the WAN DNS servers manually and making the first entry 127.0.0.1. So the router will query the local dnsmasq first, and dnsmasq itself will ignore that entry when selecting an upstream DNS to forward to. dnsmasq is also “sticky” by default in choosing an upstream DNS server: it won’t keep sending queries to the first resolv.conf entry if it’s non-responsive.
What UniFi 5.11.50 has changed is that it now automatically places just a 127.0.0.1 entry in resolv.conf and moves the rest of the DNS configuration into /etc/dnsmasq.conf.
Another change they’ve made is setting all-servers. This causes dnsmasq to query all upstream DNS servers simultaneously, and whichever answers first wins.
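To make the difference concrete, here is a toy Python model comparing one-by-one querying with the all-servers approach. It is a sketch only, not how dnsmasq or the system resolver is actually implemented, and the latencies and the 5000 ms timeout are made-up numbers.

```python
def strict_order_latency(servers, timeout_ms):
    """Try servers one at a time, in order; a dead server costs a full timeout."""
    elapsed = 0
    for name, latency_ms in servers:
        if latency_ms is None:  # None models a server that never answers
            elapsed += timeout_ms
            continue
        return name, elapsed + latency_ms
    return None, elapsed


def all_servers_latency(servers):
    """Query every server at once; the fastest responder wins."""
    responsive = [(latency_ms, name) for name, latency_ms in servers
                  if latency_ms is not None]
    if not responsive:
        return None, None
    latency_ms, name = min(responsive)
    return name, latency_ms


# Made-up scenario: the first server is down, the second answers in 20 ms.
servers = [("1.1.1.1", None), ("8.8.8.8", 20)]
print(strict_order_latency(servers, timeout_ms=5000))  # ('8.8.8.8', 5020)
print(all_servers_latency(servers))                    # ('8.8.8.8', 20)
```

In this model a dead first server adds a whole timeout to every one-by-one lookup, while the parallel approach never notices it.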
This is great, in that the USG’s DNS configuration is now made sensible by default. And it’s not great, in that if you have config.gateway.json changes to service dns forwarding options, they will override what UniFi is doing to dnsmasq.conf and you will need to make adjustments to provide the DNS forwarders yourself.
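If you want to check your own file before upgrading, a small Python sketch along these lines could flag the problem. The function name missing_catch_all is made up, and the check assumes the options live under service → dns → forwarding → options as dnsmasq-style strings. It only looks for plain server=<ip> lines, so treat it as a sanity check rather than a full parser.

```python
import json


def missing_catch_all(gateway_config):
    """Return True if forwarding options exist but none is a catch-all server= line."""
    options = (gateway_config.get("service", {})
               .get("dns", {})
               .get("forwarding", {})
               .get("options", []))
    if isinstance(options, str):  # tolerate a single option stored as a string
        options = [options]
    if not options:
        return False  # nothing overridden, so nothing to flag
    for line in options:
        # server=/domain/ip is a conditional forwarder; server=ip is a catch-all.
        if line.startswith("server=") and not line.startswith("server=/"):
            return False
    return True


# Placeholder config with only a conditional forwarder (hypothetical zone).
broken = json.loads(
    '{"service": {"dns": {"forwarding": {"options": '
    '["server=/ad.example.com/10.0.0.10"]}}}}'
)
print(missing_catch_all(broken))  # True
```

To run it against a real file, load config.gateway.json with json.load and pass the resulting dict in.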
And beware if you’re configuring multiple DNS servers with differing views of DNS! This has always been a bad idea, but all-servers can make it much harder to troubleshoot, because the answer you get depends on whichever server happens to respond first.