DNS Changes in UniFi 5.11.50

It’s not DNS.
There’s no way it’s DNS.

It was DNS.

Took a nasty hit yesterday from a change made in UniFi 5.11.50:

If you have any ‘service dns forwarding options’ configuration defined in config.gateway.json, it will overwrite the provisioning of statically defined name servers, leaving you with no DNS. Either remove the ‘service dns forwarding options’ portion of config.gateway.json, or add additional ‘options’ lines defining name servers, such as ‘server=1.1.1.1’, ‘server=8.8.8.8’, etc.

I’ve long used some config.gateway.json tweaks to add conditional forwarders for my Active Directory zones because Ubiquiti still hasn’t seen fit to put that functionality in the GUI.

So after upgrading to 5.11.50, the first full re-provision cycle of my USG Pro ended up with no catchall forwarder. The interesting bit is how this manifested itself. My client devices are pointed directly at a local PiHole so that its metrics don’t show everything as coming from the router, so they all kept humming.

What broke was failover load-balancing. By default, the USG’s WAN health check pings a target… by DNS name. So the USG thought both my WANs had failed, and due to the vagaries of how the routing tables are managed, that ends up sending all traffic out WAN2. Not a desirable condition when using WAN2 costs me money (LTE) and is significantly slower than WAN1.

This led to a couple hours of cursing at how badly failover load-balancing is messed up on Ubiquiti’s routers, but in the end, as it is so often, the problem was DNS.

More interesting is that this problem was brought about by some behind-the-scenes changes in how UniFi is managing DNS for the gateway device. Prior to this version of UniFi, the USG defaults to putting the WAN DNS value in /etc/resolv.conf. This is not ideal for two reasons:

  1. No DNS caching takes place for things running on the router itself. If you have IPS or load-balancing enabled this can result in an excessive number of DNS queries originating from the router, and creates a ton of noise if you have DNS metrics through PiHole, OpenDNS, etc.
  2. DNS servers are always queried one-by-one in the order they appear. This may or may not be desirable, but when the first DNS server is unavailable it will result in slow DNS responses.

I’ve worked around this on my USGs by configuring the WAN DNS servers manually and making the first entry 127.0.0.1. So the router will query the local dnsmasq first, and dnsmasq itself will ignore that entry when selecting an upstream DNS to forward to. dnsmasq is also “sticky” by default in choosing an upstream DNS server — it won’t keep sending queries to the first resolv.conf entry if it’s non-responsive.

What UniFi 5.11.50 has changed is that now they’re automatically placing just a 127.0.0.1 entry in resolv.conf and have moved the rest of the DNS configuration into /etc/dnsmasq.conf.

Another change they’ve made is setting all-servers. This causes dnsmasq to query all upstream DNS servers simultaneously and whichever answers first wins.

This is great, in that the USG’s DNS configuration is now made sensible by default. And it’s not great, in that if you have config.gateway.json changes to service​ dns forwarding options they will override what UniFi is doing to dnsmasq.conf and you will need to make adjustments to provide the DNS forwarders yourself.

And beware if you’re configuring multiple DNS servers with differing views of DNS! This has always been a bad idea but all-servers can make it much worse to troubleshoot.

Ring and Retry

I have a Ring Doorbell Pro on my front door which has always been problematic. At first I could get it to join the WiFi but then it would error — turns out it was trying to use an outside DNS server and I had blocked clients from using any DNS but mine.

When I replaced my temporary AmpliFi setup with UniFi, I couldn’t get it to find my SSID at all. I literally held an AP directly in front of it and it would find several neighbors WiFi but not mine.

img_3400

I read somewhere that sometimes a Ring will get confused seeing several APs broadcasting the same SSID, so I decided to give it its own AP and SSID on 2.4GHz-only. I put it in the attic slightly offset from being directly above the door. This has mostly worked ok, except that it takes a long time to re-connect after the AP reboots from firmware updates.

Today I had cause to look at my AP retry rates…

Screenshot 2018-11-07 at 12.23.20 PM

And the Ring’s AP retry rate is just ridiculously bad. I crawled into the attic and shoved the AP all the way into the soffit. Gained 6dBm but the retry rate didn’t budge. Changed from Channel 1 to Channel 11, no difference.

Then I had the thought that, since initially installing this stuff, I’ve put a great deal of effort into tuning the power levels and minRSSI values to get devices to use the right AP instead of clinging to a poor signal. Let’s try turning off an AP in the attic I don’t really need anymore, bring the one I’ve been using for the Ring back into broadcasting my normal SSID on 2.4GHz and 5GHz, and trying joining the Ring to it.

And it joined right up! On 5GHz. The signal is decent and the retry rates have dropped to a more reasonable level. Huzzah!

AD Dynamic DNS Updates from EdgeRouter DHCP

Today was the day that I de-commissioned DHCP on my home Active Directory servers. The one area that gave me a little trouble was figuring out how to get Dynamic DNS for clients working with AD DNS. All of the guidance I could find was for BIND.

Here are the commands I used:

set service dhcp-server use-dnsmasq disable
set service dhcp-server dynamic-dns-update enable true
set service dhcp-server global-parameters 'ddns-updates on;'
set service dhcp-server global-parameters 'update-static-leases on;'
set service dhcp-server shared-network-name LAN shared-network-parameters 'ddns-rev-domainname="in-addr.arpa.";'
set service dhcp-server shared-network-name LAN shared-network-parameters 'ddns-domainname="AD-DOMAIN-NAME.";'

Replace LAN with the name of the DHCP server instance on the EdgeRouter, and AD-DOMAIN-NAME with your AD domain (note the trailing period). The " are necessary to escape the quotation marks within the CLI — make sure to copy those as-is.

Breaking this down step-by-step:

set service dhcp-server use-dnsmasq disable

This configures the ER to use ISC’s DHCPd instead of dnsmasq.

set service dhcp-server dynamic-dns-update enable true
set service dhcp-server global-parameters 'ddns-updates on;'
set service dhcp-server global-parameters 'update-static-leases on;'

I’m not sure the first one is necessary here, but we’re configuring DHCP to perform DNS updates on clients’ behalf and to include static DHCP clients.

set service dhcp-server shared-network-name LAN shared-network-parameters 'ddns-rev-domainname="in-addr.arpa.";'
set service dhcp-server shared-network-name LAN shared-network-parameters 'ddns-domainname="AD-DOMAIN-NAME.";'

Finally, we configure each DHCP scope for updates to the forward and reverse zones.

Dynamic DNS with Failover Load-Balancing

Here’s the scenario: Dual WANs configured for failover load-balancing. Firewall / DNAT rules in place allowing either interface to be used for incoming connections, and Dynamic DNS is configured on eth6 for wan1.domain.com and eth7 for wan2.domain.com.

Problem: Want to have active-wan.domain.com resolve to whichever WAN is active. Can’t use web-check on the eth6 / eth7 interfaces because the load-balancing policies apply to traffic originating from the router so wan1 and wan2 would always be set the active interface’s IP.

Solution: At first I thought it would be necessary to create a transition-script for the load-balancing policy to update active-wan outside of ddclient during a transition, but I realized that I was over thinking the problem.

What I ended up doing was creating an additional DDNS entry on a separate interface for domain.com which uses a web-check.

set service dns dynamic interface eth5 web dyndns
set service dns dynamic interface eth5 service custom-lb host-name active-wan.domain.com
set service dns dynamic interface eth5 service custom-lb login user
set service dns dynamic interface eth5 service custom-lb options zone=domain.com
set service dns dynamic interface eth5 service custom-lb password pass
set service dns dynamic interface eth5 service custom-lb protocol cloudflare
set service dns dynamic interface eth5 service custom-lb server www.cloudflare.com

This exploits the load-balancing of the router’s traffic to discover the correct IP for active-wan. During a failover transition, ddclient will automatically detect that the IP has changed and update active-wan — no transition-script is necessary.

Note: Do not use web-check for weighted load-balancing. It will constantly flap between WAN IPs.

Beware the System Name Server Setting

edgerouter-system-name-server

There’s a fair amount of EdgeRouter guidance suggesting the System Name Server be configured, primarily in conjunction with set interfaces ethernet ethX dhcp-options name-server no-update in order to specify nameservers other than what the upstream ISP provides. OpenDNS, Google’s 8.8.8.8 / 8.8.4.4, etc.

Whatever is entered here gets put at the top of /etc/resolv.conf. DNSmasq will pick up those entries as forwarders. The system itself, however, will use those entries directly for DNS resolution — bypassing DNSmasq and not performing any caching.

For many deployments this doesn’t matter — a typical configuration won’t have the router itself initiating many DNS lookups. There are several situations where this does matter:

  • Multi-WAN connectivity tests if you have not explicitly set an IP-based test target.
  • Dynamic DNS updates.
  • UNMS.

In each of these instances, DNS lookups are performed by the system at intervals of 1 minute or less. If your upstream DNS provides any DNS reporting the DNS lookups for those services will be at the top of your reports.

The way to minimize the number of local DNS lookups that get forwarded is to specify 127.0.0.1 as the system nameserver, so they go through DNSmasq and are routed and cached the same as DNS requests from clients. To configure the upstream forwarders there are two options:

  1. Specify addition nameservers for this option. The local system will always attempt resolution starting with the first entry in /etc/resolv.conf, while DNSmasq will ignore 127.0.0.1 and use the additional entries as forwarders.
  2. Explicitly configure DNSmasq forwarders using set service dns forwarding options server=DNS_Server_IP

I use the second method as it keeps the DNS forwarding options in one section of the config file instead of two.