(This post was inspired by a conference call from tonight, wherein the author had to explain how the following worked, during an overnight major change window.)
I’ve been meaning to do a bunch of simpler articles about basic topics that most networking folks and admin folks should know, but don’t seem to much of the time.
I often joke that a lot of my job is gluing the Internet together, which is also how I describe what I do for a living to folks whose eyes glaze over when I say the phrase “network engineer”.
But that’s not the sort of gluing I’m talking about today. Instead, I’m going to talk about something that if we didn’t have, the Internet wouldn’t work the way it does today.
That something is DNS glue records, and understanding them is essential for any DNS administrator who spends any time working somewhere that is totally self-hosted. The thing is, it’s amazing how many DNS administrators don’t understand how this works.
DNS, for those that don’t know, is a hierarchical system. Starting all the way at the top are the root servers. These are what are in your root.hints file. Those are the servers that your DNS server queries first to find out who the heck deals with the next domain down the list (com/net/org, country ccTLD registries, etc). Then, the buck passing continues until the DNS server either arrives at the correct server hosting this information, or gets to the last server in the chain and gets an NXDOMAIN response back. Basic DNS 101 type stuff, right?
This assumes, however, that everything is well known, that is, SOMEONE in the chain has the IP/IPv6 addresses for the next set of servers down the line. As an end-user hosting a domain with some given web provider, you’ll be using their name servers, so you don’t really have to worry about that being complete, you’re just the person on the end, and as long as their nameservers are findable, what do you care, right?
Now assume you’re working somewhere, such as an ASP or cloud provider, and you ARE that hosting company. Chances are, you are also hosting your own DNS, maybe you’ve got your offsite DNS providers or secondary site, like a good hosting provider should, but otherwise you’re totally self-reliant for DNS, nobody is hosting your records. So how the heck does anyone know how to reach your domain?
This is where DNS glue comes in.
Let’s run an example to show how this is supposed to work:
1. A DNS server goes looking for mail.example.com.
2. It has no idea where to find information about example.com, or even the com domain (unlikely, but let’s say the server just booted).
3. It does have a root.hints file. Aha! It can go ask those guys running DNS root! So it picks one of those IP/IPv6 addresses listed there for the root and asks for where the .com domain is. It gets back a list of nameservers to go ask for the information, along with the IP/IPv6 addresses of those nameservers.
The addresses that come back with that list of nameservers are the part you should be paying attention to. They’re the first DNS glue records handed out. Root has to know enough and pass on enough info about .com for the DNS server asking for it to go and even talk to com.
4. Our intrepid DNS server goes and asks the .com servers, “I’m looking for who handles example.com, can you hook me up?” Like an air traffic controller at a busy ARTCC handing a flight off to a local airport for landing, it responds with “example.com’s nameservers are ns1.example.com, ns2.example.com, contact at 127.0.0.5 and 127.0.0.6 respectively, good day!”
Pay attention to those addresses again! This is the second glue record that’s being passed to the requesting DNS server.
5. DNS server asks ns1 or ns2.example.com for mail.example.com, gets back a response, responds to the client that asked it for the info with the IP address for that hostname. It’s also cached the info for .example.com and .com, knowing that it’ll just go directly to those sources if it needs information the next time around, speeding the process up.
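The walk above can be sketched as a toy model in Python. This is only an illustration: nothing here touches the network, the server table and the `resolve` function are made up for this example, and every address apart from the 127.0.0.5 and 127.0.0.6 used above is a placeholder. The point is simply that each referral is only useful because it carries the glue address for the next server.

```python
# Toy model of the walk above. All names and addresses are illustrative;
# this does not send any real DNS queries.

# Each pretend server is keyed by its IP address. It either delegates a zone
# (nameserver names plus glue addresses) or answers authoritatively.
SERVERS = {
    "198.51.100.1": {  # a pretend root server
        "delegations": {
            "com.": {
                "ns": ["a.tld.example."],
                "glue": {"a.tld.example.": "198.51.100.2"},
            },
        },
    },
    "198.51.100.2": {  # a pretend .com server
        "delegations": {
            "example.com.": {
                "ns": ["ns1.example.com.", "ns2.example.com."],
                "glue": {
                    "ns1.example.com.": "127.0.0.5",
                    "ns2.example.com.": "127.0.0.6",
                },
            },
        },
    },
    "127.0.0.5": {"answers": {"mail.example.com.": "192.0.2.25"}},
    "127.0.0.6": {"answers": {"mail.example.com.": "192.0.2.25"}},
}

# Stand-in for the root.hints file.
ROOT_HINTS = ["198.51.100.1"]


def resolve(name):
    """Follow referrals from the root; return (address, servers_asked)."""
    server_ip = ROOT_HINTS[0]
    path = []
    while True:
        path.append(server_ip)
        server = SERVERS[server_ip]
        if name in server.get("answers", {}):
            return server["answers"][name], path
        # Look for a delegation whose zone is a suffix of the name.
        for zone, referral in server.get("delegations", {}).items():
            if name == zone or name.endswith("." + zone):
                # Without the glue we would have no address to ask next.
                first_ns = referral["ns"][0]
                server_ip = referral["glue"][first_ns]
                break
        else:
            return None, path  # stands in for an NXDOMAIN


if __name__ == "__main__":
    address, path = resolve("mail.example.com.")
    print(address, "via", " -> ".join(path))
```

Swap the glue for ns1.example.com. to an address that is not in the table and the lookup falls over at the .com step, which is the failure the rest of this post is about.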
Now, most of the time, step 4 is where everything goes horribly wrong for a given domain. If the glue for .com were screwed up, there’d be a big chunk of the Internet coming to a screeching halt, and there are major steps taken to prevent this from ever happening.
Step 4’s setup is the one handled by the registrant of the domain, if they’re self-hosting their own DNS, instead of making it someone else’s problem. They not only have to tell the .com registry what the nameservers are, they also have to feed that little bit of sticky bootstrappy information, the DNS glue record. Thankfully, most registrars will validate that any given nameserver is resolvable or has its own glue in place before accepting it as a valid nameserver.
Where this goes horribly wrong is when a company moves servers around in their IP addressing space, or changes ISPs and doesn’t have their own address assignment. All too often, because DNS servers don’t tend to change too much (stability is a good thing here, and DNS doesn’t need much in the way of resources to run for most companies), the institutional memory has been lost of what setting up or changing a self-hosted domain requires. Any other domains can just use the example.com nameservers, so we’ll just specify those! The DNS glue, sitting in its dusty virtual jar way back on a top shelf, has been forgotten.
So they update their A records for those servers, and figure that their work is done. It isn’t. After everybody’s DNS caches expire, they’re going to go looking for who the heck is authoritative for example.com again. They’ll get their handy helping of glue from .com, which hands out the old addresses, and this is where the whole process falls flat on its face. The DNS lookup times out, the browser puts up “page cannot be displayed” messages, and an angry customer wants to know why the heck their stuff stopped working.
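A check for this mismatch can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the `stale_glue` function and both input dictionaries are hypothetical, and gathering the two inputs (what the parent zone hands out as glue versus what your own zone says in its A records) is left to whatever tooling you already use.

```python
def stale_glue(parent_glue, zone_a_records):
    """Return nameservers whose glue at the parent differs from the zone's A records.

    Both arguments map a nameserver name to an iterable of addresses.
    """
    problems = {}
    for ns_name, glue_addrs in parent_glue.items():
        zone_addrs = zone_a_records.get(ns_name, [])
        if set(glue_addrs) != set(zone_addrs):
            problems[ns_name] = {
                "glue": sorted(glue_addrs),
                "zone": sorted(zone_addrs),
            }
    return problems


if __name__ == "__main__":
    # Hypothetical data: ns1 was renumbered in the zone, but the glue held by
    # the parent still points at the old address.
    glue_from_parent = {
        "ns1.example.com.": ["127.0.0.5"],
        "ns2.example.com.": ["127.0.0.6"],
    }
    a_records_in_zone = {
        "ns1.example.com.": ["127.0.0.15"],
        "ns2.example.com.": ["127.0.0.6"],
    }
    for name, detail in stale_glue(glue_from_parent, a_records_in_zone).items():
        print(name, "glue:", detail["glue"], "zone:", detail["zone"])
```

An empty result means the glue and the zone agree; anything else is a nameserver that needs its glue updated at the registrar.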
What kept a major change from possibly blowing up on us tonight was a combination of the institutional memory having a hazy recollection (specifically, one of the apps guys asking “hey, didn’t we have to tell the registrar something the last time we changed some nameserver addresses?”) and my asking on the conference call just afterward, “You did update your glue records, right? Tracing via dig isn’t showing the new info, and it’s been a little bit (glue changes seem to propagate quite quickly for com/net/org).” The mention of the words “glue records” got the attention of the admin handling the changes. After some explanation, they understood what the issue was, and sure enough, they hadn’t done it because they didn’t know they needed to. Five minutes after the correction was made, the new addresses were being properly sent out by the .com servers.
It’s not that I have dumb co-workers. In fact, I work with some really awesome folks, and it’s been the best job I’ve had in the ten-plus years I’ve had in the IT industry. It’s just that infrequent changes don’t get remembered well by many people, and sometimes things go wrong. I’ve just been bitten by this issue enough times that it’s the first and last thing I check when moving DNS stuff around. So what’s the fix to help keep such things from happening? Documentation, documentation, documentation. When the domain is set up in a self-hosted configuration, note what it took to get that set up, and put that on the internal wiki, a three ring binder, a PDF, SOMETHING. You do have a documentation share that all your folks use, right? Put it there, in your procedures directory (you have one of those too, yes?).
The moral of this story is, if you make radical changes to how you host your DNS, always check your DNS glue and document how that got set up in the first place. Your customers will appreciate you for it.