How Namecheap is preventing thousands from reaching our site

Until yesterday, searchtempest.com used namecheap.com for both domain registration and DNS. They are one of the least expensive registrars out there that isn’t named GoDaddy, and they generally have a good reputation, so this seemed like a reasonable choice. DNS also came free with domain registration, so we didn’t see any need to look elsewhere.

That changed after several recent complete outages of their DNS servers. Now, we don’t blame namecheap for that. Their business isn’t distributed DNS, and they certainly didn’t DDoS themselves. However, it did demonstrate our need for a more robust solution.

We settled on DNS Made Easy. They appear to provide a very robust, globally distributed, fast, user-friendly, and inexpensive solution. But this post isn’t about them. It’s about what happened when we tried to switch from namecheap’s internal DNS servers to the ones from DNS Made Easy.

The right way to transfer DNS is pretty straightforward, but it’s important to follow it to avoid apparent downtime. Generally, nameserver records (the locations of the nameservers themselves) are cached for 48 hours. So, when you want to change your nameservers without downtime, you just follow these steps:

  1. Configure the new nameservers with all necessary records.
  2. Point the domain at the new nameservers.
  3. Wait 48 hours* for the cache period to expire.
  4. Remove the records from the old nameservers.

*or whatever the TTL of the NS records is
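
As a rough sketch of the timing in step 3: a resolver that cached the old NS records just before the switch can keep using them for one full TTL, so the old records should stay up at least that long. The function name and the example date below are made up for illustration; only the 48 hour figure comes from this post.

```python
from datetime import datetime, timedelta, timezone


def earliest_safe_removal(switched_at: datetime, ns_ttl_seconds: int) -> datetime:
    """Return the first moment at which no resolver can still hold the old NS records.

    Worst case: a resolver cached the old NS records an instant before the
    switch, so it keeps asking the old nameservers for one full TTL.
    """
    return switched_at + timedelta(seconds=ns_ttl_seconds)


# Hypothetical example: delegation changed at noon UTC, NS TTL of 48 hours.
switched_at = datetime(2010, 1, 1, 12, 0, tzinfo=timezone.utc)
ns_ttl_seconds = 48 * 60 * 60

print(earliest_safe_removal(switched_at, ns_ttl_seconds).isoformat())
# prints 2010-01-03T12:00:00+00:00
```

Anything removed from the old nameservers before that moment risks being requested by a resolver that hasn’t yet learned about the new ones.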

This process is explained pretty succinctly in the first section here, for example:

But pay attention to the fact, that the NS records of your parent DNS servers are usually cached for 48 hours. Thus you should keep your old nameservers online for at least 48 hours after making the changes to your NS records.

The problem is that at namecheap, when we performed step #2, they immediately performed step #4 themselves, removing our records from their DNS. That means anyone who had accessed the site within the previous 48 hours suddenly had a stale cache and was unable to get there again, unless they knew to flush their DNS cache or waited up to 48 hours. (And if it’s their ISP that cached the DNS info, they have no choice but to wait.)
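
To make the failure mode concrete, here is a toy model of a single caching resolver. It is a deliberate simplification rather than a description of how any real resolver is implemented, and the class and function names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class CachedDelegation:
    nameserver: str    # "old" or "new": which nameserver set the resolver will ask
    expires_at: float  # hours after the switch at which this cache entry expires


def can_reach_site(cache: CachedDelegation,
                   hours_since_switch: float,
                   old_ns_keeps_records: bool) -> bool:
    """Return True if a resolver holding `cache` can still resolve the site."""
    if hours_since_switch >= cache.expires_at:
        # Cache expired: the resolver re-fetches the delegation
        # and finds the new nameservers.
        return True
    if cache.nameserver == "new":
        return True
    # Still pointed at the old nameservers, so success depends
    # entirely on whether they still hold the records.
    return old_ns_keeps_records


# Hypothetical visitor whose resolver cached the old NS records 10 hours
# before the switch, with a 48 hour TTL: the entry expires 38 hours after it.
visitor = CachedDelegation(nameserver="old", expires_at=38)

print(can_reach_site(visitor, 1, old_ns_keeps_records=True))    # True
print(can_reach_site(visitor, 1, old_ns_keeps_records=False))   # False
print(can_reach_site(visitor, 38, old_ns_keeps_records=False))  # True
```

The middle case is the one we hit: the records vanished from the old nameservers while caches still pointed there.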

I immediately contacted Namecheap support, hoping that they could reinstate our records for the remainder of the 48 hour period. Instead, they repeatedly gave me the canned (and incorrect) response that downtime is inevitable with DNS transfers, and that I should simply wait 24 hours. They were apparently oblivious both to the fact that a 24 hour outage of a busy website is kind of a big deal, and to the fact that their NS records actually had a 48 hour TTL.

Eventually, after two fruitless rounds with namecheap “tech” support, I was able to establish that they should have preserved our records, and that it is in fact their policy to do so for a period of 5 days. However, I then couldn’t convince them that this had not, in fact, happened.

Finally, with a bit of help from DNS Made Easy (which appears to have very competent tech support), we figured out the problem. Namecheap has two sets of nameservers, which they call “DNS v1” and “DNS v2”. The problems we had a couple of weeks ago were with v2, so we switched to v1 at that point while we sought out a more permanent solution. However, when we transferred yesterday, they preserved our records on their v2 servers (which we haven’t even been using for weeks!), but not on v1, where they needed to be. I was finally able to explain this to the third namecheap tech I spoke to, who told me that the v1 servers are controlled by a separate provider, and that there must be a problem on that provider’s end. She apparently sent them a ticket.
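
One way to spot this kind of discrepancy is to ask each old nameserver directly, without recursion, whether it still answers for your hostname. The sketch below uses only the Python standard library and the DNS message layout from RFC 1035. It is not necessarily how we diagnosed the problem, the function names are invented for this example, and the IP address in the usage comment is a documentation placeholder rather than a real namecheap server.

```python
import socket
import struct


def build_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query asking for the A record of `hostname`."""
    # Header: id, flags (0 = standard query, recursion not desired),
    # 1 question, 0 answer / authority / additional records.
    header = struct.pack("!HHHHHH", query_id, 0x0000, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def summarize_response(packet: bytes) -> dict:
    """Pull the response code, authoritative flag and answer count from the header."""
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    return {
        "rcode": flags & 0x000F,              # 0 = NOERROR, 3 = NXDOMAIN, 5 = REFUSED
        "authoritative": bool(flags & 0x0400),
        "answers": ancount,
    }


def ask_nameserver(server_ip: str, hostname: str, timeout: float = 3.0) -> dict:
    """Send the query over UDP to one specific nameserver and summarize its reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (server_ip, 53))
        packet, _ = sock.recvfrom(512)
    return summarize_response(packet)


# Usage (placeholder address, substitute the IP of each old nameserver):
# print(ask_nameserver("192.0.2.53", "www.searchtempest.com"))
```

A nameserver that still holds the records would be expected to reply with rcode 0, the authoritative flag set, and at least one answer. A non-zero rcode or zero answers from one set of servers, but not the other, would point to the kind of mismatch described above.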

That was now 13 hours ago, with no resolution. I apologize profusely for the inconvenience users of searchtempest.com are suffering. Hopefully it’s some consolation that I’m at least as frustrated myself. If you’re unable to access www.searchtempest.com, you could try flushing your DNS cache. The easiest way to do that is to restart your computer. If that doesn’t help, unfortunately the only options are to call your ISP and ask them to flush the nameserver records for searchtempest.com from their cache, or to wait until the cache expires – potentially until tomorrow afternoon.

Otherwise, all I can do at this point is warn others to avoid the same pitfall. Go ahead and use namecheap for domain registration, but switch to an external DNS provider immediately, before your website has traffic. It is easily worth a few bucks a month to avoid these kinds of problems. If you’re already using their DNS services, it should be possible to transfer out without downtime, but make sure you’re on v2 before you do. And good luck.

Update:

I just followed up with Namecheap tech support for the fourth time, to ask why our records still haven’t been restored on their partner’s nameservers. Unfortunately, it sounds like the response they got from their partner was almost identical to the canned response they repeatedly gave me:

When you change nameservers for a domain name, these changes are not accepted instantly all over the world. It may take up to 24 hours (in rare cases more) for local ISPs to update their DNS cache, so that everyone can see your website. Since the caching time varies between ISPs, it takes time for DNS changes to be totally in effect. Unfortunately this process cannot be influenced or sped up because of its automated nature.

Once again, this ignores the real problem. We know DNS propagation is not instantaneous. But if they leave the records on the old nameservers until the TTL (time to live) of the old NS (nameserver) records has passed, everyone will still be able to access the site while the propagation takes place. What’s more, according to at least one of the tech support reps I spoke with, that is in fact their policy. It’s becoming clear, though, that the cache period will have expired long before I am able to find someone willing and able to make the 10-second change that would fix this problem.