I don't get what the relevance of the ISP ad page is. Wouldn't it be a similar problem if any DNS server just cached the NXDOMAIN for too long? Seems to me that the problem is either the ISP's DNS server using a higher TTL than specified, or the user specifying a higher TTL than necessary.
I think the relevance is that, because the ISP is incentivized to serve its ad page instead of the correct answer, they're monkeying with the proper operation of DNS and caching. Or, even more likely, the service the ISP uses to serve ads on "missing" domains is doing this on their behalf.
The whole situation is bizarre, and I'm surprised any effect was noticed at all. You had to get unlucky enough that this ISP's recursive resolver cache expired during the 1-2 seconds in which you were sending NXDOMAIN. And then your NXDOMAIN TTL had to be set far enough in the future that it causes a problem. One possibility is that the ISP ignores TTLs, setting its negative ones higher than the SOA settings and the others lower. I think the more likely scenario is weird caching, either because of geopolitical boundaries or propagation issues on the service provider's side.
Before doing the switchover they might have lowered the TTL to something like 5s, which greatly increases the chance that the entry in the resolver cache would expire during the switchover. And then the ISP probably did set a longer-than-normal TTL on the record they inserted.
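To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes a name that is requested steadily, so the resolver re-fetches as soon as its cached entry expires, and it assumes the switchover starts at a uniformly random point in that expiry cycle. The function name and the 2-second window are illustrative, not figures from the incident.

```python
def refresh_probability(ttl_seconds: float, window_seconds: float) -> float:
    """Chance that a cached record expires (and is re-fetched) inside the window.

    Assumes the cache entry expires once every ttl_seconds and that the
    window starts at a uniformly random point in that cycle.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return min(1.0, window_seconds / ttl_seconds)


# Hypothetical 2-second window during which the name answers NXDOMAIN.
for ttl in (3600, 300, 5):
    print(f"TTL {ttl:>4}s -> {refresh_probability(ttl, 2):.4f}")
```

Under those assumptions, a 5s TTL gives a given resolver roughly a 40% chance of re-querying inside a 2s window, versus well under 1% with a one-hour TTL.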
NXDOMAIN, unlike SERVFAIL, is cached for a regular TTL (the negative-caching TTL taken from the zone's SOA record). So yeah, it seems like this person is complaining about something that would've gone wrong anyway.
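For reference, RFC 2308 derives the negative-caching TTL from the SOA record returned with the NXDOMAIN: it is the smaller of the SOA record's own TTL and its MINIMUM field. Below is a minimal sketch of that rule, next to what a resolver imposing its own floor would do. The class, the function names, and the 1800-second floor are hypothetical and not taken from any real resolver.

```python
from dataclasses import dataclass


@dataclass
class SoaRecord:
    ttl: int      # TTL of the SOA record as returned in the authority section
    minimum: int  # SOA MINIMUM field


def negative_ttl(soa: SoaRecord) -> int:
    """RFC 2308: cache NXDOMAIN for min(SOA TTL, SOA MINIMUM) seconds."""
    return min(soa.ttl, soa.minimum)


def misbehaving_negative_ttl(soa: SoaRecord, floor: int) -> int:
    """Hypothetical resolver that enforces its own minimum negative TTL."""
    return max(negative_ttl(soa), floor)


soa = SoaRecord(ttl=3600, minimum=60)
print(negative_ttl(soa))                    # standards-compliant value
print(misbehaving_negative_ttl(soa, 1800))  # resolver-imposed floor wins
```

With these example values the compliant resolver would forget the NXDOMAIN after 60 seconds, while the one with the floor would keep serving it for 30 minutes.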
No, what he's complaining about is that the ad-network-laden DNS server provided by the company used a longer TTL than what was provided, which led to numerous complaints. I've actually seen this happen before with Google DNS, where one of their servers would randomly choke on our DNS settings because of something obscure we had set. It took us weeks to get things fixed because it was only a very small subset of servers, but anyone using Google DNS would have intermittent problems that whole time. We've also seen local ISPs cache temporary statuses for far, far longer than the record's TTL. This is definitely something that happens.