I would be keen to know what techniques are used. Usually subdomain discovery is done with a DNS AXFR (zone transfer) request, which leaks the entire DNS zone (but this only works against ancient or misconfigured nameservers), or with dictionary attacks. There are some other techniques you can find by reading the source code of amass (an open-source Go reconnaissance/security tool), or by checking CT logs. DNSDumpster is one of the tools I used, alongside pentest tools (commercial) and amass (OSS).
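For the AXFR case, a minimal sketch with dnspython of what that looks like (example.com is a placeholder; it will only print anything against a nameserver that is misconfigured to allow transfers to arbitrary clients):

```python
import dns.query
import dns.resolver
import dns.zone

DOMAIN = "example.com"  # placeholder target domain

# Find the domain's authoritative nameservers
for ns in dns.resolver.resolve(DOMAIN, "NS"):
    ns_host = str(ns.target).rstrip(".")
    ns_ip = dns.resolver.resolve(ns_host, "A")[0].address
    try:
        # Request a full zone transfer; this only succeeds against
        # nameservers that fail to restrict AXFR to their secondaries
        zone = dns.zone.from_xfr(dns.query.xfr(ns_ip, DOMAIN, timeout=10))
        for name in zone.nodes:  # relative names, "@" = zone apex
            print(f"{name}.{DOMAIN}")
        break
    except Exception as exc:
        print(f"AXFR refused by {ns_host}: {exc}")
```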
* Apache Nutch - So they're crawling either some part of the root itself or some other websites to find subdomains. Honestly might help to query CommonCrawl too.
* Calidog's Certstream - As you said, you can look at the CT logs
* OpenAI Embeddings - So I guess it also uses an LLM to try to generate candidate names to test, too.
* Proprietary Tools - your guess is as good as mine
Probably a common wordlist of subdomains to test against too (a rough sketch of that, plus the CT-log lookup, is after this comment).
Seems like multiple techniques to try to squeeze out as much info as possible.
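As a sketch of the two cheapest approaches from the list above: pulling names out of CT logs (here via the public crt.sh JSON endpoint, not necessarily what this service uses) and brute-forcing a small wordlist of common labels. Assumes dnspython and requests; example.com and the tiny wordlist are placeholders.

```python
import requests
import dns.exception
import dns.resolver

DOMAIN = "example.com"                                       # placeholder target
WORDLIST = ["www", "mail", "dev", "staging", "vpn", "api"]   # tiny example list

def ct_log_subdomains(domain: str) -> set[str]:
    """Collect hostnames seen in Certificate Transparency logs via crt.sh."""
    resp = requests.get(
        "https://crt.sh/",
        params={"q": f"%.{domain}", "output": "json"},
        timeout=30,
    )
    resp.raise_for_status()
    names = set()
    for entry in resp.json():
        # name_value may hold several newline-separated names per certificate
        for name in entry["name_value"].splitlines():
            names.add(name.lstrip("*.").lower())
    return names

def wordlist_subdomains(domain: str, words: list[str]) -> set[str]:
    """Dictionary attack: try to resolve common labels under the domain."""
    found = set()
    for word in words:
        candidate = f"{word}.{domain}"
        try:
            dns.resolver.resolve(candidate, "A")
            found.add(candidate)
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer, dns.exception.Timeout):
            pass
    return found

if __name__ == "__main__":
    print(sorted(ct_log_subdomains(DOMAIN) | wordlist_subdomains(DOMAIN, WORDLIST)))
```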
Could that later standard be NSEC3? It’s like the easily walkable NSEC, but with hashed names and special flags for opting out of delegation security features. The 3 appears to stand for the number of people that fully understand how it works…
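For a feel of why NSEC3 walking degrades into a dictionary attack, here's a minimal sketch using dnspython's `dns.dnssec.nsec3_hash` (needs dnspython 2.1+; the salt, iteration count, and names are made-up values, not from any real zone):

```python
import dns.dnssec

# Hypothetical NSEC3 parameters (normally published in the zone's NSEC3PARAM record)
SALT = "aabbccdd"   # hex-encoded salt
ITERATIONS = 10     # extra hash iterations
ALGORITHM = 1       # 1 = SHA-1, the only hash algorithm defined for NSEC3

for name in ("example.com", "www.example.com", "secret.example.com"):
    digest = dns.dnssec.nsec3_hash(name, SALT, ITERATIONS, ALGORITHM)
    # The zone publishes these base32hex digests instead of the names themselves,
    # so a walker has to guess candidate names and hash them to match
    print(f"{name:25s} -> {digest}")
```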
How can one avoid their browsing ending up in the passive DNS logs? For example, is using 1.1.1.1, 8.8.8.8, or 9.9.9.9 (CF, Google, and Quad9, respectively) good or bad in this regard?
For example, where does Spamhaus get their passive DNS data? They write [1] that it comes from "trusted third parties, including hosting companies, enterprises, and ISPs." But that's rather vague. Are CF, Google, and Quad9 some of those "hosting companies, enterprises, and ISPs"?
I am totally fine with my ISP seeing my DNS traffic (it is bound by GDPR and more; I trust it more than CF or Google). I want to ensure the DNS traffic info does not leave my ISP (other than the recursive queries its resolver makes to authoritative nameservers).
And as per Spamhaus, the DNS traffic in a datacenter may still end up in the Spamhaus passive DNS DB.