
With our massive domain name dataset, it was about time we exposed a way to search for subdomains for any domain name. Hopefully people find this useful!


This was announced originally early last year. It removes the requirement for TLD and nTLD (not ccTLD) operators to have a WHOIS service available, but doesn't mandate that they shut them down.

So far the sunsetting has had little effect with most TLDs still having their WHOIS services online. In reality, I think we'll see a period of time where many TLDs and nTLDs have both WHOIS and RDAP available.

Additionally, since ccTLDs aren't governed by ICANN, many don't even have an RDAP service available. As such, there's going to be a mix of RDAP and WHOIS in use across the entire internet for some time to come.
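
To illustrate why the RDAP side of that mix is the easy one: the field names in this sketch follow the RDAP JSON format (RFC 9083), but the response itself is a trimmed, invented example.

```python
import json

# A trimmed RDAP domain response. The field names (objectClassName, ldhName,
# events, eventAction) follow RFC 9083; the values are invented for this example.
sample = json.loads("""
{
  "objectClassName": "domain",
  "ldhName": "example.com",
  "status": ["client transfer prohibited"],
  "events": [
    {"eventAction": "registration", "eventDate": "1995-08-14T04:00:00Z"},
    {"eventAction": "expiration", "eventDate": "2026-08-13T04:00:00Z"}
  ]
}
""")

# Because RDAP is JSON with standardized keys, one consumer works for every
# registry that implements it -- no per-registry text scraping.
expiry = next(e["eventDate"] for e in sample["events"]
              if e["eventAction"] == "expiration")
print(sample["ldhName"], expiry)  # example.com 2026-08-13T04:00:00Z
```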

Disclosure: I run https://viewdns.info/ and have spent many an hour dealing with both WHOIS and RDAP parsing to make sure that our service returns consistent data (via our web interface and API) regardless of the protocol in use.


I think RDAP is going to be adopted by more and more ccTLDs as well. WHOIS is not a particularly well liked protocol (I was at an IETF meeting where ICANN did a presentation on the timeline and people were literally cheering for the demise of WHOIS).

Disclosure: Work in the ccTLD space.


100% agree that there will be more ccTLD operators that will implement RDAP. The sooner we're on a consistent protocol the better!


Self-plug: I run a little mastodon/activity pub bot that monitors DNS RDAP adoption according to the official bootstrap file: https://social.haukeluebbers.de/@stateofrdap

Last post from yesterday:

> As of today 82.25% (1187) of all 1443 Top Level Domains have an authoritative RDAP service declared.

> These TLDs were added:

> .ye


okay


It's funny to see that a lot of services are finally moving from a human-readable / plain text format towards structured protocols right at the point where we can finally have LLMs parse the unstructured protocols :-)


Well, you can't really trust an LLM to give you reproducible output every time; you can't even trust it to be faithful to the input data. So it's nice to have a standard format now, one that takes maybe a millionth of the computing resources to parse. Also, WHOIS was barely human-readable, with the fields all over the place, missing, or different from one registry to the other. A welcome change that really should have come sooner.


we can't ever have LLMs reliably parse any form of data. You know what can parse it perfectly though? A parser. Which works perfectly, and consistently.


Except that the whole problem with WHOIS, the one RDAP is solving, is that a WHOIS response is an unstructured plaintext response, formatted entirely arbitrarily according to the whims of the TLD manager.

Ever tried to parse WHOIS data? You literally have to write a parser per TLD.

And things get even more stupid when you start talking about WHOIS records for IP ranges. Then you have to write a parser per IP-range delegation — starting at IANA, and working recursively, all the way down to the individual ASN. Where you have no idea how many delegating parties are going to be involved — and so get their own step in the chain, formatted however they wish — for any given IP address. (Ask me how I know.)
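
A rough Python sketch of the pain: both snippet formats below are invented, but they mimic the kind of divergence real registries show. A generic scraper only gets you half way; the per-registry mapping still has to be written by hand.

```python
import re

# Two invented WHOIS-style snippets; real registries differ in comparable
# ways (different key names, indentation, date formats).
registry_a = "   Expiry Date: 2026-08-13T04:00:00Z\n   Registrar: Example Inc.\n"
registry_b = "Domain: example.de\nChanged: 2026-08-13\n"

def parse_colon_keys(text):
    """Generic 'Key: value' scraper -- roughly the best you can do without
    per-registry rules."""
    out = {}
    for line in text.splitlines():
        m = re.match(r"\s*([^:]+):\s*(.+)", line)
        if m:
            out[m.group(1).strip()] = m.group(2).strip()
    return out

a = parse_colon_keys(registry_a)
b = parse_colon_keys(registry_b)
# Same underlying concept, different keys: the consumer still needs a
# per-registry mapping ("Expiry Date" vs "Changed", and so on).
print(a["Expiry Date"], b["Changed"])  # 2026-08-13T04:00:00Z 2026-08-13
```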


> Which works perfectly

... on conformant inputs, when it has no bugs.


On non-conformant inputs, a parser will barf and yell at you, which is exactly what you want.

On non-conformant inputs, there's absolutely no telling what an LLM will do, which is precisely the problem. It might barf, or it might blissfully continue, and even if the input was right you couldn't remotely trust it to regurgitate the input verbatim.

As for bugs, it is at least theoretically possible to write a parser with no bugs, whereas an LLM is fundamentally probabilistic.


Of course we can. Reliability is a spectrum, not a binary state. You can push it up however high you like, and stop somewhere between "we don't care about error rate this low" and "error rate is so low it's unlikely to show in practice".

It's not like this is a new concept. There are plenty of algorithms we've been using for decades that are only statistically correct. A perfect example of this is efficient primality testing, which is probabilistic in nature[0], but you can easily make the probability of error as small as "unlikely to happen before heat death of the universe".

--

[0] - https://en.wikipedia.org/wiki/Primality_test#Probabilistic_t...
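
For reference, a minimal Miller-Rabin sketch (illustrative only; production code would pick bases more carefully):

```python
import random

def is_probable_prime(n, rounds=20):
    """Miller-Rabin. Each independent round a composite survives happens
    with probability at most 1/4, so 20 rounds bound the error below
    4**-20 (about 1e-12)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # found a witness: n is definitely composite
    return True  # no witness found: n is prime with overwhelming probability

print(is_probable_prime(2**61 - 1))  # True: a Mersenne prime
```

The `rounds` knob is exactly the tunable reliability dial described above: each extra round multiplies the worst-case error bound by 1/4.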


There are two problems with this comparison. First, probabilistic prime generation has a mathematically proven lower bound that improves with iteration. There is no comparably robust tuning parameter with an LLM. You can use a different model, you can use a bigger variant of the same model, etc., but these all have empirically determined and contextually sensitive reliability levels that are not otherwise tunable. Second, the prime generation function will always give you an integer, and never an apple, or a bicycle, or a phantasm. LLMs regurgitate and hallucinate, which means that a simple error rate is not the only metric that matters. One must also consider how egregiously wrong and even nonsensical the errors can be.


I think the better statement is that, if, say, you're running the Miller-Rabin test 10 times, you can be confident that an error in one test is uncorrelated with an error in the next test, so it's easy to dial up the accuracy as close to 1 as desired. Whereas with an LLM, correlated errors seem much more likely; if it failed three times parsing the same piece of data, I would have no confidence that the 4th-10th times would have the same accuracy rate as on a fresh piece of data. LLMs seem much more like the Fermat primality test, except that their "Carmichael numbers" are a lot more common.


I compare LLMs to a door with a slot: you put in a piece of paper with a request on it and you get something back related to that request. If it's the same every time, great. But it might be different or completely wrong. You don't know what goes on behind the door, and measuring the error rate tells you little that's predictive.


The general point is not that the feature currently exists to dial down the LLM parse error rate; it's that the abstract argument "we can't use LLMs because they aren't perfect" isn't a realistic argument in the first place. You're probably reading this on hardware that _probably_ shows you the correct text almost all of the time but isn't guaranteed to.


There's no such thing as a perfectly-watertight roof, therefore there's no qualitative difference between fixing the roof and buying a bigger bucket.


Precisely this. People dismiss utility of LLMs because they don't give 100% reliability, without considering the basic facts that:

- LLMs != ChatGPT interface, they don't need to be run in isolation, nor do they need to do everything end-to-end.

- There are no 100% reliable systems - neither technological nor social. Voltages fluctuate, radiation flips bits, humans confabulate just as much as, if not worse than, LLMs, etc.

- We create reliability from unreliable systems.

LLMs aren't some magic unreliability pixie dust that makes everything they touch beyond repair. They're just another system with bounded reliability, and can be worked into larger systems just like anything else, and total reliability can be improved through this.

EDIT: In fact, my example with probabilistic primality tests is bad because those tests are too nice - they let us compute tight bounds on the error rate in advance. LLMs are not like that. But then, a lot of systems we rely on in our daily lives also have this property - their reliability is established empirically, i.e. we improve them until they work reliably enough, and then we hope they'll keep on working, and deal with random failures when they occur. So that's nothing new, either.


No, LLMs do not have "bounded reliability". All reliability figures for LLMs are based upon empirical observation in specific contexts using artificial benchmarks. As they say in finance, "past performance is not indicative of future results".

Saying LLMs are no worse than random bit flips is, again, an unjustified comparison. We can control bit errors with ECC, we cannot control the output of an LLM except to shackle it into uselessness.


I said bounded. I didn't say how tight. But all of science is about bounding empirical observations, so this is nothing new - nor is relying on systems with empirically established failure rates, which is a good chunk of what engineering is about.


The number of 9s that can be assigned to these "bounds" currently is zero. They are not even 90% reliable. And there is no straightforward way to get to 90%, never mind 95%, 99%, etc. The sliding scale of reliability you originally presented just does not exist.

Yeah, sure, we can hypothetically engineer a system that tolerates a key step in the process which has, say, a 30% chance of being wrong, including a 10% chance of being dangerously wrong (appears correct but is broken in subtle ways), and a 5% chance of being batshit insane, but why would we? The amount of training, vetting, and supervision of human operators necessary to make a working process here immediately raises the question of whether the machine serves man or the other way around.

The best uses of an LLM are those where engineering levels of precision are neither required nor useful.


I see people hallucinate on HN all the time. We tolerate it. Why should we? We should if the overall inclusion of unreliable things (humans) provides value. The error rate for LLMs doesn't matter. The net value does.

So if the value is great enough to tolerate the error rate, we do. We don't categorically dismiss the technology because it can fail really poorly. We design things all the time which can fail catastrophically. Seriously. So LLMs will appear anywhere the net value is positive.

Maybe you're taking a more nuanced stance, but I see a lot of "if it can hallucinate even once we can't use it" rhetoric here. And that's simply irrational. Even "we can't use it for important things" is wrong. Doctors are using LLMs today to help collate observed data and suggest diagnoses. A trained professional in the loop mitigates the "terrible failure". So no, I don't agree that LLMs shall be relegated to non-important things.


I also think categorically dismissing LLMs is a mistake.

However, an LLM for automated code generation (the context of the thread as I understand it) is basically a dubious-code-copy-paster on steroids. That was already the wrong way to develop code to begin with, automating and accelerating it is not an improvement.

There has never been a single case where I took code from Stack Overflow, which is already a relatively high quality source of such snippets, and didn't have to adapt it in at least some way to work with the code I already had. Heck, I often find rewriting the snippet entirely is better than copying and pasting it. Of course, I also give attribution, both for credit and for referring back to the original in case I made a mistake, the best solution changes in the future, there's context I didn't cover, etc. And in between the problems I solve with other people's help is a whole lot of code I write entirely on my own.

There are many cases of code in the wild being bad, not just from a "readability" or "performance" standpoint, but from a security standpoint. LLMs regurgitate bad code despite also having good code, and even the blog posts explaining what's good and what's bad, in their training corpus! And an LLM never gives attribution, partly because it was designed not to care, and partly because the end result is a synthesis of multiple sources rather than a pure regurgitation. Moreover, LLMs don't have much continuity, so they mix metaphors and naming conventions, they tie things together in absurd ways, etc. The end result is an unmaintainable mess, even if it happens to work.

So no, an LLM is not like a compiler, even though compilers often have their own special brand of crazy magic that isn't necessarily good. Nor is it going to deliver a robust way to turn abstract human thoughts into concrete code. It is still a useful tool, but it's not going to be an automated part of developing quality code. And this is going to be true for any non-coding scenario that requires at least the same level of reliability.


It already is automating parts of developing quality code. You’ll just have to believe me on that one, I guess.


Finance is an excellent analogy. Relying on LLM output is similar to relying on the stock market. You might come out ahead but it's always a gamble and the lower bound is always catastrophic failure.


If your job is to be a referent, to have authority, you absolutely don't want to make any error. Pretty safe isn't enough; you need to be absolutely sure that you control the output.

You only have one job: don't delegate authority.


But isn't using LLM for that really expensive? Seems wasteful.


I wouldn't use LLMs, but if I did, I would try to get the LLM to write parser code instead.

If it can convert from one format to another, then it can generate test cases for the parser. Then hopefully it can use those to iterate on parser code until it passes the tests.

In a sense, asking it to automate the work isn't as straightforward as asking it to do the work. But if the approach does pan out, it might be easier overall since it's probably easier to deploy generated code to production (than deploying LLMs).
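
A sketch of what that workflow's deterministic core could look like; everything here (the cases, the parser, the gate) is invented for illustration, with the actual LLM calls stubbed out as data:

```python
# The model's role is stubbed out as plain data: imagine asking it once to
# translate sample records into (input, expected) pairs. The parser it
# proposes is then ordinary deterministic code you can test, keep, and deploy.
llm_generated_cases = [
    ("Registrar: Example Inc.", {"Registrar": "Example Inc."}),
    ("Expiry Date: 2026-08-13", {"Expiry Date": "2026-08-13"}),
    ("", {}),
]

def candidate_parser(text):
    # An n-th draft of the parser, as the model might propose it.
    out = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            out[key.strip()] = value.strip()
    return out

def passes(parser, cases):
    """The deterministic gate: iterate (regenerating the parser, feeding
    failures back to the model) until this returns True, then deploy the
    parser -- not the LLM."""
    return all(parser(text) == expected for text, expected in cases)

print(passes(candidate_parser, llm_generated_cases))  # True
```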


My desktop GPU can run small models at 185 tokens a second. Larger models with speculative decoding: 50t/s. With a small, finetuned model as the draft model, no, this won't take much power at all to run inference.

Training, sure, but that's buy once cry once.

Whether this means it's a good idea, I don't think so, but the energy usage for parsing isn't why.


A simple text parser would probably be 10,000,000 times as fast. So the statement that this won't take much power at all is a bit of an overstatement.


50 tokens per second. Compared to a quick and dirty parser written in python or even a regex? That's going to be many many orders of magnitude slower+costlier.
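
A back-of-envelope comparison, with every number an assumption rather than a measurement:

```python
# Back-of-envelope only: every number below is an assumption, not a benchmark.
record_tokens = 600           # assumed size of one WHOIS record in tokens
llm_tokens_per_s = 50         # LLM throughput quoted upthread
regex_records_per_s = 50_000  # assumed for a simple compiled-regex parser

llm_records_per_s = llm_tokens_per_s / record_tokens   # ~0.08 records/s
slowdown = regex_records_per_s / llm_records_per_s
print(f"LLM: {llm_records_per_s:.2f} records/s; "
      f"regex is ~{slowdown:,.0f}x faster")
```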


awk would run millions of times faster, not to mention mawk and awka.


In order to make the point that

> energy usage for parsing isn't why

you'll need to provide actual figures and benchmark them against an actual parser.

I've written parsers for larger-scale server stuff. And while I too don't have these benchmarks available, I'll dare to wager quite a lot that a dedicated parser for almost anything will outperform an LLM by orders of magnitude. I won't be surprised if a parser written in Rust uses upwards of 10k times less energy than the most efficient LLM setup today. Hell, even a sed/awk/bash monstrosity probably outperforms such an LLM hundreds of times over, energy-wise.


How many times would you need to parse before you get an energy saving from using an LLM to parse, vs using an LLM to write a parser and then using the parser to parse?


It sounds like you need to learn how to program without using an LLM, but even if you used one to write a parser, and it took you 100 requests to do so, you would very quickly get the desired energy savings.

This is the kind of thinking that leads to modern software being slower than software from 30 years ago, even though it is running on hardware that's hundreds of times faster.


Not using The AWK Programming Language as a reference for parsing stuff, and maybe The C Programming Language with AWKA (an AWK-to-C translator) and a simple CSP library for threading, yields a disaster for computing.

LLMs are not the solution; they are a source of big trouble.


> using an llm to write a parser

You're assuming OP needs an LLM to write a parser; since they mention writing many during their career, they probably don't need it ;)


I was thinking more of when a sufficiently advanced device would be able to “decide” the task would be worth using its own capabilities to write some code to tackle the problem rather than brute force.

For small problems it’s not worthwhile, for large problems it is.

It’s similar to choosing to manually do something vs automate it.


I didn't use an LLM back then. But would totally do that today (copilot).

Especially since the parser(s) I wrote were rather straightforward finite state machines with stream handling in front, parallel/async tooling around it, and at the core business logic (domain).

Streaming, job/thread/mutex management, FSM are all solved and clear. And I'm convinced an LLM like copilot is very good at writing code for things that have been solved.

The LLM, however, would get very much in the way in the domain/business layer. Because it hasn't got the statistical body of examples to handle our case.

(Parsers I wrote were, among others: IBAN, GPS trails, user-defined calculations (simple math formulas), and a DSL to describe hierarchies. I wrote them in Ruby, PHP, Rust and Perl.)


It’s not just about the energy usage, but also purchase cost of the GPUs and opportunity cost of not using those GPUs for something more valuable (after you have bought them). Especially if you’re doing this at large scale and not just on a single desktop machine.

Of course you were already saying it’s not a good idea, but I think the above definitely plays a role at scale as well.


You’re right, I could be trying to get Crysis to run at 120 fps.


If you have spare GPU time you could donate it to projects like Folding@Home.


My Atom N270 netbook with mawk and a few lines parsing the files with a simple regex will crush your GPU+LLMs on both time and power usage.


My assumption is that models are getting cheaper, fast. So you can build now with OpenAI/Anthropic/etc and swap it out for a local or hosted model in a year.

This doesn't work for all use cases but data extraction is pretty safe. Treat it like a database query -- a slow but high availability and relatively cheap call.


While it will become cheaper, it will never be as fast / efficient as 'just' parsing the data the old-fashioned way.

It feels like using AI to do computing things instead of writing code is just like when we moved to relatively inefficient web technology for front-ends, where we needed beefier systems to get the same performance as we used to have, or when cloud computing became a thing and efficiency / speed became a factor of credit card limit instead of code efficiency.

Call me a luddite but I think as software developers we should do better, reduce waste, embrace mechanical sympathy, etc. Using AI to generate some code is fine - it's just the next step in code generators that I've been using throughout all my career IMO. But using AI to do tasks that can also be done 1000x more efficiently, like parsing / processing data, is going in the wrong direction.


I know this particular problem space well. AI is a reasonable solution. WHOIS records are intentionally made to be human-readable and not machine-parseable without huge effort, because so many people were scraping them. So the same registrar may return records in a huge range of text formats. You can write code to handle them all if you really want to, but if you are not doing it en masse, AI is probably going to be a cheaper solution.

Example: https://github.com/weppos/whois is a very solid library for WHOIS parsing but cannot handle all servers, as they say themselves. That has fifteen-plus years of work on it.


But.. that’s exactly what this thread is about. RDAP is the future, not WHOIS.


Yes, exactly. Read what I was responding to.


I think you’re both right, and also both are missing the point.

Using LLMs to parse whois data is okay in the meantime (preferably as a last resort!), but structuring the data properly in the first place (i.e. RDAP) is the better solution in the long run.


I’m not missing that point at all. I’m 100% on board.


Requesting that people think before transferring mission critical code into the hands of LLMs is not being a Luddite lol.

Can you imagine how many ridiculous errors we would have if LLMs structured data into protobufs? Or if they compiled software?

It's more than 1000x more wasteful resource-wise too. The LLM Swiss Army knife is the Balenciaga all-leather garbage bag option for the vast majority of use cases.


Still, I wouldn't use an LLM for what's essentially a database query: by their very nature, LLMs will give you the right answer most of the time, but will sometimes return wrong information. Better to stay with a deterministic DB query in this case.


As usual, arguments for LLMs are based on rosy assumptions about future trajectory. How about we talk about data extraction at that point in the future when models are already cheap enough. And in the meantime just assume the future is uncertain, as it obviously is.


deepseek API costs are quite literally pennies per million tokens


Which world would you rather live in:

* structured protocols that can be parsed by machines

* unstructured protocols that are unreliably parsed by LLMs that require significant power and latency


In addition to deterministic machines and LLMs, what about humans reading the data?



Off topic: thank you for running viewdns.info. I don't use it regularly, mainly for the occasional WHOIS information lookup, and it has always worked perfectly.


Thanks for the kind words and glad it's been useful :).


It's kind of funny that some operators have never had it in practice. For example, .es never had a public WHOIS, and you need to register with a national ID (and I think a fixed IP address) to get access to it.


That need for a national ID hasn't been in place for a long time, AFAIK.

I've had a .es (my nickname berkes, domain berk.es) for almost 16 years now, and I live in the EU, but not in Spain. In the beginning I used a small company that offered services for non-Spanish companies to register .es through them (I believe they technically owned the domains?). But today it's just in my local domain registrar, no ID needed.

That .es has no WHOIS has actually struck me as somewhat of a benefit. Back in the day, it kept away a lot of spam from spammers that'd just lift email addresses off the WHOIS. My .com, .nl and other domains receive(d) significantly more such spam. Let alone phone numbers and other personal details delivered over an efficient, decentralized network. Though recent privacy addons(?) have mitigated that a little.


I meant for accessing the WHOIS, not for registering. If you try any type of WHOIS request you'll get a message sending you to the nic.es site, where you'll be presented with a captcha if you try to get information about a registered domain.

It's not very well documented, but you can register at a government site using a national ID and they'll open WHOIS access for a fixed IP address, for a maximum of 10 queries a minute. [0]

Context for any of you not used to the .es ccTLD: until some years ago, and simplifying a bit, if you wanted to register a .es domain you had to be a Spanish national or company, and be the legal holder of the domain name you wanted to register (or it had to be your name and surnames).

--

  0: https://sede.red.gob.es/es/procedimientos/solicitud-de-acceso-servicio-de-whois-por-el-puerto-43


Usually, the need to use an ID is only for private persons (and usually only if they are nationals). Anyone else should not need that. The general theory is that a nation can only verify data that they themselves have.

Some ccTLDs have rules against registrations by people not located within the country that owns the ccTLD, in which case a valid national ID or organization number would be required. From what I can see, .es does not have that requirement.


See my other comment [0], but I meant for accessing the WHOIS service, not for registering. If you try any type of WHOIS request you'll get a message sending you to the nic.es site, where you'll be presented with a captcha if you try to get information about a registered domain.

--

  0: https://news.ycombinator.com/item?id=43392356


Requiring a captcha is not even close to requiring a national ID.


If you read my linked comment, I'm talking about using the WHOIS service [0] ICANN is sunsetting that's been talked about in the post, not about getting domain information in the web.

The only way (that I've found) to use the WHOIS service with the .es ccTLD is whitelisting a fixed IP address with your national ID at a government site [1]. And even then, you're rate limited to 10 queries per minute.

--

  0: https://en.wikipedia.org/wiki/WHOIS
  1: https://sede.red.gob.es/es/procedimientos/solicitud-de-acceso-servicio-de-whois-por-el-puerto-43


> For example, .es never had a public whois, and need to register with a national ID (and I think with a fixed IP address) to get access to it.

Is this new? I had an .es domain around 2011, and am not Spanish, or even European.


See my other comment [0], but I meant for accessing the WHOIS service, not for registering. If you try any type of WHOIS request you'll get a message sending you to the nic.es site, where you'll be presented with a captcha if you try to get information about a registered domain.

--

  0: https://news.ycombinator.com/item?id=43392356


You don't need WHOIS to register a domain.


Hey, I've been looking for a tool that can do a reverse NS lookup for a nameserver pair (i.e. which domains have nameservers ns1.example.com and ns2.example.com) but all the services out there that I've found can only do one. Is this something you would consider implementing?


Thank you so much for running your service. I've used it for years, and LOVE how functional and useful it is!


Heya,

Founder of https://viewdns.info/ here (used by and mentioned in the article a bit). If anyone is doing this kind of research, feel free to reach out at feedback@viewdns.info as I'm more than happy to extend some free API credits etc!

-Hughesey


Thanks for the awesome service! I wish I had known this, I went to quite a few cybercafes to get some extra IPs XD

I really wish the reverse IPs would hit even when it's not the last IP though! Many more hits would come out of that. Related mentions under: https://ourbigbook.com/cirosantilli/cia-2010-covert-communic...


You can also potentially view the historical DNS A records for the domain to view the pre-Cloudflare IP at http://viewdns.info/iphistory/.


If you're determined, though, you just null-route, or block, etc., everything inbound other than Cloudflare.


For many, many DDoS scenarios this does not work. The spurious packets may saturate an upstream ISP, causing that ISP to unilaterally apply a null route or block for all packets for the targeted origin IP. No CloudFlare packets would arrive at all.

If one is concerned about DDoS, one should work with their ISPs on the plan of action for various scenarios. Finding out their procedures when one's hair is on fire is not fun.


Well you're behind CloudFlare.

Just change your IP address, and tell CloudFlare the new one.

Sure, the DDoSers could find your new IP, but since you're not changing your public DNS, it would be difficult for them to find it.

I don't think your SSL certs would reveal the new IP (via the site in the blog post) very quickly if you changed IP.


It's not so much about changing the IP address, but moving the targeted system out from behind the clogged tube. Changing IP address may or may not do that.


It's easier than you might think. I used to blackhole anything non-Cloudflare, and they offer a list of their IPs:

https://www.cloudflare.com/ips/


There's some reverse engineered zone files for a lot of cctlds at http://viewdns.info/data/.


So has http://viewdns.info. Free too.


Great resource, thanks.


Or just http://ViewDNS.info/ for a free alternative :)


For those without domaintools commercial accounts... http://viewdns.info/reversewhois/?q=seemaexports3%40gmail.co...


Thank you


Also available at http://viewdns.info/iplocation/. There's an API as well http://viewdns.info/api/.


There's one at http://viewdns.info/ if you're only after common ports.

