I think what sets Zig apart from other low level languages is how easy it is to navigate a Zig source code. I was discouraged at the time (things probably changed now) when the best source for documentation was "read the Zig's source code". But I was impressed by how easy it was to find what I needed.
while this is due to Zig maintainers' code quality, I think a large contributing factor is the choice of syntax. As an exercise, try navigating a C, C++ and any other language source code without an IDE or LSP. things like:
- "Where did that function come from?"
- "What and where is this type?"
what do you have to do to find that out? due to the flexible ways you can declare things in C, it may take you a lot of steps to find these information. even in search, a variable and a function can share the same prefix due to the return type placement. hence why some people prefer function return types in a separate line.
Even with languages like Rust for example, finding if a type in a function parameters is an enum or struct and finding its definition can require multiple steps like search "enum Foo" or "struct Foo", in Zig i can search "const Foo" and i will immediately know what it is.
while i do hope that C gets defer and constexpr functions in the next standard or maybe better generics or enums, Zig syntax is much better to work with in my opinion.
I totally agree with the author. Sadly, I feel like that's not what the majority of LLM users tend to view LLMs. And it's definitely not what AI companies marketing.
> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters
the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand. I believe that this lack of initial understanding of the user input is what can lead to taking LLM output as factual. If one side of the exchange knows nothing about the subject, the other side can use jargon and even present random facts or lossy facts which can almost guarantee to impress the other side.
> The way to solve this particular problem is to make a correct example available to it.
My question is how much effort would it take to make a correct example available for the LLM before it can output quality and useful data? If the effort I put in is more than what I would get in return, then I feel like it's best to write and reason it myself.
> the user will at least need to know something about the topic beforehand.
I used ChatGPT 5 over the weekend to double check dosing guidelines for a specific medication. "Provide dosage guidelines for medication [insert here]"
It spit back dosing guidelines that were an order of magnitude wrong (suggested 100mcg instead of 1mg). When I saw 100mcg, I was suspicious and said "I don't think that's right" and it quickly corrected itself and provided the correct dosing guidelines.
These are the kind of innocent errors that can be dangerous if users trust it blindly.
The main challenge is LLMs aren't able to gauge confidence in its answers, so it can't adjust how confidently it communicates information back to you. It's like compressing a photo, and a photographer wrongly saying "here's the best quality image I have!" - do you trust the photographer at their word, or do you challenge him to find a better quality image?
What if you had told it again that you don't think that's right? Would it have stuck to it's guns and went "oh, no, I am right here" or would it have backed down and said "Oh, silly me, you're right, here's the real dosage!" and give you again something wrong?
I do agree that to get the full usage out of an LLM you should have some familiarity with what you're asking about. If you didn't already have a sense of what a dosage is already, why wouldn't 100mcg be the right one?
I replied in the same thread "Are you sure that sounds like a low dose". It stuck to the (correct) recommendation in the 2nd response, but added in a few use cases for higher doses. So seems like it stuck to its guns for the most part.
For things like this, it would definitely be better for it to act more like a search engine and direct me to trustworthy sources for the information rather than try to provide the information directly.
I noticed this recently when I saw someone post with an AI generated map of Europe which was all wrong. I tried the same and asked ChatGPT to generate a map of Ireland and it was wrong too. So then I asked to find me some accurate maps of Ireland and instead of generating it gave me images and links to proper websites.
Will definitely be remembering to put "generate" vs "find" in my prompts depending on what I'm looking for. Not quite sure how you would train the model to know which answer is more suitable.
My mom was looking up church times in the Philippines. Google AI was wrong pretty much every time.
Why is an LLM unable to read a table of church times across a sampling of ~5 Filipino churches?
Google LLM (Gemini??) was clearly finding the correct page. I just grabbed my mom's phone after another bad mass time and clicked on the hyperlink. The LLM was seemingly unable to parse the table at all.
Because google search and llm teams are different, with different incentives. Search is the cash cow they keep squeezing for more cash at the expense of good quality since at least 2018, as revealed in court documents showing they did that on purpose to keep people searching more to have more ads and more revenue. Google AI embedded in search has the same goals, keep you clicking on ads… my guess would be Gemini doesn’t have any of the bad part of enshitification yet… but it will come. If you think hallucinations are bad now, just you wait until tech companies start tuning them up on purpose to get you to make more prompts so they can inject more ads!
> I used ChatGPT 5 over the weekend to double check dosing guidelines for a specific medication.
This use case is bad by several degrees.
Consider an alternative: Using Google to search for it and relying on its AI generated answer. This usage would be bad by one degree less, but still bad.
What about using Google and clicking on one of the top results? Maybe healthline.com? This usage would reduce the badness by one further degree, but still be bad.
I could go on and on, but for this use case, unless it's some generic drug (ibuprofen or something), the only correct use case is going to the manufacturer's web site, ensuring you're looking at the exact same medication (not some newer version or a variant), and looking at the dosage guidelines.
No, not Mayo clinic or any other site (unless it's a pretty generic medicine).
This is just not a good example to highlight the problems of using an LLM. You're likely not that much worse off than using Google.
The compound I was researching was [edit: removed].
Problem is it's not FDA approved, only prescribed by compounding pharmacies off label. Experimental compound with no official guidelines.
The first result on Google for "[edit: removed] dosing guidelines" is a random word document hosted by a Telehealth clinic. Not exactly the most reliable source.
> Experimental compound with no official guidelines.
> The first result on Google for "GHK-Cu dosing guidelines" is a random word document hosted by a Telehealth clinic. Not exactly the most reliable source.
You're making my point even more. When doing off label for an unapproved drug, you probably should not trust anything on the Internet. And if there is a reliable source out there on the Internet, it's very much on you to be able to discern what is and what is not reliable. Who cares that the LLM is wrong, when likely much of the Internet is wrong?
BTW, I'm not advocating that LLMs are good for stuff like this. But a better example would be asking the LLM "In my state, is X taxable?"
The Google AI summary was completely wrong (and the helpful link it used as a reference was correct, and in complete disagreement with the summary). But other than the AI summary being wrong, pretty much every link in the Google search results was correct. This is a good use case for not relying on an LLM: Information that is widely and easily available is wrong in the LLM.
Is your point that I should be smarter and shouldn’t have asked ChatGPT the question?
If that’s your point, understood, but I don’t think you can assume the average ChatGPT user will have such a discerning ability to determine when and when not using a LLM is appropriate.
FWIW I agree with you. But the “you shouldn’t ask ChatGPT that question” is a weak argument if you care about contextualizing and broadening your point beyond me and my specific anecdote.
My point is that if you're trying to demonstrate how unreliable LLMs are, this is a poor example, because the alternatives are almost equally poor.
> If that’s your point, understood, but I don’t think you can assume the average ChatGPT user will have such a discerning ability to determine when and when not using a LLM is appropriate.
I agree that the average user will not, but they also will not have the ability to determine that the answer from the top (few) Google links is invalid as well. All you've shown is the LLM is as bad as Google search results.
Put another way, if you invoke this as a reason one should not rely on LLMs (in general), then it follows one should not rely on Google either (in general).
I think this actually points at a different problem, a problem with LLM users, but only to the extent that it's a problem with people with respect to any questions they have to ask any source they consider an authority at all. No LLM, nor any other source on the Internet, nor any other source off the Internet, can give you reliable dosage guidelines for copper peptides because this is information that is not known to humans. There is some answer to the question of what response you might expect and how that varies by dose, but without the clinical trials ever having been conducted, it's not an answer anyone actually has. Marketing and popular misconceptions about AI lead to people expecting it to be able to conjure facts out of thin air, perhaps reasoning from first principles using its highly honed model of human physiology.
It's an uncomfortable position to be in trying to biohack your way to a more youthful appearance using treatments that have never been studied in human trials, but that's the reality you're facing. Whatever guidelines you manage to find, whether from the telehealth clinic directly, or from a language model that read the Internet and ingested that along with maybe a few other sources, are generally extrapolated from early rodent studies and all that's being extrapolated is an allometric scaling from rat body to human body of the dosage the researchers actually gave to the rats. What effect that actually had, and how that may or may not translate to humans, is not usually a part of the consideration. To at least some extent, it can't be if the compound was never trialed on humans.
You're basically just going with scale up a dosage to human sized that at least didn't kill the rats. Take that and it probably won't kill you. What it might actually do can't be answered, not by doctors, not by an LLM, not by Wikipedia, not by anecdotes from past biohackers who tried it on themselves. This is not a failure of information retrieval or compression. You're just asking for information that is not known to anyone, so no one can give it to you.
If there's a problem here specific to LLMs, it's that they'll generally give you an answer anyway and will not in any way quantify the extent to which it is probably bullshit and why.
> In any other universe, we would be blaming the service rather than the user.
I think the key question is "How is this service being advertised?"
Perhaps the HN crowd gives it a lot of slack because they ignore the advertising. Or if you're like me, aren't even aware of how this is being marketed. We know the limitations, and adapt appropriately.
I guess where we differ is on whether the tool is broken or not (hence your use of the word "fix"). For me, it's not at all broken. What may be broken is the messaging. I don't want them to modify the tool to say "I don't know", because I'm fairly sure if they do that, it will break a number of people's use cases. If they want to put a post-processor that filters stuff before it gets to the user, and give me an option to disable the post-processor, then I'm fine with it. But don't handicap the tool in the name of accuracy!
The point you were making elsewhere in the thread was that "this is a bad use case for LLMs" ... "Don't use LLMs for dosing guidelines." ... "Using dosing guidelines is a bad example for demonstrating how reliable or unreliable LLMs are", etc etc etc.
You're blaming the user for having a bad experience as a result of not using the service "correctly".
I think the tool is absolutely broken, considering all of the people saying dosing guidelines is an "incorrect" use of LLM models. (While I agree it's not a good use, I strongly dislike how you're blaming the user for using it incorrectly - completely out of touch with reality).
We can't just cover up the shortfalls of LLMs by saying things like "Oh sorry, that's not a good use case, you're stupid if you use the tool for that purpose".
I really hope the HN crowd stops making excuses for why it's okay that LLMs don't perform well on tasks it's commonly asked to do.
> But don't handicap the tool in the name of accuracy!
If you're taking the position that it's the user's fault for asking LLMs a question it won't be good at answering, then you can't simultaneously advocate for not censoring the model. If it's the user's responsibility to know how to use ChatGPT "correctly", the tool (at a minimum) should help guide you away from using it in ways it's not intended for.
If LLMs were only used by smarter-than-average HN-crowd techies, I'd agree. But we're talking about a technology used by middle school kids. I don't think it's reasonable to expect middleschoolers to know what they should and shouldn't ask LLMs for help with.
> You're blaming the user for having a bad experience as a result of not using the service "correctly".
Definitely. Just as I used to blame people for misusing search engines in the pre-LLM era. Or for using Wikipedia to get non-factual information. Or for using a library as a place to meet with friends and have lunch (in a non-private area).
If you're going to try to use a knife as a hammer, yes, I will fault you.
I do expect that if someone plans to use a tool, they do own the responsibility of learning how to use it.
> If you're taking the position that it's the user's fault for asking LLMs a question it won't be good at answering, then you can't simultaneously advocate for not censoring the model. If it's the user's responsibility to know how to use ChatGPT "correctly", the tool (at a minimum) should help guide you away from using it in ways it's not intended for.
Documentation, manuals, training videos, etc.
Yes, I am perhaps a greybeard. And while I do like that many modern parts of computing are designed to be easy to use without any training, I am against stating that this is a minimum standard that all tools have to meet.
Software is the only part of engineering where "self-explanatory" seems to be common. You don't buy a board game hoping it will just be self-evident how to play. You don't buy a pressure cooker hoping it will just be safe to use without learning how to use it.
So yes, I do expect users should learn how to use the tools they use.
I don’t think I’ve ever seen ChatGPT 5 refuse to answer any prompt I’ve ever given it. I’m doing 20+ chats a day.
What’s an example prompt where it will say “idk”?
Edit: Just tried a silly one, asking it to tell me about the 8th continent on earth, which doesn’t exist. How difficult is it for the model to just say “sorry, there are only 7 continents”. I think we should expect more from LLMs and stop blaming things on technical limitations. “It’s hard” is getting to be an old excuse considering the amount of money flowing into building these systems.
Here's a recent example of it saying "I don't know" - I asked it to figure out why there was an octopus in a mural about mushrooms: https://chatgpt.com/share/68b8507f-cc90-8006-b9d1-c06a227850... - "I wasn’t able to locate a publicly documented explanation of why Jo Brown (Bernoid) chose to include an octopus amid a mushroom-themed mural."
Not sure what your system prompt is, but asking the exact same prompt word for word for me results in a response talking about "Zealandia, a continent that is 93% submerged underwater."
The 2nd example isn't all that impressive since you're asking it to provide you something very specific. It succeeded in not hallucinating. It didn't succeed at saying "I'm not sure" in the face of ambiguity.
I want the LLM to respond more like a librarian: When they know something for sure, they tell you definitively, otherwise they say "I'm not entirely sure, but I can point you to where you need to look to get the information you need."
Interestingly I tried the same question in a separate ChatGPT account and it gave a similar response you got. Maybe it was pulling context from the (separate) chat thread where it was talking about Zealandia. Which raises another question: once it gets something wrong once, will it just keep reenforcing the inaccuracy in future chats? That could lead to some very suboptimal behavior.
Getting back on topic, I strongly dislike the argument that this is all "user error". These models are on track to be worth a trillion dollars at some point in the future. Let's raise our expectations of them. Fix the models, not the users.
My consistent position on this stuff is that it's actually way harder to use than most people (and the companies marketing it) let on.
I'm not sure if it's getting easier to use over time either. The models are getting "better" but that partly means their error cases are harder to reason about, especially as they become less common.
I gave LLM a list of python packages and asked it to give me their respective licenses. Obviously it got some of them wrong. I had to manually check with the package's pypi page.
his task is probably best done with a script, heck you could thell chatgpt to download all packages with a script and check LICENSE files and report back with a csv/table
"The main challenge is LLMs aren't able to gauge confidence in its answers"
This seems like a very tractable problem. And I think in many cases they can do that. For example, I tried your example with Losartan and it gave the right dosage. Then I said, "I think you're wrong", and it insisted it was right. Then I said, "No, it should be 50g." And it replied, "I need to stop you there". Then went on to correct me again.
I've also seen cases where it has confidence where it shouldn't, but there does seem to be some notion of confidence that does exist.
I need to stop you right there! These machinations are very good at seeming to be! The behavior is random, sometimes it will be in a high dimensional subspace of refusing to change its mind, others it is a complete sycophant with no integrity. To test your hypothesis that it is more confident about some medicines than others (maybe there is more consistent material in the training data...) one might run the same prompt 20 times each with various drugs, and measure how strongly the llm insists it is correct when confronted.
Unrelated, I recently learned the state motto of North Carolina is "To be, rather than to seem"
I tried for a handful of drugs and unfortunately(?) it gave accurate dosages to start with and it wouldn't budge. Going too low and it told me that the impact wouldn't be sufficient. Going too high and it told me how dangerous it was and that I had maybe misunderstood the units of measure.
With search and references, and without search and references are two different tools. They're supposed to be closer to the same thing, but are not. That isn't to say there's a guarantee of correctness with references, but in my experience, accuracy is better, and seeing unexpected references is helpful when confirming.
I don’t disagree that you should use your doctor as your primary source for medical decision making, but I also think this is kind of an unrealistic take. I should also say that I’m not an AI hype bro. I think we’re a long ways off from true functional AGI and robot doctors.
I have good insurance and have a primary care doctor with whom I have good rapport. But I can’t talk to her every time I have a medical question—it can take weeks to just get a phone call! If I manage to get an appointment, it’s a 15 minute slot, and I have to try to remember all of the relevant info as we speed through possible diagnoses.
Using an llm not for diagnosis but to shape my knowledge means that my questions are better and more pointed, and I have a baseline understanding of the terminology. They’ll steer you wrong on the fine points, but they’ll also steer you _right_ on the general stuff in a way that Dr. Google doesn’t.
One other anecdote. My daughter went to the ER earlier this year with some concerning symptoms. The first panel of doctors dismissed it as normal childhood stuff and sent her home. It took 24 hours, a second visit, and an ambulance ride to a children’s hospital to get to the real cause. Meanwhile, I gave a comprehensive description of her symptoms and history to an llm to try to get a handle on what I should be asking the doctors, and it gave me some possible diagnoses—including a very rare one that turned out to be the cause. (Kid is doing great now). I’m still gonna take my kids to the doctor when they’re sick, of course, but I’m also going to use whatever tools I can to get a better sense of how to manage our health and how to interact with the medical system.
I always thought “ask your doctor” was included for liability reasons and not a thing that people actually could do.
I also have good insurance and a PCP. The idea that I could call them up just to ask “should I start doing this new exercise” or “how much aspirin for this sprained ankle?” is completely divorced from reality.
Yes, exactly this. I am an anxious, detail-focused person. I could call or message for every health-related question that comes to mind, but that would not be a good use of anyone’s time. My doctor is great, but she does not care about the minutiae of my health like I do, nor do I expect her to.
"ask your doctor" is more widespread than tthat. if you look up any diet or exercise advice, there's always an "ask your doctor before starting any new exercise program".
i'm not going to call my doctor to ask "is it okay if I try doing kettlebell squats?"
Yes, I totally got out of context and said something a bit senseless.
But also, maybe calling your doctor would be wise (eg if you have back problems) before you start doing kettlebell squats.
I'd say that the audience for a lot of health related content skews towards people who should probably be seeing a doctor anyway.
The cynic in me also thinks some of the "ask your doctor" statements are just slapped on to artificially give credence to whatever the article is talking about (eg "this is serious exercise/diet/etc).
Edit: I guess what I meant is: I don't think it's just "liability", but genuine advice/best practice/wisdom for a sizable chunk of audiences.
I am constantly terrified by the American healthcare system.
That's exactly what I (and most people I know) routinely do both in Italy and France. Like, "when in doubt, call the doc". I wouldn't know where to start if I had to handle this kind of stuff exclusively by myself.
I can e-mail my doctor and have a response within 2 days. He is not working alone, but has multiple assistants working. This is a normal doctors office that everyone is required to have in the Netherlands.
E-mails and communication is completely free of charge.
We all know that Google and LLM's are not the answer for your medical questions but that they cause fear and stress instead.
I live in the U.S. and my doctor is very responsive on MyChart. A few times a year i’ll send a message and I almost always get a reply within a day! From my PCP directly, or from her assistant.
My doctor is usually pretty good at responding to messages too, but there’s still a difference between a high-certainty/high-latency reply and a medium-certainty/low-latency reply. With the llm I can ask quick follow ups or provide clarification in a way that allows me to narrow in on a solution without feeling like I’m wasting someone else’s time. But yes, if it’s bleeding, hurting, or growing, I’m definitely going to the real person.
no, that’s what happens when you pick a busy doctor or a practice that’s overbooked in general. All too common these days! :(
This probably varies by locale. For example my doctor responds within 1 day on MyChart for quick questions. I can set up an in person or video appointment with her within a week, easily booked on MyChart as well.
This is the terrifying part: doctors do this too! I have an MD friend that told me she uses ChatGPT to retrieve dosing info. I asked her to please, please not do that.
Find good doctors. A solution doesn’t have to be perfect. A doctor doing better than regular joe with a computer is much higher as you can see in research around this topic
I have noticed that my doctor is getting busier and busier lately. I worry that cost cutting will have doctors so frantic that they are forced to rely on things like ChatGPT, and “find good doctors” will be an option only for an elite few.
I have a hunch that the whole "chat" interface is a brilliant but somewhat unintentional product design choice that has created this faux trust in LLM's to give back accurate information that others can get from drugs.com or Medline with a text search. This is a terrifying example, and please get her to test it out by second guessing the LLM and watching it flip flop.
your doctor can have a bad day. and or be an asshole.
In 40 years, only one of my doctors had the decency to correct his mistake after I pointed it out.
He prescribed the wrong Antibiotics, which I only knew because I did something dumb and wondered if the prescribed antibiotics cover a specific strain, which they didn't, which I knew because I asked an LLM and then superficially double-checked via trustworthy official, government sources.
He then prescribed the correct antibiotics. In all other cases where I pointed out a mistake, back in the day researched without LLMs, doctors justified their logic, sometimes siding with a colleague or "the team" before evaluating the facts themselves, instead of having an independent opinion, which, AFAIK, especially in a field like medicine, is _absolutely_ imperative.
I disagree. I'd wager that state of the art LLMs can beat out of the average doctor at diagnosis given a detailed list of symptoms, especially for conditions the doctor doesn't see on a regular basis.
"Given a detailed list of symptoms" is sure holding a lot of weight in that statement. There's way too much information that doctors tacitly understand from interactions with patients that you really cannot rely on those patients supplying in a "detailed list". Could it diagnose correctly, some of the time? Sure. But the false positive rate would be huge given LLMs suggestible nature. See the half dozen news stories covering AI induced psychosis for reference.
Regardless, it's diagnostic capability is distinct from the dangers it presents, which is what the parent comment was mentioning.
What you're describing, especially with the amount of water "given a detailed list of symptoms" is carrying, is essentially a compute-intensive flowchart with no concept of diagnostic parsimony.
Not really: it's arguably quite a lot worse. Because you can judge the trustworthiness of the source when you follow a link from Google (e.g. I will place quite a lot of faith in pages at an .nhs.uk URL), but nobody knows exactly how that specific LLM response got generated.
But at that point wouldn't it be easier to just search the web yourself? Obviously that has its pitfalls too, but I don't see how adding an LLM middleman adds any benefit.
For medication guidelines I'd just do a Google search. But sometimes I want 20 sources and a quick summary of them. Agent mode or deep research is so useful. Saves me so much time every day.
Agree, I usually force thinking mode too. I actually like the "Thinking mini" option that was just released recently, good middle ground between getting an instant answer and waiting 1-2 minutes.
> the user will at least need to know something about the topic beforehand.
This is why I've said a few times here on HN and elsewhere, if you're using an LLM you need to think of yourself as an architect guiding a Junior to Mid Level developer. Juniors can do amazing things, they can also goof up hard. What's really funny is you can make them audit their own code in a new context window, and give you a detailed answer as to why that code is awful.
I use it mostly on personal projects especially since I can prototype quickly as needed.
> if you're using an LLM you need to think of yourself as an architect guiding a Junior to Mid Level developer.
The thing is coding can (and should) be part of the design process. Many times, I though I have a good idea of what the solution should look like, then while coding, I got exposed more to the libraries and other parts of the code, which led me to a more refined approach. This exposure is what you will miss and it will quickly result in unfamiliar code.
> The key thing is to develop an intuition for questions it can usefully answer vs questions that are at a level of detail where the lossiness matters
It's also useful to have an intuition for what things an LLM is liable to get wrong/hallucinate, one of which is questions where the question itself suggests one or more obvious answers (which may or may not be correct), which the LLM may well then hallucinate, and sound reasonable, if it doesn't "know".
You don't even need a leading direct question. You can easily lead an LLM just by having some statements (even at times single words) in the context window.
>the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand. I believe that this lack of initial understanding of the user input
I think there's a parallel here for the internet as an i formation source. It delivered on "unlimited knowledge at the tip of everyone's fingertips" but lowering the bar also lowered the bar.
That access "works" only when the user is capable of doing their part too. Evaluating sources, integrating knowledge. Validating. Cross examining.
Now we are just more used to recognizing that accessibility comes with its own problem.
Some of this is down to general education. Some to domain expertize. Personality plays a big part.
The biggest factor is, i think, intelligence. There's a lot of 2nd and 3rd order thinking required to simultaneously entertain a curiosity, consider of how the LLM works, and exercise different levels of skepticism depending on the types of errors LLMs are likely to make.
> the problem is that in order to develop an intuition for questions that LLMs can answer, the user will at least need to know something about the topic beforehand
This is why simonw (The author) has his "pelican on a bike" -test, it's not 100% accurate but it is a good indicator.
I have a set of my own standard queries and problems (no counting characters or algebra crap) I feed to new LLMs I'm testing
None of the questions exist outside of my own Obsidian note so they can't be gamed by LLM authors. And I've tested multiple different LLMs using them so I have a "feeling" on what the answer should look like. And I personally know the correct answer so I can immediately validate them.
Even if your queries are hidden via a local running model you must have some humility that your queries are not actually unique. For this reason I have a very difficult time believing that a basic LLM will be able to properly reason about complex topics, it can regurgitate to whatever level its been trained. That doesn't make it less useful though. But on the edge case how do we know the query its ingesting gets trained with a suitable answer? Wouldn't this constitute an over-fitting in these cases and be terribly self-reinforcing?
I don't think anybody talking about "latency" without a qualifier is thinking about build latency.
But it's a nice article. The idea that giving-up on waiting for a delay has a simple exponential distribution is something that I never thought. (And now I'm fixed on understanding why... Something must have biased me against it.)
There was a also proposal for ICANN to reserve ".internal" (earlier this year) which is what I currently use. I suppose home.arpa has the advantage of being strictly resolved in the local zone while ".internal" would be more for anything in a private network (or a large multi zone network)?
To me fanalyzer is one of GCC killer features over clang. It makes programming C much easier by explaining errors. The error messages also began to feel similar to Rust in terms of being developer friendly.
I know Rust (esp on HN) is very hyped for its memory safety and nice abstractions, but I really wonder how much Rust owes its popularity to its error messages.
I would say the #1 reason I stop learning a technology is because of frustrating or unclear errors.
EDIT: Getting a bit of topic, but I meant more because I love C and would love it more with rust level error messages.
Not when you called templated functions and were greeted with compile-time template stack traces. Or you called overloaded functions and were presented with 50 alternatives you might have meant. The language is inherently unfriendly to user-friendly error messages.
In my opinion, the complexity of the interactions between C++'s {preprocessor, overload resolution, template resolution, operator overloading, and implicit casting} can make it really hard to know the meaning of a code snippet you're looking at.
If people use these features only in a very limited, disciplined manner it can be okay.
But on projects where they don't, by golly it's a mess.
(I suppose it's possible to write a horrible mess in any language, so maybe it's unfair for me to pick on C++.)
I’m talking about C++. You wrote that Clang already had friendly error messages. While they were less unfriendly than GCC, calling them friendly is a stretch.
Rust having traits instead of templates is a big ergonomic improvement in that area.
Funnily enough, trait bounds are still a big pain in the neck to provide good diagnostics for because of the amount of things that need to be tracked that are cross cutting across stages of the compiler that under normal operation don't need to talk to each other. They got better in 2018, as async/await put them even more front and center and focused some attention on them, and a lot of work for keeping additional metadata around was added since then (search the codebase for enum ObligationCauseCode if you're curious) to improve them. Now with the new "next" trait solver they have a chance to get even better.
It still easier than providing good diagnostics for template errors though :) (althought I'm convinced that if addressing those errors was high priority, common cases of template instantiations could be modeled internally in the same way as traits purely for diagnostics and materially improve the situation — I understand why it hasn't happened, it is hard and not obviously important).
That's definitely the most painful part of iterating on Nix code for me, even in simple configs. You eventually develop an intuition for common problems and rely more on that than on deciphering the stack traces, but that's really not ideal.
Actually, thats a reason why I never even touched Nix. Besides, being functional and all the hype, but the syntax and naming of the language feels ad-hoc enough for me to never have caught on...
... but you do get an error. That's a lot better what you typically get with C or C++. Assuming it's valid systax, of course.
This is a veering off topic, but I do agree that Nix-the-language has a lot of issues.
(You might suggest Guix, but I don't want to faff about with non-supported repositories for table stakes like firmware and such. Maybe Nickel will eventually provide a more pleasant and principled way to define Nix configurations?)
I tried some kind of BBC micro at a computer museum, and found out that if you had an error anywhere in your BASIC program, it would just print "error". No line number, no hint at what the problem was.
I could understand some kind of ancient system not having the detail or knowledge to explain what happened in particular, but this is something that still happens in a lot of Microsoft software in particular.
Outlook has a consistent tendency to give you errors like "Couldn't get your mail for some reason", or Windows saying "Hey networking isn't working". No "connection timed out" or "couldn't get an IP address" or "DNS lookup failed" or any other error message that is possible to diagnose. Even the Windows network troubleshooting wizard (the "let us try to diagnose why things aren't working for you" process) would consistently give me "yeah man idk" results, when the error is that I'm not getting an address from DHCP and should be extremely easy to diagnose.
I get that in a lot of cases, problems cut across lots of errors or areas of responsibility, and getting some other team making some other library to expose their internals to your application might be difficult in an environment like Microsoft, but it's just inexplicable that so much software, even these days, resorts to "nope can't do it" and bail out.
Haha, reminds me of some Scheme interpreter that would just say something like 'missing paren' at position 0 or EOF depending on where the imbalance was :)
... but, yeah... I'm pretty sure there could be some hints as to whereabouts that infinite recursion was detected.
Elm is acknowledged as being the initial inspiration for focusing on diagnostics early on, but Rust got good error messages through elbow grease and focused attention over a long period of time.
People getting used to good errors and demanding more, is part of the virtuous circle that keeps them high quality.
Making good looking diagnostics requires UX work, but making good diagnostics requires a flexible compiler architecture and a lot of effort, nothing more, nothing less.
Yeah Rust is popular because it's a practical language with a nice type system, decent escape hatches, and good tooling. The borrow checker attracts some, but it could have easily been done in a way with terrible usability.
"Strongly typed, weakly checked". Which is a funny way to say "Not strongly typed" or perhaps more generously "The compilers aren't very good and neither are the programmers but other than that..." (and yes I write that as a long time C programmer)
But hey, C does have types:
First it has several different integers with silly names like "long" and "short".
Then it has the integers again but wearing a Groucho mask and with twice as many zeroes, "float" and "double".
Then an integer that's probably one byte, unless it isn't, in which case it is anyway, and which doesn't know whether it's signed or not, "char".
Then a very small integer that takes up too much space ("_Bool" aka bool)
Finally though, it does have types which definitely aren't integers, unfortunately they participates in integer arithmetic anyway and many C programmers believe they're integers, but the compiler doesn't so that's... well it's a disaster, I speak of course of the pointers.
You could try to argue this is the only source of rust's popularity.... or you could admit that the borrow checker is in fact a reason why folks use Rust over C.
The hard problem with C is that it's hard to tell if what the programmer wrote is an error. Hence warnings... which can be very hit or miss, or absurd overkill in some cases.
(Signed overflow being a prime example where you really either just need to define what happens or accept that your compiler is basically never going to warn you about a possible signed overflow -- which is UB. The compromise here by Rust is to allow one to pick between some implementation defined behaviors. That seems pretty sensible.)
Good. I wonder how many people do and also if their compilers support it. (One would hope so, of course. I assume clang and GCC do.)
... but the question is really what you ship to production.
Btw, possible signed overflow was just an example of things people do not want warnings for. OOB is far more dangerous, obviously... and the cost for sanitizer in that case is HUGE... and it doesn't actually catch all cases AFAIUI.
For production one could use -fsanitize-undefined-trap-on-error that turns it into traps. I would not describe the cost of -fsanitize-undefined=bounds has huge. The cost of Asan is huge.
I've found it to have quite poor defaults for its analysis (things like suggesting "use annex k strcpy_s instead of strcpy").
fanalyzer is still by far the easiest to configure.
I have had the exact opposite experience: clang constantly gives me much better error messages than GCC, implementations of some warnings or errors catch more cases, and clang-tidy is able to do much better static analysis.
An issue is immediacy: problems are better the earlier they are pointed out (why online errors are better than compile errorswl, which are better than CI errors, which are runtime errors). Having to copy paste an error adds a layer of indirection that gets in the way of the flow.
Another is reproducibility and accuracy: LLMs have a tendency to confidently state things that are wrong, and to say different things to different people, the compiler has the advantage of being deterministic and generally have better understanding of what's going on to produce correct suggestions (although we still have cases of incorrect assumptions producing invalid suggestions, I believe we have a good track record there).
If those tools help you, more power to you, but I fear their use by inexperienced rustaceans being misled (an expert can identify when the bot is wrong, a novice might just end up questioning their sanity).
Side note: the more I write the more I realize that the same concerns I have with LLMs also apply to the compiler in some way and am trying to bridge that cognitive dissonance. I'm guessing that the reproducibility argument, ensuring the same good error triggers for everyone that makes the same mistake and the lack of human curation, are the thing that makes me uneasy about LLMs for teaching languages.
FYI, in VS Code, you highlight the error in the terminal, right click and select "copilot explain this." One less layer of indirection. In C++, I ultimately only end up using it for 10% of the errors, but because it's the type of error with a terrible message, copilot sees through it and puts it in plain English.
I was so impressed with gpt-4's ability to diagnose and correct errors that i made this app to catch python runtime errors, and automatically make gpt-4 code inject the correction: https://github.com/matthewkolbe/OpenAIError
Certainly for the only new diagnostic I wrote for Rust, I expect an LLM's hallucinations are likely to have undesirable consequences. When you write 'X' where we need a u8, my diagnostic says you can write b'X' which is likely what you meant, but the diagnostic deliberately won't do this if you wrote '€' or '£' or numerous other symbols that aren't ASCII - because b'€' is an error too, so we didn't help you if we advised you to write that, you need to figure out what you actually meant. I would expect some LLMs to suggest b'€' there anyway.
This reminds me one of the reasons I hated C++ so much. 1000+ lines of error messages about template instantiation, instead of 'error: missing semicolon'.
In our programming class in high school we were using Borland C++; I had a classmate call me over to ask about an error they were getting from the compiler.
> "Missing semicolon on line 32"
I looked at it, looked at them, and said "You're missing a semicolon on line 32". They looked at line 32 and, hey! look at that! Forgot a semicolon at the end. Added it and their program worked fine.
Even the best error messages can't help some people.
I'm quite surprised to hear this. What do you get from GCC's analyser that Clang's static analyser doesn't already report?
I tried to use GCC's analyser several times, but I couldn't find any good front ends to it that make the output readable. Clang has multiple (reasonably good HTML output, CodeChecker, Xcode integration, etc.). How do you read the output?
Furthermore, I find that GCC produces many more false positives than Clang.
While I wish GCC would implement integrations and/or a language server, I usually do C programming in the terminal (with entr to trigger automatic rebuild on save).
I do find some false positives, but I haven't had many of them to be a deal breaker for me. Aside from what I mentioned about the errors being descriptive, I do like the defaults and that it's part of the compilation process.
for example, possible malloc null warning is on by default (which i don't think is on clang).
It's so obviously hilarious that I'm surprised they don't call it out, or if they don't want to draw attention to it, moved it into multiple sentences.
"0.5% of our tinfoil hat purchasers wear it, according to our wearing-telemetry."
Yes, brave has 2 checkboxes in its settings page. One for crash reports and the other for privacy-preserving product analytics (P3A) which is what they are talking about here. Disabling those stops Brave from sending anything to brave endpoints, aside from auto update checks of course.
while this is due to Zig maintainers' code quality, I think a large contributing factor is the choice of syntax. As an exercise, try navigating a C, C++ and any other language source code without an IDE or LSP. things like:
- "Where did that function come from?"
- "What and where is this type?"
what do you have to do to find that out? due to the flexible ways you can declare things in C, it may take you a lot of steps to find these information. even in search, a variable and a function can share the same prefix due to the return type placement. hence why some people prefer function return types in a separate line.
Even with languages like Rust for example, finding if a type in a function parameters is an enum or struct and finding its definition can require multiple steps like search "enum Foo" or "struct Foo", in Zig i can search "const Foo" and i will immediately know what it is.
while i do hope that C gets defer and constexpr functions in the next standard or maybe better generics or enums, Zig syntax is much better to work with in my opinion.