stdbrouw's comments | Hacker News

The thing about second order effects is that they are almost never larger than the first order effect.

Furthermore, GLP-1 users report having fewer cravings or just reduced appetite in general, whereas what you describe would require some sort of "calorie reduction pill" which would allow people to lose weight without altering their relationship to food. But that pill does not exist.



Hah! Thanks for the correction.

> The thing about second order effects is that they are almost never larger than the first order effect.

Sounds clever but this is just a labeling trick. When a second order effect is larger than the first order one, we just rename them to first order and intermediate effects.

For example, the first order effects of growing GLP-1 prevalence are actually consumption of prescription pads, new demand on pill bottles, and gas consumption of pharma sales reps.

The second order effect is weight loss in patients who take the drugs.


Cute and thus worthy of an upvote, but whenever I see scientists or economists refer to first or second order effects, it pertains to things that are subsequent to each other in time, or at least intended vs. ancillary. I don't think anyone except for a Stafford "the purpose of a system is what it does" Beer acolyte would designate new demand for pill bottles as the first order effect of a new medication.

It's just something that statisticians have observed across many fields: you theorize about how potentially huge a particular interaction effect or knock-on effect could be relative to the main effect, you read about the Jevons Paradox and intuitively feel that it can explain so much of the world today... and then you get the data and it just almost never does. No reason why it couldn't, just empirically it rarely happens.


The demand for pill bottles literally does grow before anyone takes the medication, no?

And correct, I agree they wouldn't designate the demand for pill bottles as the first order effect. That's because, despite happening first, it's not the most important object of analysis. That's why it's a disproof of your earlier claim that second order effects aren't more significant than first order ones: because if they were, they'd be considered the first order effect.


> The demand for pill bottles literally does grow before anyone takes the medication, no?

Only really in the US. In most other countries they use blister packs instead. Global consumption of blister packs is so huge (not just for prescription medications, but also OTC, vitamins, supplements, and complementary medicines) that even a blockbuster medication likely only makes a modest difference to manufacturer demand in percentage terms.


> For example, the first order effects of growing GLP-1 prevalence are actually consumption of prescription pads, new demand on pill bottles, and gas consumption of pharma sales reps.

I take injectable tirzepatide prescribed via an electronic prescription… so the impact on pill bottle demand and prescription pad demand in my case is literally zero.

And I doubt pharma sales reps have a lot of work to do selling GLP-1 agonists: who needs to convince doctors to prescribe a drug when there are dozens of patients inquiring about it?

Yes the article is about pills, but most people are on injectables still (that may change over time). It likely has increased demand for needles and sharps containers. But in dollar terms, that’s a small percentage of the demand for the medication itself.


...

You are missing the point.

s/pill bottle/blister pack/

s/prescription pad/e-prescriber submissions/

All irrelevant to the convo :)


They are all irrelevant to everything, because in dollar and percentage terms they are a drop in the ocean.

(Generalized) linear models have a straightforward probabilistic interpretation -- E(Y|X) -- which I don't think is true of total least squares. So it's more of an engineering solution to the problem, and in statistics you'd be more likely to go for other methods such as regression calibration to deal with measurement error in the independent variables.
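To make the regression calibration route concrete, here is a toy simulation (the true slope, the noise levels, and the fact that the reliability ratio can be computed directly are all artifacts of simulating the data; in a real analysis you would estimate it from replicate measurements or a validation subsample):

    # Classical measurement error in a predictor: the naive OLS slope is
    # attenuated towards zero, and regression calibration undoes the attenuation.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    x_true = rng.normal(0, 1, n)
    x_obs = x_true + rng.normal(0, 0.5, n)   # what we actually get to measure
    y = 2.0 * x_true + rng.normal(0, 1, n)   # outcome depends on the true x

    naive_slope = np.polyfit(x_obs, y, 1)[0]      # ~1.6, biased towards zero
    reliability = np.var(x_true) / np.var(x_obs)  # ~0.8, known here only because we simulated
    corrected = naive_slope / reliability         # ~2.0, close to the true slope

    print(naive_slope, corrected)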

I feel like all of the elements are there: zram, zswap, various packages that improve on default oom handling... maybe it's more about creating sane defaults that "just work" at this point?
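For example, a rough sketch of what such a default could look like, assuming the systemd zram-generator package that some distributions already ship (key names are per that project; check its docs before copying):

    # Hypothetical /etc/systemd/zram-generator.conf: compressed swap in RAM,
    # sized to half of memory but capped, using zstd compression.
    [zram0]
    zram-size = min(ram / 2, 4096)
    compression-algorithm = zstd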


I think it's more of a user space issue, that the UI doesn't degrade nicely. The kernel just defaults to a more server-oriented approach.


The idea that references in a scientific paper should be plentiful but aren't really that important is a consequence of a previous technological revolution: the internet.

You'll find a lot of papers from, say, the '70s, with a grand total of maybe 10 references, all of them to crucial prior work. If those references don't say what the author claims they say (e.g. that the particular method employed is valid), then chances are that the current paper is weaker than it seems, or even invalid, and so it is extremely important to check those references.

Then the internet came along, scientists started padding their work with easily found but barely relevant references and journal editors started requiring that even "the earth is round" should be well-referenced. The result is that peer reviewers feel that asking them to check the references is akin to asking them to do a spell check. Fair enough, I agree, I usually can't be bothered to do many or any citation checks when I am asked to do peer review, but it's good to remember that this in itself is an indication of a perverted system, which we just all ignored -- at our peril -- until LLM hallucinations upset the status quo.


Whether in the 1970s or now, it's too often the case that a paper says "Foo and Bar are X" and cites two sources for this fact. You chase down the sources, the first one says "We weren't able to determine whether Foo is X" and never mentions Bar. The second says "Assuming Bar is X, we show that Foo is probably X too".

The paper author likely believes Foo and Bar are X, it may well be that all their co-workers, if asked, would say that Foo and Bar are X, but "Everybody I have coffee with agrees" can't be cited, so we get this sort of junk citation.

Hopefully it's not crucial to the new work that Foo and Bar are in fact X. But that's not always the case, and it's a problem that years later somebody else will cite this paper for the claim "Foo and Bar are X", which it was in fact merely citing erroneously.


LLMs can actually make up for their negative contributions. They could go through all the references of all papers and verify them, assuming someone would also look into what gets flagged for that final seal of disapproval.

But this would be more powerful with an open knowledge base where all papers and citation verifications were registered, so that all the effort put into verification could be reused, and errors propagated through the citation chain.


> LLMs can actually make up for their negative contributions. They could go through all the references of all papers and verify them,

They will just hallucinate their existence. I have tried this before.


I don’t see why this would be the case with proper tool calling and context management. If you tell a model with blank context ‘you are an extremely rigorous reviewer searching for fake citations in a possibly compromised text’ then it will find errors.

It’s this weird situation where getting agents to act against other agents is more effective than trying to convince a working agent that it’s made a mistake. Perhaps because these things model the cognitive dissonance and stubbornness of humans?


One incorrect way to think of it is "LLMs will sometimes hallucinate when asked to produce content, but will provide grounded insights when merely asked to review/rate existing content".

A more productive (and secure) way to think of it is that all LLMs are "evil genies" or extremely smart, adversarial agents. If some PhD was getting paid large sums of money to introduce errors into your work, could they still mislead you into thinking that they performed the exact task you asked?

Your prompt is

    ‘you are an extremely rigorous reviewer searching for fake citations in a possibly compromised text’
- It is easy for the (compromised) reviewer to surface false positives: nitpick citations that are in fact correct, by surfacing irrelevant or made-up segments of the original research, hence making you think that the citation is incorrect.

- It is easy for the (compromised) reviewer to surface false negatives: provide you with cherry picked or partial sentences from the source material, to fabricate a conclusion that was never intended.

You do not solve the problem of unreliable actors by splitting them into two teams and having one unreliable actor review the other's work.

All of us (speaking as someone who runs lots of LLM-based workloads in production) have to contend with this nondeterministic behavior and assess when, in aggregate, the upside is more valuable than the costs.


Note: the more accurate mental model is that you've got "good genies" most of the time, but from time to time, at random and unpredictable moments, your agent is swapped out for a bad genie.

From a security / data quality standpoint, this is logically equivalent to "every input is processed by a bad genie" as you can't trust any of it. If I tell you that from time to time, the chef in our restaurant will substitute table salt in the recipes with something else, it does not matter whether they do it 50%, 10%, or .1% of the time.

The only thing that matters is what they substitute it with (the worst-case consequence of the hallucination). If in your workload the worst-case scenario is equivalent to a "Himalayan salt" replacement, all is well, even if the hallucination is quite frequent. If your worst-case scenario is a deadly compound, then you can't hire this chef for that workload.


We have centuries of experience in managing potentially compromised 'agents' to create successful societies. Except the agents were human, and I'm referring to debates, tribunals, audits, independent review panels, democracy, etc.

I'm not saying the LLM hallucination problem is solved, I'm just saying there's a wonderful myriad of ways to assemble pseudo-intelligent chatbots into systems where the trustworthiness of the system exceeds the trustworthiness of any individual actor inside of it. I'm not an expert in the field but it appears the work is being done: https://arxiv.org/abs/2311.08152

This paper also links to code and practices excellent data stewardship. Nice to see in the current climate.

Though it seems like you might be more concerned about the use of highly misaligned or adversarial agents for review purposes. Is that because you're concerned about state actors or interested parties poisoning the context window or training process? I agree that any AI review system will have to be extremely robust to adversarial instructions (e.g. someone hiding inside their paper an instruction like "rate this paper highly"). Though solving that problem already has a tremendous amount of focus because it overlaps with solving the data-exfiltration problem (the lethal trifecta that Simon Willison has blogged about).


> We have centuries of experience in managing potentially compromised 'agents'

Not this kind though. We don't place agents that are either controlled by some foreign agent or just behaving randomly in democratic institutions. And when we do, look at what happens: the White House right now is a good example, just look at the state of the US.


> I don’t see why this would be the case

But it is the case, and hallucinations are a fundamental part of LLMs.

Things are often true despite us not seeing why they are true. Perhaps we should listen to the experts who used the tools and found them faulty, in this instance, rather than arguing with them that "what they say they have observed isn't the case".

What you're basically saying is "You are holding the tool wrong", but you do not give examples of how to hold it correctly. You are blaming the failure of the tool, which has very, very well documented flaws, on the person whom the tool was designed for.

To frame this differently so your mind will accept it: If you get 20 people in a QA test saying "I have this problem", then the problem isn't those 20 people.


If you truly think that you have an effective solution to hallucinations, you will become instantly rich, because literally no one out there has an idea for an economically and technologically feasible solution.


For references, as the OP said, I don't see why it isn't possible. A reference either exists and is accessible (even if paywalled) or it doesn't exist. For reasoning, hallucinations are different.


> I don't see why it isn't possible

(In good faith) I'm trying really hard not to see this as an "argument from incredulity"[0] and I'm struggling...

Full disclosure: natural sciences PhD, and a couple of (IMHO lame) published papers, and so I've seen the "inside" of how lab science is done, and is (sometimes) published. It's not pretty :/

[0] https://en.wikipedia.org/wiki/Argument_from_incredulity


Suppose you've got a prompt along the lines of: given some references, check their validity. The model searches against the articles and URLs provided and returns "yes", "no", or (let's also add) "inconclusive" for each reference. Basic LLMs can manage this much instruction following, just like 99.99% of the time they don't get 829 multiplied by 291 wrong when you ask them (nowadays). You'd prompt it to back all claims solely with search results/external links showing exact matches, and not to use its own internal knowledge.

The fake references generated in the ICLR papers were, I assume, due to people asking an LLM to write parts of the related work section, not to verify references. In that prompt it relies a lot on internal knowledge and probably spends most of its time thinking about what the relevant subareas and the cutting edge are. I suppose it omits a second-pass check. In the other case, you have the task of verifying references, which is mostly basic instruction following for advanced models that have web access. I think you'd run the risks of data poisoning and model timeouts more than of hallucinations.
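A hypothetical sketch of what that verification pass could look like (assuming the OpenAI Python client; the model name, the prompt wording, and the idea that some external retrieval step has already fetched the source text are all illustrative, not a recipe):

    # Classify each extracted reference as "yes" / "no" / "inconclusive", grounding
    # the judgement in externally retrieved text rather than internal knowledge.
    from openai import OpenAI

    client = OpenAI()

    def check_reference(citation_as_used: str, retrieved_text: str) -> str:
        prompt = (
            "You are verifying a citation. Using ONLY the retrieved source text "
            "below, answer with exactly one word: yes (the source supports the "
            "claim), no (it does not), or inconclusive.\n\n"
            f"Citation as used in the paper:\n{citation_as_used}\n\n"
            f"Retrieved source text:\n{retrieved_text}"
        )
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return resp.choices[0].message.content.strip().lower()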


Have you actually tried this? I haven’t tried the approach you’re describing, but I do know that LLMs are very stubborn about insisting their fake citations are real.


I assumed they meant using the LLM to extract the citations and then using external tooling to look up and grab the original paper, at least verifying that it exists, that the title and summary are relevant, and that the authors are correctly cited.
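Something along these lines, for instance, using the public Crossref REST API (the title-only matching and the similarity threshold are simplifications for illustration; a real checker would also compare authors and year, and fall back to other indexes for venues Crossref doesn't cover):

    # Check whether a cited title corresponds to a real, indexed work.
    import requests
    from difflib import SequenceMatcher

    def reference_exists(title: str) -> bool:
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": title, "rows": 3},
            timeout=10,
        )
        resp.raise_for_status()
        for item in resp.json()["message"]["items"]:
            candidate = (item.get("title") or [""])[0]
            if SequenceMatcher(None, title.lower(), candidate.lower()).ratio() > 0.9:
                return True
        return False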


Which is what the people in this new article are doing.


Wikipedia calls this citogenesis.


> “consequence of a previous technological revolution: the internet.”

And also of increasingly ridiculous and overly broad concepts of what plagiarism is. At some point things shifted from “don’t represent others’ work as novel” towards “give a genealogical ontology of every concept above that of an intro 101 college course on the topic.”


It's also a consequence of the sheer number of building blocks which are involved in modern science.

In the methods section, it's very common to say "We employ method barfoo [1] as implemented in library libbar [2], with the specific variant widget due to Smith et al. [3] and the gobbledygook renormalization [4,5]. The feoozbar is solved with geometric multigrid [6]. Data is analyzed using the froiznok method [7] from the boolbool library [8]." There goes 8, now you have 2 citations left for the introduction.


Do you still feel the same way if the froiznok method is an ANOVA table of a linear regression, with a log-transformed outcome? Should I reference Fisher, Galton, Newton, the first person to log transform an outcome in a regression analysis, the first person to log transform the particular outcome used in my paper, the R developers, and Gauss and Markov for showing that under certain conditions OLS is the best linear unbiased estimator? And then a couple of references about the importance of quantitative analysis in general? Because that is the level of detail I’m seeing :-)


Yeah, there is an interesting question there (always has been). When do you stop citing the paper for a specific model?

Just to take some examples, is BiCGStab famous enough now that we can stop citing van der Vorst? Is the AdS/CFT correspondence well known enough that we can stop citing Maldacena? Are transformers so ubiquitous that we don't have to cite "Attention is all you need" anymore? I would be closer to yes than no on these, but it's not 100% clear-cut.

One obvious criterion has to be "if you leave out the citation, will it be obvious to the reader what you've done/used"? Another metric is approximately "did the original author get enough credit already"?


Yeah, I didn't want to be contrary just for the sake of it, the heuristics you mention seem like good ones, and if followed would probably already cut down on quite a few superfluous references in most papers.


It is not (just) a consequence of the internet; scientific production itself has grown exponentially. There are many more papers cited simply because there are more papers, period.


Not even the Internet per se, but the citation index becoming a universally accepted KPI for research work.


Maybe there could be a system to classify the importance of each reference.


Systems do exist for this, but they're rather crude.


Arguably Spark solves a problem that does not exist anymore: single-node performance with tools like DuckDB and Polars is so good that there’s no need for more complex orchestration anymore, and these tools are sufficiently user-friendly that there is little point in switching to Pandas for smaller datasets.
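The kind of aggregation that used to justify a cluster now often fits comfortably in a single process; a minimal sketch, assuming a local Parquet file and column names that are purely hypothetical:

    # Hypothetical example: DuckDB querying a local Parquet file directly.
    import duckdb

    top_users = duckdb.sql("""
        SELECT user_id, count(*) AS n_events
        FROM 'events.parquet'
        GROUP BY user_id
        ORDER BY n_events DESC
        LIMIT 10
    """).df()
    print(top_users)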


In the US income from a hobby can just be added to your personal filing [1] and in Belgium, where I live, there is a similar arrangement for "diverse sources of income" [2]. If you do start a business, in the European Union you're exempt from filing VAT if your yearly revenue is below a certain amount [3]. Europe has also been pretty aggressive in getting rid of licensing requirements for various occupations and trades, certainly a photographer wouldn't need a license here.

I think the trouble you faced resulted from being at the edge of these kinds of simple systems that do exist -- big enough to need to set up a business, but small enough that hiring an accountant or spending time to familiarize yourself with the legal requirements was out of proportion to the expected revenue. That's unfortunate, of course, but doesn't necessarily reflect on the amount of red tape that exists in general in a country.

[1] https://www.irs.gov/newsroom/heres-how-to-tell-the-differenc...

[2] https://www.vlaanderen.be/economie-en-ondernemen/een-eigen-z...

[3] https://europa.eu/youreurope/business/taxation/vat/vat-exemp...


Yeah, federally there's no problem here. It's the state I live in that's the problem.

Any revenue over $12,000 and you have to register with the Department of Revenue, get a business license, and start paying business and occupation tax and sales taxes (if applicable). If your business is required to collect sales tax at all, you have to register no matter what your gross revenue is. Unfortunately, the state doesn't have any exemptions for sales tax like the EU does.

For some states in the US it is quite a bit simpler; unfortunately for mine it's not, and it's like they do everything in their power to discourage small businesses.


All business income should be taxed at a tiny share of total revenue, maybe with some portion of it being deductible, like input materials, durable equipment purchases, and employee benefits. The first US state to truly grasp and embrace this will get flooded with new businesses, but it will piss off the legal and CPA firms.


Yeah, the intention-to-treat design is a particularly nice touch, not so common outside of biostatistics. They also compare the full cost of Montessori vs. plain ol' public school, not just the cost to the state, which could otherwise have given the Montessori schools (which are in wealthier neighborhoods on average) an unfair advantage if they have a lot of parents chipping in with donations and help. I've skimmed through the methods section and it does seem like they've gone to great lengths to allow for a fair comparison.

That doesn't necessarily mean the result will extrapolate, though. It seems plausible that teachers in Montessori schools are more motivated and knowledgeable than the average teacher and have made a conscious decision to teach in such a school. If every public school were to become a Montessori school, you would still get the cost savings (student-to-teacher ratios are higher in Montessori!) but you might lose that above-average enthusiasm and expertise and so the learning gains might not carry over. It's just really hard to know whether something might generalize in the educational sciences.


> you might lose that above-average enthusiasm and expertise

yes, but montessori training can be done in one year (if you do it full-time; my wife did it over multiple years, 2 or 3 months each summer), and it is entirely child focused. very different from traditional teacher training.

if we assume that every teacher starts their training with some amount of enthusiasm, then the difference in enthusiasm, and even more so in expertise, should be minimal.


  I've skimmed through the methods section and it does seem like they've gone to great lengths to allow for a fair comparison. That doesn't necessarily mean the result will extrapolate
Yes, I had exactly the same reaction. They appear to be presenting their work honestly, completely and clearly, so that other people have enough information to draw their own conclusions.


If there is an ideal amount of some personality trait then for most people the advice would be "do more of this" even though for some others it'll have to be "do less of this" depending on where you're coming from. When I was young I definitely did a bit of over-socialization (everybody seems to like music festivals so I guess I must like them too if I don't want to be a weirdo?) but as you can see in the comments to this post, as we get older it's easy to get into a pattern where anything you're not familiar with is instantly met with suspicion or derision, and a lot of people don't like this about themselves, which is why this blogpost resonates with them.

Also, "liking something intrinsically", what does that even mean?


You've never discovered something and felt the indescribable, joy-inducing draw of its appeal? Listened to some music and immediately jived with it? Blaming familiarity bias and "old man yells at clouds" is a disappointingly small-minded critique. It's the opposite: I've lived long enough to thoroughly experience the joy of newly discovering something that feels like it fits me perfectly; conversely, I've tried to appreciate other things enough times to know never to waste my time trying again. Especially for the sake of others.

I've learned that liking things behaves a lot like attraction. It has no reasoning or logic, it happens organically, and when you know, you know. Thus, I would never deign to pretend to like something I've found I don't.


... but the key thing is that I am not saying that you are an old man who yells at clouds, rather that a lot of people worry that they might be getting unduly closed-minded, and that this is what the blog post is trying to address.

Your mileage seems to vary, but I find that for food and drinks in particular it's the acquired tastes you get the most enjoyment from in the end -- I haven't met many people who enjoyed their first glass of peated whisky, for example. Heck, even my best friends are definitely an "acquired taste", as is obvious to me when I introduce them to other people I know.


Even so. I read a lot of reports about educational policy (and occasionally produce them) and even if there are only 2-3 major decision makers, you'd expect the report to be read by various cabinet members of those decision makers, by committee members in parliament, by academics, by other teams or colleagues or institutions that would have liked to write the report in your stead or that produce "competing" reports, by folks at think tanks, and by journalists and politicians in general. Because the executive summary is almost always inlined in these kinds of reports, the intended audience is generally quite broad. I'm not saying that attaining only a couple hundred downloads of a report necessarily shows that money was wasted on superfluous research, but it definitely can be an indicator of waste.

I think this is one of those things where you can really overthink it and convince yourself that "the report was read only once, by the one person who had to read it" is an ideal outcome, but really it isn't.


> Then again, this is nothing like the type of problems I work on a daily basis.

I thought it'd be fun to take a stab at it in Python, which I haven't used in a while. I only barely remembered that I could accept command-line input with the built-in `input` function, something I don't think I ever used again after writing my first lines of code 15 years ago. Then I figured I'd use plain lists instead of a numpy array, but had forgotten that [[0] * 4] * 4 would just create 4 references to the same list. And that pretty much derailed the whole thing for me, even though I was sure I'd get it done in 25 minutes or under :-)
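For anyone who hasn't hit this one before, a quick illustration of the gotcha and the usual fix:

    # The outer * copies references to the same inner list, so mutating one row
    # appears to mutate all of them.
    grid = [[0] * 4] * 4
    grid[0][0] = 1
    print(grid)   # [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

    # A list comprehension builds four independent rows instead.
    grid = [[0] * 4 for _ in range(4)]
    grid[0][0] = 1
    print(grid)   # [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]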

