
The trivia approach doesn't even work for most people - ask the Wikipedia reader and the person who travelled to Turkey about it a year later, and see who has actually retained some knowledge.

That's a stretch. One can hold the view that division of labour is a useful economic principle, but also that oligopolies represent a dangerous concentration of power.

What oligopolies?

I think one of the best arguments against US interventionism when it comes to tyrants is just how 'variable' (let's say) the outcomes have been over the years. For every Panama, there are two or three Guatemalas, Irans or, most recently, Iraq. Generally the hard part is not the removal of the head of state, which for the US is usually pretty quick. It's which bureaucratic structures remain functional, and whether the power vacuum created brings something better and more robust, or just decades of violence.

I think Sarah Paine, on the Dwarkesh Podcast, has noted that it tends to go well when the countries already have fairly robust institutions, and tends to go badly when they don't.

As I'm not a historian, I can only note that it hasn't gone well recently, even when multiple successive presidents wanted it to.


In Iraq/Afghanistan, the US dismantled the institutions and tried to build new ones.

In Venezuela, it appears they are simply moving the gun to the head of Maduro's replacement.


It's also really hard to make the tunnel remain a tunnel over its expected 150-year lifespan - given that it basically runs through a fault line. They had to study and test the local geology for about 15 years, build certain sections to expect some movement over time, and kit everything out with a lot of sensors.

Overall an amazing achievement, and unsurprising it took this long to figure out!


After seeing some of the safety features in a short video I linked in another comment, I get the impression that this is either going to last much longer than 150 years, or something so catastrophic will happen that nothing that could have been built would have persisted.


You could make an LLM deterministic if you really wanted to, without a big loss in performance (fix random seeds, make MoE batching deterministic). That would not fix hallucinations.
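
A minimal sketch of the seed-fixing point, assuming PyTorch and Hugging Face transformers (gpt2 is just a stand-in model):

  import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  torch.manual_seed(0)                      # fixes the sampling RNG
  torch.use_deterministic_algorithms(True)  # prefer deterministic kernels where supported

  tok = AutoTokenizer.from_pretrained("gpt2")
  model = AutoModelForCausalLM.from_pretrained("gpt2")

  ids = tok("The capital of Australia is", return_tensors="pt")
  out = model.generate(**ids, do_sample=False, max_new_tokens=8)  # greedy decoding, no sampling
  print(tok.decode(out[0]))  # identical output on every run -- and still free to be wrong

(A served MoE model would additionally need deterministic batching, which a single-process toy like this never touches.)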

I don't think using deterministic / stochastic as a diagnostic is accurate here - I think what we're really talking about is some sort of fundamental 'instability' of LLMs, a la chaos theory.


Hallucinations can never be fixed. LLMs 'hallucinate' because that is literally the ONLY thing they can do: provide some output given some input. The output is measured and judged by a human, who then classifies it as 'correct' or 'incorrect'. In the latter case it seems to be labelled as a 'hallucination', as if the model did something wrong. It did nothing wrong and worked exactly as it was programmed to do.


We talk about "probability" here because the topic is hallucination, not getting different answers each time you ask the same question. Maybe you could make the output deterministic, but that does not help with the hallucination problem at all.


Exactly - 'non-deterministic' is not an accurate diagnosis of the issue.


Yeah deterministic LLMs just hallucinate the same way every time.


100% - o3 has a strong bias towards "write something that looks like a formal argument that appears to answer the question" over writing something sound.

I gave it a bunch of recent, answered MathOverflow questions - graduate-level maths queries. Sometimes it would get demonstrably the wrong answer, but it would not be easy to see where it had gone wrong (e.g. some mistake in a morass of algebra). A wrong but convincing argument is the last thing you want!


Gemini is clearer but MY GOD is it verbose. e.g. look at problem 1, section "2. Analysis of the Core Problem" - there's nothing at all deep here, but it seems the model wants to spell out every single tiny logical step. I wonder if this is a stylistic choice or something that actually helps the model get to the end.


They actually do help - in that they give the model more computation time, and also allow real-time management of the input context by the model. You can see this same behavior in the excessive comment writing some coding models engage in; in interviews, Anthropic researchers have said these do actually help the model.


Gemini did not one-shot these answers; it did its thinking elsewhere (probably not released by Google) and then consolidated it down into what you see in the PDF. From the article:

> We achieved this year’s result using an advanced version of Gemini Deep Think – an enhanced reasoning mode for complex problems that incorporates some of our latest research techniques, including parallel thinking. This setup enables the model to simultaneously explore and combine multiple possible solutions before giving a final answer, rather than pursuing a single, linear chain of thought.

I don't see any parallel thinking here, for example, so that was probably elided from the final results.
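
Google hasn't published how Deep Think's parallel thinking works, but one well-known scheme in the same spirit is self-consistency: sample several independent reasoning chains, then combine them by voting on the final answer. A rough sketch in Python, where generate() is a hypothetical stand-in for any LLM sampling call:

  from collections import Counter

  def parallel_think(generate, prompt, n=8):
      # generate(prompt) -> (reasoning_chain, final_answer); hypothetical LLM call
      candidates = [generate(prompt) for _ in range(n)]  # explore n chains independently
      votes = Counter(answer for _, answer in candidates)
      best, _ = votes.most_common(1)[0]                  # combine: majority vote on the answer
      return best

Under any scheme like this, only the winning chain survives into a consolidated write-up, which would explain why no trace of the parallel exploration shows up in the PDF.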


Yes, because these are the answers it gave, not the thinking.


Section 2 is a case-by-case analysis. Those are never pretty, but perfectly normal given the problem.

With OpenAI that part takes up about 2/3 of the proof, even with its fragmented prose. I don't think it does much better.


It's not it being case-by-case that's my issue. I used to do olympiads, and e.g. for the k ≥ 3 case I wouldn't write much more than:

"Since there are 3k - 3 points on the perimeter of the triangle to be covered, and any sunny line can pass through at most two of them, it follows that 3k − 3 ≤ 2k, i.e. k ≤ 3."

Gemini writes:

Let T_k be the convex hull of P_k. T_k is the triangle with vertices V_1 = (1, 1), V_2 = (1, k), V_3 = (k, 1). The edges of T_k lie on the lines x = 1 (V), y = 1 (H), and x + y = k + 1 (D). These lines are shady.

Let B_k be the set of points in P_k lying on the boundary of T_k. Each edge contains k points. Since the vertices are distinct (as k ≥ 2), the total number of points on the boundary is |B_k| = 3k − 3.

Suppose P_k is covered by k sunny lines L_k. These lines must cover B_k. Let L ∈ L_k. Since L is sunny, it does not coincide with the lines containing the edges of T_k. A line that does not contain an edge of a convex polygon intersects the boundary of the polygon at most at two points. Thus, |L ∩ B_k| ≤ 2. The total coverage of B_k by L_k is at most 2k. We must have |B_k| ≤ 2k. 3k − 3 ≤ 2k, which implies k ≤ 3.


I'll admit I didn't look too deeply into whether it could be done more simply, but surely that is still miles better than what OpenAI did? At least Gemini's can be simplified. OpenAI labels all points and then enumerates all the lines that go through them.


BTC is a (roughly) zero-sum enterprise: every dollar taken out of the system comes from someone else putting a dollar in. Sure, if you had a crystal ball you could have made millions, but if everyone else ALSO had that same crystal ball you couldn't, since traders are mostly just shuffling money between themselves anyway.

There's no point kicking yourself over not foreseeing a far-fetched future scenario. If you were at a casino and a roulette spin landed on 12, would you feel bad for not betting on that happening, despite having had no good information that it would land there?
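
As a toy illustration of the zero-sum point (ignoring exchange fees, which actually make it slightly negative-sum):

  # Dollars flowing into (negative) and out of (positive) the system.
  flows = {"alice": -100.0, "bob": -50.0, "carol": +150.0}

  assert abs(sum(flows.values())) < 1e-9  # net dollar flow is zero
  # Carol's $150 exit is funded exactly by Alice's and Bob's $150 entry.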


A lot of the stock market is built on this model too, surely?

I get that some of it works off dividends etc., but so much is sentiment-driven and also based on someone else making a loss.


FWIW the original ARC was published in 2019, just after GPT-2 but a while before GPT-3. I work in the field, and I think that discussing AGI seriously is actually kind of a recent thing (I'm not sure I ever heard the term 'AGI' until a few years ago). I'm not saying I know he didn't feel that, but he doesn't talk in such terms in the original paper.


> We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.

https://arxiv.org/abs/1911.01547


> It is important to note that ARC is a work in progress, not a definitive solution; it does not fit all of the requirements listed in II.3.2, and it features a number of key weaknesses…

Page 53

> The study of general artificial intelligence is a field still in its infancy, and we do not wish to convey the impression that we have provided a definitive solution to the problem of characterizing and measuring the intelligence held by an AI system.

Page 56


It's in the OpenAI charter...


100% - the quality group only had one chance to impress the teacher, whereas the quantity group had dozens. The conclusion drawn from this in the text seems to be based on assumptions: we don't actually know how many intermediate photographs the quality group took, and without knowing that, and also checking the quality of those, it's hard to say anything useful.

