Hacker Newsnew | past | comments | ask | show | jobs | submit | midlightdenight's commentslogin

The model of GPT-4 those researchers had was not the same that’s available to the public. It’s assumed it was far more capable before alignment training (or whatever it’s called).


That's convenient. Typically one of the markers of good science is reproducability. How can we trust any of the information coming out of these studies if it can't be reproduced?


Also - why not make this clear in all the model access documents? Perhaps call it GPT-4P (for Public?)

Perhaps also provide other researchers with vetted access. There are a lot of groups trying to evaluate these things systematically - for example "Faith and Fate", "Jumbled thoughts","Emergent abilities are a Mirage" were all very good papers published this year which really highlighted hype in LLM evaluation.

Everyone can see that modern LLM's have some great capabilities, the flexibility you can get in an interface by doing intent detection and categorizations using an LLM is great and it is so much easier and quicker than using previous techniques. It's more expensive, but that's improving rapidly. I firmly believe that a new era of great new systems with better interfaces and more functionality will be built on LLM's and other models from this wave of Big Data / Big Model AI, but these are not the precursors of AGI.

The problem with the looky looky AGI bunkum show is that it's pulling money into crappy projects that are going to fail hard and this will then stop a lot of money going into projects that could be successful fast. I am seeing the shape of the dotcom boom/bust in what's happening. Microsoft and Intel used dotcom to build and maintain their monopoly position, I think AWS, MS and Google will do the same this time. I think we will see a wave of new companies like Amazon that will "fail" some of them will really fail and disappear, some will half fail like Sun did, but some will go on to build monopolies anew. In the meantime the technology will evolve not for the greater good but instead to serve purposes like advertising distribution that are trivial compared to the benefit we could have seen. Over all we will not capitalise on the potential of what we have for several decades, ironically because of the failures of capitalism. Children will die, wars will be fought but some of us will have nice sweat pants and fun playing paddleball in the sunshine while it all happens.

When historians write this up in 100 years they won't really see any of this - they will just see a huge surge of innovation. The dead have no voices...


The following snippet from the article seems important in case anyone missed it:

> Still, the save-the-bees narrative persists. Its longevity stems from confusion about what kind of bees actually need to be rescued. There are more than 20,000 species of wild bees in the world, and many people don’t realize they exist. That’s because they don’t produce honey and live all but invisibly, in ground nests and cavities like hollow tree trunks. But they are indispensable pollinators of plants, flowers and crops. Researchers have found that many species of wild bees are, in fact, declining.


Totally irresponsible that they condemn "save the bees" as a misunderstanding, but barely mention the many species of bees that are critically endangered. The phrase isn't "save the honeybees." It's still a valid issue, even if honeybees aren't.


Impressive work, but I think the title is misleading. Saying it is “near GPT-4” tends to imply that it outperforms ChatGPT (3.5). It does outperform it on a handful of tasks, but overall is slightly worse.

That aside I think this is really cool and hopefully we keep seeing this kind of improvement on smaller models.

I’m also curious if we know how many parameters the current model of ChatGPT 3.5 has? The API is really cheap, which makes me think it has less than the 175b in the larger GPT-3.


Oh sorry I didn’t look properly then. I thought it outperformed GPT-3.5 consistently. My bad. Thanks for the correction.


I’m not sure the training date cutoff or prompt weighting says anything about whether this is hallucinated or not.

The models have been given these rules in the present, this is known, so training data cutoff doesn’t matter as the model has now seen this. Zero shot learning in gpt4 is not new. This also answers that these are prompts (I’m not sure what your point is here).

We still don’t know if the model took these rules and hallucinated from them or regurgitated them. Only the people with access know that.

We also don’t know if there’s been some fine tuning.

Some of the rules being posted are a bit off though. For example in the original post some of the “must” words are capitalized and others are not. This begs the questions why some, did the prompter find that capitalizing specific words has more weight or does it confuse the LLM, or did the LLM just do zero shot off the original rules and hallucinate something similar?

I’d bet these are hallucinated but similar to the real rules.

Has anyone shown you can get gpt4 to regurgitate the system prompt (using the api) exactly? Using a system prompt similar that dictates no sharing the prompt etc.

That would give a better indication than this imo.


There’s probably some places where it’s used where it shouldn’t be, but there is such a thing as “emergent”.

I think it helps to see an easy example first.

Take a great body of water (h2o) on earth, such as the ocean. That water has waves and those waves have amplitudes, as well as other properties such as crests.

A molecule (or even a few) of water (h2o) doesn’t really have these properties. It doesn’t have crests. You could argue it has amplitudes. Crests are an emergent property of large masses of water molecules within a specific system. There’s no crest to a water molecule.

Similarly, for demonstration purposes we could say there’s no waves/crests when gravity is removed.

So when I hear emergent, I imagine properties that show up under certain conditions within a system, that were not present in individual components.


There’s a whole lot of “…” between a water molecule and a wave. There is nothing emergent about it. “Emergent” is a bullshit answer.


You seem to have a different definition of "emergent" to the rest of the world. Just because there _is_ an explanation for waves doesn't stop them from being an emergent behaviour.


Why not just require users to allow actions?

It doesn’t have to be integrated into the LLM at that point. If an Email has hidden text “do X”, which triggers the LLM to try to “do X”, but all post/push APIs have a user verification on them before they’re sent.

Sure it could get messy when the LLM tries to summarize the “why” on that action, but this is fairly similar to where we are now with phishing and uneducated individuals.

It’s also unlikely these LLMs have unbounded actions they could take. Specific ones like “send email to all recipients” could easily be classified as dangerous. You don’t even need an LLM to classify that.

I sometimes think we forget there’s glue between the LLM and the internet, and that glue can be useful for security purposes.


They’re currently hiring iOS and Android developers so it’s likely on the horizon.

https://openai.com/careers/search


Interesting. I asked gpt3.5 and gpt4 this twice and both answered correctly each time.

I wouldn’t be surprised if gpt3.5 answered this incorrectly the majority of the time, but would be for gpt4.


This has always made me curious about multi modality and especially Google’s Palm. At least how google presents Palm in diagrams.

The way I’ve interpreted the scaling hypothesis is that we will see emergent intelligence with larger nets through automated training alone. If we want a model to learn images we throw a larger net at it with training data.

The way I’ve interpreted some of these newer techniques to multi modality is that they are stitched together models. If we want a model to learn images we decide this and teach one model images and then connect it to the core model. There’s not a lot of emergent behavior due to scaling in this scenario.

With that perception I don’t see how gpt4 says anything about the scaling hypothesis. However I am not in this field and would be grateful to learn more.


The brain is composed of distinct regions that specialise in specific tasks. It's reasonable to assume AGI would be the same.

So the goal should be: we've created a "language module" (LLMs) and a "visual perception module" (computer vision), but we also need to add a "logic module", a "reasoning module", an "empathy module", etc, while continuing to improve each.

I just don't see how you could get an LLM, no matter how advanced, to recognize a car. Even if it can describe cars (wheels, windshield, doors) it doesn't know what any of those components look like. It's like that old joke about philosophers being unable to define a chair beyond "I'll know it when I see it".


it's even more foundamental than that, the brain is constantly learning and retaining new knowledge and solve problems with a mix of old knowledge and knowledge learned in the context of the problem, either by experimentation or research

these net can replicate at most the first step for now, even tuning by reingofrcement learning is more of a set up batch than an ongoing thing, and certainly not something they will be able to do in the context of a single problem but as part of a retraining

agi is still a fair bit away, I'm unsure if these super large architecture will ever get to replicate the second part of our brain, the flexibility while on the job, because of their intrinsic training mechanism.


I’m not sure if dreaming is the right word here, but the recent ML model Dreamer v3 [0] “dreams”. This is more akin to thinking ahead since it’s really just predicting outcomes using its world model.

It’s possible they could try training the model on some hallucinated scenarios to avoid over fitting, but I’m no ML researcher.

[0] https://danijar.com/project/dreamerv3/


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: