Hacker News

First impression of GPT-4.5:

1. It is very, very slow. For some applications where you want real-time interactions, it's just not viable: the text attached below took 7s to generate with 4o, but 46s with GPT-4.5.

2. The style it writes in is way better: it keeps the tone you ask for and makes better improvements to the flow. One of my biggest complaints with 4o is that you want your content to be more casual and accessible, but GPT / DeepSeek wants to write like Shakespeare did.

Some comparisons on a book draft: GPT-4o (left) and GPT-4.5 (green). I also adjusted the spacing around the paragraphs so the diff matches up better. I'm still wary of using ChatGPT to help me write, even with GPT-4.5, but the improvement is very noticeable.

https://i.imgur.com/ogalyE0.png



In my experience, Gemini Flash has been the best at writing, and GPT 3.5 onwards has been terrible.

GPT-3 and GPT-2 were actually remarkably good at it, arguably better than a skilled human. I had a bit of fun ghostwriting with these and got a little fan base for a while.

It seems that GPT-4.5 is better than 4 but it's nowhere near the quality of GPT-3 davinci. Davinci-002 has been nerfed quite a bit, but in the end it's $2/MTok for higher quality output.

It's clear this is something users want, but OpenAI and Anthropic seem to be going in the opposite direction.


>1. It is very very slow, ... below took 7s to generate with 4o, but 46s with GPT4.5

This is positively luxurious by o1-pro standards, which I'd say averages 5 minutes. That said, I totally agree even ~45s isn't viable for real-time interactions. I'm sure it'll be optimized.

Of course, my comparing it to the highest-end CoT model in [publicly-known] existence isn't entirely fair since they're sort of apples and oranges.


I paid for pro to try `o1-pro` and I can't seem to find any use case to justify the insane inference time. `o3-mini-high` seems to do just as well in seconds vs. minutes.


What are you doing with it? For me deep research tasks are where 5 minutes is fine, or something really hard that would take me way more time myself.


I usually throw a lot of context at it and have it write unit tests in a certain style or implement something (with tests) according to a spec.

But the o3-mini-high results have been just as good.

I am fine with Deep Research taking 5-8 minutes, those are usually "reports" I can read whenever.


I bet I can generate unit tests just as fast and for a fraction of the cost, and probably less typing, with a couple vim macros


Idk, it is pretty good at generating synthetic data and recognizing the different logic branches to exercise. Not perfect, but very helpful.


I'm wondering if generative AI will ultimately result in a very dense / bullet form style of writing. What we are doing now is effectively this:

bullet_points' = compress(expand(bullet_points))

We are impressed by lots of text, so we must expand via LLM in order to impress the reader. Since the reader doesn't have the time or interest to read the content, they must compress it back into bullet points / a quick summary. Really, the original bullet points plus a bit more thinking would likely be a better form of communication.
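A toy sketch of that round trip. To be clear, `expand` and `compress` here are hypothetical stand-ins for two LLM calls (one padding the bullets into prose, one summarizing the prose back down), not a real API:

```python
# Toy model of the bullet-point round trip: the sender inflates bullets
# into filler prose, the receiver deflates the prose back into bullets.
# In a real pipeline both functions would be LLM calls; here they are
# deterministic stand-ins so the lossy round trip is easy to see.

def expand(bullets: list[str]) -> str:
    """Pad each bullet into a full sentence of filler prose."""
    return " ".join(
        f"It is worth noting that {b}, which matters a great deal."
        for b in bullets
    )

def compress(prose: str) -> list[str]:
    """Summarize the filler prose back into terse bullets."""
    marker = "It is worth noting that "
    tail = ", which matters a great deal."
    return [
        chunk.strip().removesuffix(tail)
        for chunk in prose.split(marker)
        if chunk.strip()
    ]

bullets = ["deadline moved to friday", "budget unchanged"]
round_trip = compress(expand(bullets))
# The reader ends up with roughly the original bullets, minus
# whatever the two lossy steps mangled along the way.
print(round_trip)
```

With real models the two steps are lossy in different ways, which is the point of the comment: the expansion adds nothing the bullets didn't contain, and the compression can only hope to recover them.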



That’s what Axios does. For ordinary events coverage, it’s a great style.


Right side, by a large margin. Better word choice and more natural flow. It feels a lot more human.


Is there really no way to prompt GPT4o to use a more natural and informal tone matching GPT4.5's?


I opened your link in a new tab and looked at it a couple minutes later. By then I forgot which was o and which was .5

I honestly couldn't decide which I prefer


I definitely prefer the 4.5, but that might just be because it sounds 'less like ChatGPT', ironically.


It just feels natural to me. The person knows the language, but they are not trying to sound smart by using words that might have more impact "based on the word's dictionary definition".

GPT-4.5 does feel like a step forward in producing natural language, and if they use it to provide reinforcement learning, this might have a significant impact on future smaller models.


Imgur might be the worst image hosting site I’ve ever experienced. Any interaction with that page results in switching images and big ads and they hijack the back button. Absolutely terrible. How far they’ve fallen from when it first began.


>One of my biggest complaints with 4o is that you want for your content to be more casual and accessible but GPT / DeepSeek wants to write like Shakespeare did.

Well, maybe like a Sophomore's bumbling attempt to write like Shakespeare.


Similar reaction here. I will also note that it seems to know a lot more about me than previous models. I'm not sure if this is a broader web crawl, more space in the model, more summarization of our chats, or a combination, but I asked it to psychoanalyze a problem I'm having in the style of Jacques Lacan and it was genuinely helpful and interesting, no interview required first; it just went right at me.

To borrow an Iain Banks word, the "fragre" def feels improved to me. I think I will prefer it to o1 pro, although I haven't really hammered on it yet.


How do the two versions match so closely? They have the same content in each paragraph, just worded slightly differently. I wouldn't expect them to write paragraphs that match in size and position like that.


If you use the "retry" functionality in ChatGPT enough, you will notice this happens basically all the time.


Honestly, it feels like a second LLM just reworded the response on the left side to generate the one on the right.


What’s the deal with Imgur taking ages to load? Anyone else have this issue in Australia? I just get the grey background with no content loaded for 10+ seconds every time I visit that bloated website.


This website sucks but successfully loaded from Aus rn on my phone. It's full of ads - possibly your ad blocker is killing it?


Ok for me here in aus


I use 4o mostly in German, so YMMV. However, I find a simple prompt controls the tone very well. "This should be informal and friendly", or "this should be formal and business-like".


> It is very very slow

Could that be partially due to a big spike in demand at launch?


Possibly; repeating the prompt, I got a much higher speed, taking 20s on average now, which is much more viable. But that remains to be seen once more people start using this version in production.


Thank you. This is the best example of comparison I have seen so far.


How does it compare with o1 and o3 preview?


o3 is okay for text checking but has issues following the prompt correctly, same as o1 and DeepSeek R1; I feel that I need to prompt them with smaller snippets.

Here is o3 vs. a new run of the same text with GPT-4.5:

https://www.diffchecker.com/ZEUQ92u7/


Thanks, though it says o1 on the page, is that a typo?


Oh yeah, that right side version is WAY better, and sounds much more like a human.



