This is for people who rely on ChatGPT Pro features enough that it becomes worth it, whether they pay for it themselves because they're freelance or their employer covers it.
Just because an LLM doesn't boost your productivity doesn't mean it doesn't boost someone else's in another line of work. Whether LLMs help you at your work is extremely domain-dependent.
That's not a problem. OpenAI needs to get some cash from its product because the competition from free models is intense. Moreover, since they supposedly used most of the web's content and pirated whatever else they could, improvements in training will likely be only incremental.
All the while, now that the wow effect has passed, more people are starting to realize the flaws in generative AI. So the current hype, like all hype, has a limited shelf life, and companies need to cash in now because it could be never.
A con? It's not that $200 is a con; their whole existence is a con.
They're bleeding money and are desperately looking for a business model to survive. It's not going very well. Zitron[1] (among others) has outlined this.
> OpenAI's monthly revenue hit $300 million in August, and the company expects to make $3.7 billion in revenue this year (the company will, as mentioned, lose $5 billion anyway), yet the company says that it expects to make $11.6 billion in 2025 and $100 billion by 2029, a statement so egregious that I am surprised it's not some kind of financial crime to say it out loud. […] At present, OpenAI makes $225 million a month — $2.7 billion a year — by selling premium subscriptions to ChatGPT. To hit a revenue target of $11.6 billion in 2025, OpenAI would need to increase revenue from ChatGPT customers by 310%.[1]
They haven't raised the price; they've added new models to the existing tier with better performance at the same price.
They have also added a new, even higher-performance model that can leverage test-time compute to scale performance if you want to pay for that GPU time. This is no different from AWS offering a larger EC2 instance tier with more resources and a higher price tag than existing tiers.
Roughly 10 million ChatGPT users pay the company a $20 monthly fee, according to the documents. OpenAI expects to raise that price by $2 by the end of the year, and will aggressively raise it to $44 over the next five years, the documents said.
We'll have to see if the first bump to $22 this year ends up happening.
You're technically right. New models will likely be incremental upgrades at a hefty premium. But considering the money they're losing, this pricing likely better reflects their costs.
They're throwing products at the wall to see what sticks. They're trying to rapidly morph from a research company into a product company.
Models are becoming a commodity. It's game theory: every second-place company (e.g. Meta) or nation (e.g. China) is open-sourcing its models to destroy value that might accrete to the competition. China alone has contributed a ton of SOTA and novel foundation models (e.g. Hunyuan).
AI may be over-hyped and it may have flaws (I think both are true)... but it may also be totally worth $200/month to many people. My brother is getting way more value than that out of it, for instance.
So the question is whether it's worth $200/month and to how many people, not whether it's over-hyped or has flaws. And whether that demand supports the level of investment being placed into these tools.
Models are about to become a commodity across the spectrum: LLMs [1], image generators [2], video generators [3], world model generators [4].
The thing that matters is product.
[1] Llama, QwQ, Mistral, ...
[2] Nobody talks about Dall-E anymore. It's Flux, Stable Diffusion, etc.
[3] HunYuan beats Sora, RunwayML, Kling, and Hailuo, and it's open source and compatible with ComfyUI workflows. Other companies are trying to open source their models with no sign of a business model: LTX, Genmo, Rhymes, et al.
[4] The research on world models is expansive and there are lots of open source models and weights in the space.
A better way to express it than a "con" is that it's a price-framing device. It's like listing a watch at an initial value of $2,000 so that people will feel content to buy it at $400.
The line between ‘con’ and ‘genuine value synthesised in the eye of the buyer using nothing but marketing’ is very thin. If people are happy, they are happy.
A few days ago I had an issue with an IPsec VPN behind NAT. I spent a few hours Googling around and tinkering with the system; I had a rough understanding of what was going wrong, but not much, and I had no idea how to solve it.
I put together a very exhaustive question for ChatGPT o1-preview, including all the information I thought was relevant, something like a good forum question. Well, 10 seconds later it spat out a working solution. I was ashamed, because I have 20 years of experience under my belt and this model solved a non-trivial task much better than I did.
I was ashamed, but at the same time it's a superpower. And I'm ready to pay $200 to get solid answers that I just can't get in a reasonable timeframe.
It is really great when it works, but the challenge is that I've sometimes had it fail to understand a detailed programming question and confidently give an incorrect answer. After going back and forth a few times it becomes clear it really doesn't know the answer, but I end up going in circles. I know LLMs can't really tell you "sorry, I don't know this one," but I wish they could.
Writing the exhaustive question makes ChatGPT reconstruct your answer in real time, when all you really need to do is sleep; your brain will construct the answer and deliver it tomorrow morning.
The benefit of getting an answer immediately rather than tomorrow morning is why people are sometimes paid extra on-call rates rather than everyone working 9-5.
(Now that I think of the idiom: when did we switch to 9-6? I've never had a 9-5.)
I bet users won't pay for the power but for a guarantee of access! I always hear about people running out of compute time with ChatGPT. The obvious answer is to charge more for a higher quality of service.
IMO the con is picking the metric that makes others look artificially bad when the result doesn't seem to be all that different (at least on the surface).
> we use a stricter evaluation setting: a model is only considered to solve a question if it gets the answer right in four out of four attempts ("4/4 reliability"), not just one
This surely makes the other models post smaller numbers. I'd be curious how it stacks up if scored on, e.g., 1/1 or 1/4 attempts.
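To get a feel for how much the stricter criterion shrinks the numbers, here's a quick sketch (my own illustration, not from the post, assuming attempts are independent with a fixed per-question success rate):

```python
# Back-of-the-envelope: probability a question counts as "solved" under
# different criteria, assuming each attempt independently succeeds with
# per-attempt probability p.
from math import comb

def solve_rate(p: float, successes_needed: int, attempts: int) -> float:
    """P(at least successes_needed of attempts succeed)."""
    return sum(
        comb(attempts, k) * p**k * (1 - p) ** (attempts - k)
        for k in range(successes_needed, attempts + 1)
    )

p = 0.8  # a model that gets a given question right 80% of the time
print(f"1/1: {solve_rate(p, 1, 1):.3f}")  # 0.800
print(f"1/4: {solve_rate(p, 1, 4):.3f}")  # 0.998
print(f"4/4: {solve_rate(p, 4, 4):.3f}")  # 0.410
```

Under this toy model, a model that's right 80% of the time scores 80% on a 1/1 criterion but only ~41% on 4/4, so the stricter setting roughly halves the headline number without the model being any worse.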
As someone who has repeatedly written that I value the better LLMs as if they were a paid intern (so €$£1000/month at least), and yet who gets so much from the free tier* that I won't bother paying for a subscription:
I've seen quite a few cases where expensive non-functional things that experts demonstrate don't work, keep making money.
My mum was very fond of homeopathic pills and Bach flower tinctures, for example.
* 3.5 was competent enough to write a WebUI for the API so I've got the fancy stuff anyway as PAYG when I want it.
Does Apple charge a premium? Of course. Do Apple products also tend to have better construction, greater reliability, consistent repair support, and hold their resale value better? Yes.
The idea that people are buying Apple because of the Apple premium simply doesn't hold up to any scrutiny. It's demonstrably not a Veblen good.
Now that is a trope when you're talking about Apple. They may use more premium materials and achieve a degree of improved construction leveraging them, but at the end of the day there are countless failure-prone designs that Apple continued to ship for years even after knowing they existed.
I guess I don't follow the claim that the "Apple premium" (whether real or otherwise) isn't a factor in buyers' decisions. Are you saying Apple is a great lock-in system and that's why people continue to buy from them?
I suspect they're saying that for a lot of us, Apple provides enough value compared to the competition that we buy them despite the premium prices (and, on iOS, the lock-in).
It's very hard to explain to people who haven't dug into macOS that it's a great system for power users, for example, especially because it's not very customizable in terms of aesthetics, and there are always things you can point to about its out-of-the-box experience that seem "worse" than competitors (e.g., window management). And there's no one thing I can really point to and say "that, that's why I stay here"; it's more a collection of little things. The service menu. The customizable global keyboard shortcuts. Automator, AppleScript (in spite of itself), now the Shortcuts app.
And, sure, they tend to push their hardware in some ways, not always wisely. Nobody asked for the world's thinnest, most fragile keyboards, nor did we want them to spend five or six years fiddling with it and going "We think we have it now!" (Narrator: they did not.) But I really do like how solid my M1 MacBook Air feels. I really appreciate having a 2880x1800 resolution display with the P3 color gamut. It's a good machine. Even if I could run macOS well on other hardware, I'd still probably prefer running it on this hardware.
Anyway, this is very off topic. That ChatGPT Pro is pretty damn expensive, isn't it? This little conversation branch started as a comparison between it and the "Apple tax", but even as someone who mildly grudgingly pays the Apple tax every few years, the ChatGPT Pro tax is right off the table.
They only have to be consistently better than the competition, and they are, by far. I always look for reviews before buying anything, and even then I've been nothing but disappointed by the likes of Razer, LG, Samsung, etc.
The lack of repairability is easily Apple's worst quality. They do everything in their power to prevent you from repairing devices by yourself or via 3rd party shops. When you take it to them to repair, they often will charge you more than the cost of a new device.
People buy apple devices for a variety of reasons; some people believe in a false heuristic that Apple devices are good for software engineering. Others are simply teenagers who don't want to be the poor kid in school with an Android. Conspicuous consumption is a large part of Apple's appeal.
Here in Brazil Apple is very much all about showing off how rich you are. Especially since we have some of the most expensive Apple products in the world.
Maybe not as true in the US, but reading about the green bubble debacle, it's also a lot about status.
>Whether LLM's help you at your work is extremely domain-dependent.
I really doubt that, actually. The only thing that LLMs are truly good for is to create plausible-sounding text. Everything else, like generating facts, is outside of its main use case and known to frequently fail.
There was a study recently that made it clear the use of LLMs for coding assistance made people feel more productive but actually made them less productive.
I recently slapped three different 3-page SQL statements, and their obscure errors with no line or context references from Redshift, into Claude; it was 3 for 3 on telling me where in my query I was messing up. It saved me probably 5 minutes each time, but it really saved me from moving to a different task and coming back, so around $100 in value right there. I was impressed. I wish the query UI I was using would just auto-run it when I got an error. I should code that up as an extension.
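A minimal sketch of that extension idea; `run_query` and `ask_llm` here are hypothetical stand-ins for your actual query runner and whatever LLM client you use, not a real Redshift or Claude API:

```python
# Sketch: wrap a query runner so any failure is automatically bundled with
# the full statement and handed to an LLM for a "where did I mess up" hint.

def build_debug_prompt(sql: str, error: str) -> str:
    # Redshift errors often carry no line or context references, so send
    # the whole statement alongside the raw error text.
    return (
        "This SQL statement failed. Point to the clause causing the error "
        "and explain the fix.\n\n"
        f"--- SQL ---\n{sql}\n\n--- ERROR ---\n{error}"
    )

def run_with_hint(sql, run_query, ask_llm):
    """run_query executes the SQL; ask_llm sends a prompt to your model."""
    try:
        return run_query(sql)
    except Exception as exc:
        hint = ask_llm(build_debug_prompt(sql, str(exc)))
        raise RuntimeError(f"{exc}\n\nLLM hint: {hint}") from exc
```

The wrapper re-raises with the hint attached, so the original error is never swallowed; the LLM call only fires on failure, which keeps API costs to the cases where it actually saves time.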
When forecasting developer and employee cost for a company, I double their pay; I'm not going to say what I make or whether I did that here. I also like to think that developers should be working on work with many multiples of leverage over their pay to be effective. But thanks.
It didn't cost me anything, my employer paid for it. Math for my employer is odd because our use of LLMs is also R&D (you can look at my profile to see why). But it was definitely worth $1 in api costs. I can see justifying spending $200/month for devs actively using a tool like this.
I am in a similar boat. It's way more correct than not for the tasks I give it. For simple queries about, say, CLI tools I don't use that often, or regex formulations, I find it handy, as when it gives the answer it's easy to test whether it's right or not. If it gets it wrong, I work with Claude to get to the right answer.
First of all, that's moving the goalposts to next state over, relative to what I replied to.
Secondly, the "No improvement to PR throughput or merge time, 41% more bugs, worse work-life balance" result you quote came, per article, from a "study from Uplevel", which seems to[0] have been testing for change "among developers utilizing Copilot". That may or may not be surprising, but again it's hardly relevant to discussion about SOTA LLMs - it's like evaluating performance of an excavator by giving 1:10 toy excavators models to children and observing whether they dig holes in the sandbox faster than their shovel-equipped friends.
Best LLMs are too slow and/or expensive to use in Copilot fashion just yet. I'm not sure if it's even a good idea - Copilot-like use breaks flow. Instead, the biggest wins coming from LLMs are from discussing problems, generating blocks of code, refactoring, unstructured to structured data conversion, identifying issues from build or debugger output, etc. All of those uses require qualitatively more "intelligence" than Copilot-style, and LLMs like GPT-4o and Claude 3.5 Sonnet deliver (hell, anything past GPT 3.5 delivered).
Thirdly, I have some doubts about the very metrics used. I'll refrain from assuming the study is plain wrong here until I read it (see [0]), but anecdotally, I can tell you that at my last workplace, you likely wouldn't be able to tell whether or not using LLMs the right way (much less Copilot) helped by looking solely at those metrics - almost all PRs were approved by reviewers with minor or tangential commentary (thanks to culture of testing locally first, and not writing shit code in the first place), but then would spend days waiting to be merged due to shit CI system (overloaded to the point of breakage - apparently all the "developer time is more expensive than hardware" talk ends when it comes to adding compute to CI bots).
--
[0] - Per the article you linked; I'm yet to find and read the actual study itself.
LLMs have become indispensable for many attorneys. I know many other professionals that have been able to offload dozens of hours of work per month to ChatGPT and Claude.
Arguably the same problem occurs in programming: anything so formulaic and common that an LLM can regurgitate it with a decent level of reliability... is something that ought to have been folded into a method/library already.
Or it already exists in some howto documentation, but nobody wanted to skim the documentation.
As a customer of legal work for 20 years, it is also way (way way) faster and cheaper to draft a contract with Claude (total work ~1 hour, even with complex back-and-forth ; you don't want to try to one-shot it in a single prompt) and then pay a law firm their top dollar-per-hour consulting to review/amend the contract (you can get to the final version in a day).
Versus the old way of asking them to write the contract, where they'll blatantly re-use some boilerplate (sometimes the name of a previous client's company will still be in there) and then take 2 weeks to get back to you with Draft #1, charging 10x as much.
That's interesting. I've never had a law firm be straightforward about the (obvious) fact they'll be using a boilerplate.
I've even found that when lawyers send a document for one of my companies, and I give them a list of things to fix, including e.g. typos, the same typos will be in there if we need a similar document a year later for another company (because, well, nobody updated the boilerplate)
Do you ask about the boilerplate before or after you ask for a quote?
I typically don’t ask for a quote upfront since they are very fair with their business and billing practices.
I could definitely see a large law firm (Orrick, Venable, Cooley, Fenwick) doing what you describe. I’ve worked with 2 firms just listed, and their billing practices were ridiculous.
I’ve had a lot more success (quality and price) working with boutique law firms, where your point of contact is always a partner instead of your account permanently being pawned off to an associate.
Email is in profile if you want an intro to the law firm I use. Great boutique firm based in Bay Area and extremely good price/quality/value.
Yeah, the industries LLMs will disrupt the most are the ones that gatekeep busywork. SWE falls into this to some degree, but other professions are more guilty than us. They don't replace intelligence; they just surface jobs that never really required much intelligence to begin with.
A con like that wouldn't last very long.