The startup I'm at has generated a LOT of content using LLMs, and once you've reviewed enough of it, you can easily spot specific patterns in the output.
Some words/phrases that, by default, it overuses: "dive into", "delve into", "the world of", and others.
You can correct it with instructions, but it will just find synonyms, so there is also a structural pattern it favors by default. For example, if we tell it "Don't start your writing with 'dive into'", it will just switch to "delve into" or another synonym.
Yes, all of this can be corrected if you put enough effort into the prompt and enough iterations to fix all of these tells.
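One crude mitigation is a post-hoc check that flags known tell phrases before publishing. A minimal sketch (the blocklist here is hypothetical; a real one would grow with each review pass, and as noted above, the model will keep finding new synonyms):

```python
# Hypothetical blocklist of overused "tell" phrases.
TELLS = ["dive into", "delve into", "the world of", "in conclusion"]

def find_tells(text: str) -> list[str]:
    """Return each blocklisted phrase that appears in the text."""
    lowered = text.lower()
    return [t for t in TELLS if t in lowered]

draft = "Let's delve into the world of automated copywriting."
print(find_tells(draft))  # ['delve into', 'the world of']
```

This only catches the phrases you already know about, which is exactly the limitation described above: the structural patterns survive even after the obvious words are filtered out.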
> if we tell it "Don't start your writing with 'dive into'", it will just switch to "delve into" or another synonym.
LLMs can radically change their style, you just have to specify what style you want. I mean, if you prompt it to "write in the style of an angry Charles Bukowski" you'll stop seeing those patterns you're used to.
In my team for a while we had a bot generating meeting notes "in the style of a bored teenager", and (besides being hilarious) the results were very unlike typical AI "delvish".
Of course "delve into" and "dive into" are just its defaults, to be corrected with additional instruction. But once you do something like "write in the style of...", that style has its own tells because, as I noted below, the model is, in the end, biased towards frequency.
Of course there will be a set of tells for any given style, but the space of possibilities is much larger than what a person could recognize. So as with most LLM tasks, the issue is figuring out how to describe specifically what you want.
Aside: not about you specifically, but I feel like complaints on HN about using LLMs often boil down to somebody saying "it doesn't do X", where X is a thing they didn't ask the model to do. E.g. a thread about "I asked for a Sherlock Holmes story but the output wasn't narrated by Watson" was one that stuck in my mind. You wouldn't think engineers would make mistakes like that, but I guess people haven't really sussed out how to think about LLMs yet.
Anyway for problems like what you described, one has to be wary about expecting the LLM to follow unstated requirements. I mean, if you just tell it not to say "dive into" and it doesn't, then it's done everything it was asked, after all.
I mean, we get it. It's a UX problem. But the thing is you have to tell it exactly what to do every time. Very often, it'll do what you said but not what you meant, and you have to wrestle with it.
You'd have to come up with a pretty exhaustive list of tells. Even sentence structure and mood are sometimes enough, not just the obvious words.
This is the way. Blending two or more styles also works well, especially if they're on opposite poles, e.g. "write like the imaginary lovechild of Cormac McCarthy and Ernest Hemingway."
Also, wouldn't angry Charles Bukowski just be ... Charles Bukowski?
> ...once you've reviewed enough of the output, you can easily see specific patterns in the output
That is true, but more importantly, are those patterns sufficient to distinguish AI-generated content from human-generated content? Humans express themselves very differently by region and country (e.g. "do the needful" is not common in the midwest; "orthogonal" and "order of magnitude" are used more on HN than most other places). Outside of watermarking, detecting AI-generated text with an acceptably small false-positive error rate is nearly impossible.
Not sure why you default to an uncharitable mode in understanding what I am trying to say.
I didn't say they know their own tells. I said they naturally output them for you. Maybe the obvious is so obvious I don't need to comment on it. Meaning this whole "tells analysis" would necessarily rely on synthetic data sets.
I always assumed that they were snake oil because the training objective is to get a model that writes like a human. AI detectors by definition are showing what does not sound like a human, so presumably people will train the models against the detectors until they no longer provide any signal.
The thing is, the LLM has a flaw: it is still fundamentally biased towards frequency.
AI detectors generally can take advantage of this and look for abnormal patterns in frequencies of specific words, phrases, or even specific grammatical constructs because the LLM -- by default -- is biased that way.
I'm not saying this is easy and certainly, LLMs can be tuned in many ways via instructions, context, and fine-tuning to mask this.
They're not very accurate, but I think snake oil is a bit too far - they're better than guessing at least for the specific model(s) they're trained on. OpenAI's classifier [0] was at 26% recall, 91% precision when it launched, though I don't know what models created the positives in their test set. (Of course they later withdrew that classifier due to its low accuracy, which I think was the right move. When a company offers both an AI Writer and an AI Writing detector people are going to take its predictions as gospel and _that_ is definitely a problem.)
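Back-of-the-envelope on those reported numbers, to show what 26% recall at 91% precision actually implies (the 1,000-document figure is hypothetical, chosen just to make the arithmetic readable):

```python
recall = 0.26     # fraction of AI-written docs the classifier flags
precision = 0.91  # fraction of flagged docs that are actually AI-written

ai_docs = 1000                                    # hypothetical AI-written docs
true_positives = ai_docs * recall                 # docs correctly flagged
total_flagged = true_positives / precision        # all flags, incl. mistakes
false_positives = total_flagged - true_positives  # human docs wrongly flagged

print(round(true_positives), round(total_flagged), round(false_positives))
```

So roughly three quarters of AI-written text sails through undetected, while a few dozen human authors still get wrongly accused, which is exactly why treating such a tool's verdict as gospel is a problem.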
All that aside, most models have had a fairly distinctive writing style, particularly when fed no system prompt, or the same one every time. If o1-Pro blends in more with human writing, that's certainly... interesting.
Anecdotally, English/History/Communications professors are catching cheaters because the false information is easy to identify. The red flags are so obvious that the checker tools are just a formality: student papers now contain fake URLs and fake citations. Students will boldly submit college papers with paragraphs about nonexistent characters, or false claims about what characters did in a story.
The e-mail correspondence goes like this: "Hello Professor, I'd like to meet to discuss my failing grade. I didn't know that using ChatGPT was bad, can I have some points back or rewrite my essay?"
Yeah but they "detect" the characteristic AI style: The limited way it structures sentences, the way it lays out arguments, the way it tends to close with an "in conclusion" paragraph, certain word choices, etc. o1-Pro doesn't do any of that. It writes like a human.
Damnit. It's too good. It just saved me ~6 hours in drafting a complicated and bespoke legal document. Before you ask: I know what I'm doing, and it did a better job in five minutes than I could have done over those six hours. Homework is over. Journalism is over. A large slice of the legal profession is over. For real this time.
Journalism is not only about writing. It is about sources, talking to people, being on the ground, connecting dots, asking the right questions. Journalists can certainly benefit from AI and good journalists will have jobs for a long time still.
While the above is true, I'd say the majority of what passes as journalism these days has none of the above and the writing is below what an AI writer could produce :(
It's actually surprising how many articles on 'respected' news websites have typos. You'd think there would be automated spellcheckers and at least one 'peer review' (probably too much to ask an actual editor to review the article these days...).
Mainstream news today is written for an 8th grade reading ability. Many adults would lose interest otherwise, and the generation that grew up reading little more than social media posts will be even worse.
AI can handle that sort of writing just fine, readers won't care about the formulaic writing style.
So AI could actually turn journalism more into what it originally was: reporting what is going on, rather than reading and rewriting information from other sources. Interesting possibility.
That is exactly the key element in being able to use spicy autocomplete. If you don't know what you're doing, it's going to bite you and you won't know it until it's too late. "GPT messed up the contract" is not an argument I would envy anyone presenting in court or to their employer. :)
> Before you ask: I know what I'm doing, and it did a better job in five minutes than I could have done over those six hours.
Seems like lawyers could do more, faster, because they know what they are doing. Experts don't get replaced; they get tools that amplify and extend their expertise.
Replacement is avoided only if the demand for their services scales in proportion to the productivity improvements, which is sometimes true but not always, and is less likely to be true when the productivity improvements are very large.
It still needs to be driven by someone who knows what they're doing.
Just like when software first came out: it may have ended some jobs.
But it also helped get things done that otherwise wouldn't have been done, or not as much.
In this case, equipping a capable lawyer to be 20x more productive is more like an Iron Man suit, which is OK. If you can get more done, with less effort, you are still critical to what's needed.