Hacker News

I get the feeling, but that's not what this is.

NYTimes has produced credible evidence that OpenAI is simply stealing and republishing their content. The question they have to answer is "to what extent has this happened?"

That's a question they fundamentally cannot answer without these chat logs.

That's what discovery, especially in a copyright case, is about.

Think about it this way. Let's say this were a book store selling illegal copies of books. A very reasonable discovery request would be "Show me your sales logs". The whole log needs to be produced otherwise you can't really trust that this is the real log.

That's what NYTimes lawyers are after. They want the chat logs so they can do their own searches to find NYTimes text within the responses. They can't know how often that's happened and OpenAI has an obvious incentive to simply say "Oh that never happened".
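The kind of search described could be sketched roughly like this (my own illustration, not anything from the actual case): shingle each article into overlapping n-word windows and flag any chat response that reproduces a large fraction of them verbatim.

```python
# Hypothetical sketch of verbatim-overlap detection via word shingles.
# Function and variable names are made up for illustration; a real
# forensic review would be far more sophisticated.
def shingles(text, n=8):
    """All n-word windows of the text, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_score(response, article, n=8):
    """Fraction of the article's n-word shingles that appear verbatim in the response."""
    a = shingles(article, n)
    if not a:
        return 0.0
    return len(a & shingles(response, n)) / len(a)

article = "the quick brown fox jumps over the lazy dog near the riverbank at dawn"
copied = "as requested: the quick brown fox jumps over the lazy dog near the riverbank at dawn"
unrelated = "here is a recipe for sourdough bread with a long cold fermentation step"

assert overlap_score(copied, article) > 0.5      # would be flagged for human review
assert overlap_score(unrelated, article) == 0.0  # shares no 8-word window
```

This is why plaintiffs want the raw responses rather than summary statistics: the scoring only works if you can scan the actual text.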

And the reason this evidence is relevant is it will directly feed into how much money NYT and OpenAI will ultimately settle for. If this never happens then the amount will be low. If it happens a lot the amount will be high. And if it goes to trial it will be used in the damages portion assuming NYT wins.

The user has no right to privacy here. The same as how any internet service can be (and has been) compelled to produce private messages.



>That's what NYTimes lawyers are after. They want the chat logs so they can do their own searches to find NYTimes text within the responses.

The trouble with this logic is NYT already made that argument and lost as applied to an original discovery scope of 1.4 billion records. The question now is about a lower scope and about the means of review, and proposed processes for anonymization.

They have a right to some form of discovery, but not to a blank-check extrapolation that sidesteps the legitimate privacy issues raised both in OpenAI's statement and throughout this thread.


Again, as I've pointed out to you numerous times in this thread, OpenAI already represented to the court that the data was anonymized and that they can anonymize it, so you are significantly departing from the actual facts in your discussion here. There are no genuine privacy issues left. The data is anonymous, and it is under a protective order, so it must be maintained confidentially.


> The user has no right to privacy

The correct term for this is prima facie right.

You do have a right to privacy (arguably) but it is outweighed by the interest of enforcing the rights of others under copyright law.

Similarly, liberty is a prima facie right; you can be arrested for committing a crime.


> enforcing the rights of others under copyright law

I certainly do not care about copyright more than my own privacy, and I certainly don't find that interest to be the public's interest, though perhaps it's the interest of legacy corporations and their lobbyists.


> You do have a right to privacy (arguably) but it is outweighed by the interest of enforcing the rights of others under copyright law.

What governs or codifies that? I would have expected that there would need to be some kind of specific overriding concern(s) that would need to apply in order to violate my (even limited) expectation of privacy, not just enforcing copyright law in general.

E.g. there's nothing resembling "probable cause" to search my own interactions with ChatGPT for such violations. On what basis can that be justified?


Is there any evaluation of which right or which harm is larger? It seems like the idea that one outweighs another is arbitrary. Is there a principled thing behind it?


That's what the court is for: weighing the different arguments and applying precedents.


Seems to me my right to privacy is far more important than their right to copyright enforcement.


Have you read OpenAI's terms of service? Which part is being violated by producing anonymized logs in response to discovery? OpenAI's ToS state that they will produce your data in response to discovery. What's not clicking for you?


> NYTimes has produced credible evidence that OpenAI is simply stealing and republishing their content. The question they have to answer is "to what extent has this happened?"

Credible to whom? In their supposed "investigation", they sent a whole page of text and complex pre-prompting and still failed to get the exact content back word for word. Something users would never do anyways.

And that's probably the best they've got as they didn't publish other attempts.


Agreed, they could carefully coerce the model to more or less output some of their articles, but the premise that users were routinely doing this to bypass the paywall is silly.


Especially when you can just copy paste the url into Internet Archive and read it. And yet they aren't suing Internet Archive.


Copyright law isn’t binary and has long-running allowances for fair use which take into consideration factors like scale, revenue, and whether it replaces the original. As a real non-profit, the Internet Archive is not selling its copies of the NYT and it’s always giving full credit to the source. In contrast, ChatGPT does charge for their output and while it may give citations that’s not a given.


Let's be real, they are suing OpenAI because they have way more money than the Internet Archive and they would be happy with a cut


>NYTimes has produced credible evidence that OpenAI is simply stealing and republishing their content

They shouldn't have any rights to data after it's released.

>That's a question they fundamentally cannot answer without these chat logs.

They are causing more damage than anything chatGPT could have caused to NYT. Privacy needs to be held higher than corporate privilege.

>Think about it this way. Let's say this were a book store selling illegal copies of books.

Think of it this way, no book should be illegal.

>They can't know how often that's happened and OpenAI has an obvious incentive to simply say "Oh that never happened".

NYT glazers do more to uphold OpenAI as a privacy respecting platform than OpenAI has ever done.

>If this never happens then the amount will be low.

Should be zero, plus compensation to the affected OpenAI users from NYT.

>The user has no right to privacy.

And this needs to be remedied immediately.

>The same as how any internet service can be (and have been) compelled to produce private messages.

And this needs to be remedied immediately.


I get that you're mad, and rightly so over an invasion of your privacy, but the NYT would be foolish to use any of your data for anything other than this lawsuit, or to fail to delete it afterwards, as their own request requires.

They can't use this data against any individual, even if they explicitly asked, "How do I hack the NYT?"

The only potential issue is them finding something juicy in someone's chat, that they could publish as a story; and then claiming they found out about this juicy story through other means, (such as a confidential informant), but that's not likely an issue for the average punter to be concerned about.


>The only potential issue is them finding something juicy in someone's chat, that they could publish as a story; and then claiming they found out about this juicy story through other means, (such as a confidential informant)

Which is concerning since this is a news organization that's getting the data.

Let's say they do find some juicy detail and use it, then what? Nothing. It's not like you can ever fix a privacy violation. Nobody involved would get a serious punishment, like prison time, either.


>Let's say they do find some juicy detail and use it, then what? Nothing. It's not like you can ever fix a privacy violation. Nobody involved would get a serious punishment, like prison time, either.

There are no privacy violations. OpenAI already told the court they anonymized it. What they say in court and what they say in the blog is different and so many people here are (unfortunately) falling for it!


There's no such thing. Anonymized data can still be used to identify someone, as we've seen on numerous occasions.
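The re-identification point is easy to demonstrate. Pseudonymizing logs by hashing user IDs leaves quasi-identifiers intact, and a handful of those is often unique (a hypothetical sketch; the records and field names are made up):

```python
# Hypothetical sketch of why pseudonymized logs can still identify people.
# Hashing the user ID removes the name but keeps quasi-identifiers
# (city, time of day, device), which together are often unique.
import hashlib

# Made-up chat-log records.
logs = [
    {"user": "alice", "city": "Boston", "hour": 9,  "device": "iPhone"},
    {"user": "bob",   "city": "Boston", "hour": 9,  "device": "Android"},
    {"user": "carol", "city": "Denver", "hour": 22, "device": "iPhone"},
]

# "Anonymize" by replacing the user ID with a hash, keeping everything else.
anonymized = [
    {**{k: v for k, v in row.items() if k != "user"},
     "uid": hashlib.sha256(row["user"].encode()).hexdigest()[:8]}
    for row in logs
]

# An attacker who knows one public fact pattern ("I chat from Denver at
# night on my iPhone") can re-link the record to the person.
matches = [r for r in anonymized
           if r["city"] == "Denver" and r["hour"] > 20 and r["device"] == "iPhone"]
assert len(matches) == 1  # the quasi-identifiers alone single out one user
```

This is the classic result behind the Netflix Prize and AOL search-log de-anonymizations: stripping direct identifiers is not the same as making data anonymous.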


Read the ToS next time


> The user has no right to privacy. The same as how any internet service can be (and have been) compelled to produce private messages.

The legal term is "expectation of privacy", and it does exist, albeit increasingly weakly in the US. There are exceptions to that, such as a subpoena, but that doesn't mean anyone can subpoena anything for any reason. There has to be a legal justification.

It's not clear to me that such a justification exists in this case.


That's why there is someone trained in the law (the judge) to make that determination.


It's not credible. Using AI to regurgitate news articles is not a good use of the tool, and it is not credible that any statistically significant portion of their user base is using the tool for that.


> Think about it this way. Let's say this were a book store selling illegal copies of books. A very reasonable discovery request would be "Show me your sales logs". The whole log needs to be produced otherwise you can't really trust that this is the real log.

Your claim doesn’t hold up, my friend. It’s inaccurate because nobody archives an entire dialogue with a seller for the record, and you certainly don’t have to show identification to purchase a book.


Even if OpenAI is reproducing pieces of NYT articles, they still have a difficult argument, because in no way is it a practical means of accessing paywalled NYT content, especially compared to alternatives. The entire value proposition of the NYT is news coverage, and probably 99.9% of their page views are from stories posted so recently that they aren't even in the training set of LLMs yet. If I want to reproduce a NYT story from an LLM it's a prompt engineering mess, and I can only get old ones. On the other hand I can read any NYT story from today by archiving it: https://archive.is/5iVIE. So why is the NYT suing OpenAI and not the Internet Archive?


OpenAI is not allowed to reproduce the NYT's articles, that's copyright infringement. It does not really matter if it is a practical thing or not, that would only go to damages, not liability.


What do you think it is you are liable for?


I'm confused. I don't think I'm liable for anything. I am not OpenAI.


You don't hate the media nearly enough.

"Credible" my ass. They hired "experts" who used prompt engineering and thousands of repetitions to find highly unusual and specific methods of eliciting text from training data that matched their articles. OpenAI has taken measures to limit such methods and prevent arbitrary wholesale reproduction of copyrighted content since that time. That would have been the end of the situation if NYT was engaging in good faith.

The NYT is after what they consider "their" piece of the pie. They want to insert themselves as middlemen - pure rent-seeking, second-hander, sleazy-lawyer behavior. They haven't been injured, they were already dying, and this lawsuit is a Hail Mary attempt at grifting some life support.

Behavior like that of the NYT is why we can't have nice things. They're not entitled to exist, and by engaging in behavior like this, it makes me want them to stop existing, the faster, the better.

Copyright law is what you get when a bunch of lawyers figure out how to encode monetization of IP rights into the legal system, having paid legislators off over decades, such that the people who make the most money off of copyrights are effectively hoarding those copyrights and never actually produce anything or add value to the system. They rent-seek, gatekeep, and viciously drive off any attempts at reform or competition. Institutions that once produced valuable content instead coast on the efforts of their predecessors, and invest the proceeds into lawsuits, lobbying, and the purchase of more IP.

They - the NYT - are exploiting a finely tuned and deliberately crafted set of laws meant to screw actual producers out of percentages. I'm not a huge OpenAI fan, but IP laws are a whole different level of corrupt stupidity at the societal scale. It's gotcha games all the way down, and we should absolutely and ruthlessly burn down that system of rules and salt the ground over it. There are trivially better systems that can be explained in a single paragraph, instead of requiring books worth of legal code and complexities.


I'm not a fan of NYT either, but this feels like you're stretching for your conclusion:

> They hired "experts" who used prompt engineering and thousands of repetitions to find highly unusual and specific methods of eliciting text from training data that matched their articles....would have been the end of the situation if NYT was engaging in good faith.

I mean, if I was performing a bunch of investigative work and my publication was considered the source of truth in a great deal of journalistic effort and publication of information, and somebody just stole my newspaper off the back of a delivery truck every day and started rewriting my articles, and then suddenly nobody read my paper anymore because they could just ask chatgpt for free, that's a loss for everyone, right?

Even if I disagree with how they editorialize, the Times still does a hell of a lot of journalism, and chatgpt can never, and will never be able to actually do journalism.

> they want to insert themselves as middlemen - pure rent seeking, second hander, sleazy lawyer behavior

I'd love to hear exactly what you mean by this.

Between what and what are they trying to insert themselves as middlemen, and why is chatgpt the victim in their attempts to do it?

What does 'rent seeking' mean in this context?

What does 'second hander' mean?

I'm guessing that 'sleazy lawyer' is added as an intensifier, but I'm curious if it means something more specific than that as well, I suppose.

> Copyright law....the rest of it

Yeah. IP rights and laws are fucked basically everywhere. I'm not smart enough to think of ways to fix it, though. If you've got some viable ideas, let's go fix it. Until then, the Times kinda needs to work with what we've got. Otherwise, OpenAI is going to keep taking their lunch money, along with every other journalist's on the internet, until there's no lunch money to be had from anyone.


> my publication was considered the source of truth

Their publication is not considered the source of truth, at least not by anyone with a brain.


They are still considered a paper of record, but I chose to use a hypothetical outfit because I don’t love the Times myself but I believe the argument to be valid.

I’m not interested in arguing about whether or not they deserve to fail, because that whole discussion is orthogonal to whether OpenAI is in the wrong.

If I’m on my deathbed, and somebody tries to smother me, I still hope they face consequences


> then suddenly nobody read my paper anymore

This is the part the Times won't talk about, because people stopped reading their paper long before AI, and they haven't been able to point to any credible harm in terms of reduced readership as a result of OpenAI launching. They just think that people might be using ChatGPT to read the New York Times without paying. But it's not a very good hypothesis, because that's not what ChatGPT is good at.

It's like the people filing the lawsuit don't really understand the technology at all.


> The user has no right to privacy. The same as how any internet service can be (and have been) compelled to produce private messages.

This is nonsense. I’ve personally been involved in these things, and fought to protect user privacy at all levels and never lost.


You've successfully fought a subpoena on the basis of a third party's privacy? More than once? I'd love to hear more.


I was CEO of a small startup called Network54 with about 4 million monthly users. It was a forum hosting service.

The early 2000s were the heyday of lawsuits. People would say something about someone and if that someone was rich they would sue. It happened often.

The attorneys would sue us, the domain registrar, the ISP, everyone.

Often the things said were true. But they would sue to find out who the people were.

People selling Ponzi schemes, CEOs of public companies trying to find what union employees to fire, it was all over the place.

We would file to quash every time. File to move venues to CA, which has anti-SLAPP laws. Depositions in DC. It was very distracting and expensive.

Never lost. Made some people really mad that they didn’t get their way.

Now for criminal things, the opposite, sorry. When you're a two-person operation and the FBI walks into your office with a warrant, then it's "yes sir, let me see the warrant first." If no warrant, then "sorry sir, come back with a warrant, but we will take this as notice to soft delete, not hard delete, content."


Ah interesting, thanks for answering.

I've been in the situation of being instructed to pull unredacted logs for a subpoena before when I really did not think it was appropriate. I was just an IC but I talked to a lawyer about it. Since the company I worked for was not willing to fight it, my options were pull the logs, quit the job, or possibly catch a contempt charge.

It seems like everyone who is not the CEO or maybe the legal dept has much more constrained choices in this situation. I also wonder if the timeframes matter here, how much things may have changed in two decades. My experience with it was only a couple years ago, and I was surprised they chose not to fight it but presumably they know more about the chances of success than I do.


Yahoo got sued for not fighting it long enough to give a chance for the third party to quash on their own. If I remember correctly, they lost. But the case had a good argument of fairness to the little people whose data is just being given away and people fired or harassed because of it.

Anyhow, we worked with Public Citizen on a couple of cases, and they were willing to fund them all the way to the Supreme Court in order to set good precedent.



