_jab · 3 months ago
> How will you store my data and who can access it?

> The content covered by the court order is stored separately in a secure system. It’s protected under legal hold, meaning it can’t be accessed or used for purposes other than meeting legal obligations.

> Only a small, audited OpenAI legal and security team would be able to access this data as necessary to comply with our legal obligations.

So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.

Some of the other language in this post, like repeatedly calling the lawsuit "baseless", really makes this just read like an unconvincing attempt at a spin piece. Nothing to see here.

tptacek · 3 months ago
No, there is a whole news cycle about how chats you delete aren't actually being deleted because of a lawsuit; they essentially have to respond. It's not an attempt to spin the lawsuit; it's about reassuring their customers.
VanTheBrand · 3 months ago
The part where they go out of their way to call the lawsuit baseless is spin, though, and mixing that with this messaging presents a mixed message. The NYT lawsuit is objectively not baseless. OpenAI did train on the Times, and ChatGPT does output information from that training. That's the basis of the lawsuit. NYT may lose, this could end up being considered fair use, and it might ultimately be a flimsy basis for a lawsuit, but to say it's baseless (and with nothing to back that up) is spin and makes this message less reassuring.
mhitza · 3 months ago
My understanding is that they have to keep chats based on an order, *as a result of their previous accidental deletion of potential evidence in the case*[0].

And per their own terms they likely only delete messages "when they want to" given the big catch-alls. "What happens when you delete a chat? -> It is scheduled for permanent deletion from OpenAI's systems within 30 days, unless: It has already been de-identified and disassociated from your account"[1]

[0] https://techcrunch.com/2024/11/22/openai-accidentally-delete...

[1] https://help.openai.com/en/articles/8809935-how-to-delete-an...

ofjcihen · 3 months ago
They should include the part where the order is a result of them deleting things they shouldn't have, then. You know, if this isn't spin.

Then again, I'm starting to think OpenAI is gathering a cult-leader-like following, where any negative comment will result in devoted followers, or those with something to gain, immediately jumping to its defense no matter how flimsy the ground.

mmooss · 3 months ago
> It's not an attempt to spin the lawsuit; it's about reassuring their customers.

It can be both. It clearly spins the lawsuit - it doesn't present the NYT's side at all.

conartist6 · 3 months ago
It's hard to reassure your customers if you can't address the elephant in the room. OpenAI brought this on themselves by flouting copyright law and assuring everyone else that such aggressive and probably-illegal action would be retroactively acceptable once they were too big to fail.
lxgr · 3 months ago
If the stored data is found to be relevant to the lawsuit during discovery, it becomes available to at least both parties involved and the court, as far as I understand.
sashank_1509 · 3 months ago
Obviously OpenAI's point of view will be their point of view. They are going to call this lawsuit baseless; otherwise they would not be fighting it.
ivape · 3 months ago
To me it's pretty clear how this will happen. You will need to buy additional credits or subscriptions through these LLMs that feed payment back to the likes of the NYT and book publishers. It's all stolen. I don't even want to hear it. This company doesn't want to pay up and is willing to let users' privacy hang in the balance to draw the case out until they get sure footing with their device launches or the like (or additional markets like enterprise, etc.).
hiddencost · 3 months ago
> So, by OpenAI's own admission, they are taking abundant and presumably effective steps to protect user privacy here? In the unlikely event that this data did somehow leak, I'd personally be blaming OpenAI, not the NYT.

I am not an Open AI stan, but this needs to be responded to.

The first principle of information security is that all systems can be compromised and the only way to secure data is to not retain it.

This is like saying, "Well, I know they didn't want to go skydiving, but we forced them to go skydiving, and they died because they had a stroke mid-air. It's their fault they died."

Anyone who makes promises about data security is at best incompetent and at worst dishonest.

JohnKemeny · 3 months ago
> Anyone who makes promises about data security is at best incompetent and at worst dishonest.

Shouldn't that be "at best dishonest and at worst incompetent"?

I mean, would you rather be a competent person telling a lie or an incompetent person believing you're competent?

pritambarhate · 3 months ago
Maybe because you are not an OpenAI user. I am. I find it useful and I pay for it. I don't want my data to be retained beyond what's promised in the Terms of Use and Privacy Policy.

I don't think the judge is equipped to handle this case if they don't understand how their order jeopardizes the privacy of millions of users worldwide who don't even care about NYT's content or bypassing their paywalls.

conartist6 · 3 months ago
You live on a pirate ship. You have no right to ignore the ethics and law of that situation just because you could be hurt in a conflict related to piracy.
DrillShopper · 3 months ago
The OpenAI Privacy Policy specifically allows them to keep data as required by law.
mmooss · 3 months ago
> who don't even care about NYT's content or bypassing their paywalls.

Whether or not you care is not relevant; customers usually don't. If a drug company resold an expensive cancer drug without licensing the IP, you might say, 'their order jeopardizes the health of millions of users worldwide who don't even care about Drug Co's IP.'

If the NYT is right - I can only guess - then you are benefitting from the NYT IP. Why should you get that without their consent and for free - because you don't care?

> "jeopardizes"

... is a strong word. I don't see much risk - the NYT isn't going to de-anonymize users and report on them, or sell the data (which probably would be illegal). They want to see if their content is being used.

molf · 3 months ago
It would help tremendously if OpenAI would make it possible to apply for zero data retention (ZDR). For many business needs there is no reason to store or log any request at all.

In theory it is possible to apply (it's mentioned in multiple places in the documentation), but in practice requests are just being ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero data retention only for marketing purposes.

We have applied multiple times and have yet to receive ANY response. Reading through the forums this seems very common.
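
For reference, the closest per-request control that does exist is much narrower than ZDR. A minimal sketch, assuming the official openai Python SDK and its documented store flag on the Chat Completions endpoint (the model name here is just an example); note that this only opts out of OpenAI's optional stored-completions feature and is not zero data retention:

    # pip install openai
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # store=False opts this request's output out of the optional
    # "stored completions" feature. It is NOT zero data retention:
    # without an account-level ZDR exemption, requests can still be
    # retained for up to ~30 days for abuse monitoring.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[{"role": "user", "content": "Hello"}],
        store=False,
    )
    print(resp.choices[0].message.content)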

miles · 3 months ago
> I get that approval needs to be given, and that there are barriers to entry.

Why is approval necessary, and what specific barriers (before the latest ruling) prevent privacy and no logging from being the default?

OpenAI’s assurances have long been met with skepticism by many, with the assumption that inputs are retained, analyzed, and potentially shared. For those concerned with genuine privacy, local LLMs remain essential.

AlecSchueler · 3 months ago
> what specific barriers (before the latest ruling) prevent privacy and no logging from being the default?

Product development?

ArnoVW · 3 months ago
My understanding is that they log for 30 days by default, for handling of bugs. And that you can request 0 days. This is from their documentation.
lcnPylGDnU4H9OF · 3 months ago
> And that you can request 0 days.

Right, but the problem they're having is that the request is ignored.

pclmulqdq · 3 months ago
The missing ingredient is money.
jewelry · 3 months ago
Not just money. How are you going to handle a client's support ticket if there are no logs at all?
belter · 3 months ago
If this stands, I don't think they can operate in the EU.
bunderbunder · 3 months ago
I highly doubt this court order affects people using OpenAI services from the EU, as long as they're connecting to EU-based servers.
1vuio0pswjnm7 · 3 months ago
"You can also request zero data retention (ZDR) for eligible endpoints if you have a qualifying use-case. For details on data handling, visit our Platform Docs page."

https://openai.com/en-GB/policies/row-privacy-policy/

1. You can request it, but there is no promise the request will be granted.

Defaults matter. Silicon Valley's defaults are not designed for privacy. They are designed for profit. OpenAI's default is retention. Outputs are saved by default.

It is difficult to take seriously the arguments in their memo in support of their objection to the preservation order. OpenAI already preserves outputs by default.

lmm · 3 months ago
> In theory it is possible to apply (it's mentioned in multiple places in the documentation), but in practice requests are just being ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero data retention only for marketing purposes.

What's the betting that they just wrote it on the website and never actually implemented it?

sigmoid10 · 3 months ago
Tbf the approach seems pretty standard. Azure also only offers zero retention to vetted customers and otherwise retains data for up to 30 days to monitor and detect abuse. Since the possibilities for abuse are so high with these models, it would make sense that they don't simply give that kind of privilege to everyone - if only to cover their own legal position.


supriyo-biswas · 3 months ago
I wonder whether OpenAI legal can make the case for storing fuzzy hashes of the content, in the form of ssdeep[1] hashes or content-defined chunks[2] of said data, instead of the actual conversations themselves.

After all, the NYT has a very limited corpus of information, and supposedly people are generating infringing content using OpenAI's APIs, so said hashes could be used to check whether such content has been generated.

I'd rather have them store nothing, but given the overly broad court order I think this may be the best middle ground. Of course, I haven't read the lawsuit documents and don't know whether the NYT is requesting far more, or alleging some indirect form of infringement that would invalidate my proposal. (A rough sketch of the hash comparison follows the links below.)

[1] https://ssdeep-project.github.io/ssdeep/index.html

[2] https://joshleeb.com/posts/content-defined-chunking.html
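
To make the idea concrete, a rough sketch of that comparison using the ssdeep Python bindings; hash() and compare() are the library's primitives, while the filenames and the 60-point cutoff are hypothetical:

    # pip install ssdeep  (bindings for the libfuzzy CTPH library)
    import ssdeep

    # Hypothetical inputs: a known NYT article and a captured model output.
    canonical_text = open("nyt_article.txt").read()
    generated_text = open("model_output.txt").read()

    # Store only the fuzzy hashes, not the conversations themselves.
    h_canonical = ssdeep.hash(canonical_text)
    h_generated = ssdeep.hash(generated_text)

    # compare() returns a similarity score from 0 to 100; a high score
    # suggests near-duplicate text. The cutoff is an arbitrary example.
    score = ssdeep.compare(h_canonical, h_generated)
    if score > 60:
        print(f"possible regurgitation (score={score})")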

delusional · 3 months ago
I haven't been able to find any of the supporting documents, but the court order makes it seem like OpenAI has been unhelpful in producing any alternative during the proceedings.

For example, the judge seems to have asked whether it would be possible to segregate data that users wanted deleted from other data, but OpenAI failed to answer. It didn't even deny the request; it simply ignored it.

I think it's quite likely that OpenAI has taken the PR route instead of seriously engaging with any way to constructively honor the request for retention of data.

paxys · 3 months ago
Yeah, try explaining any of these words to a lawyer or judge.
sthatipamala · 3 months ago
The judges in these technical cases can be quite sophisticated and absolutely do learn terms of art. See Oracle v. Google (Java API case)
fc417fc802 · 3 months ago
I thought that's what GPT was for.
m463 · 3 months ago
"you are a helpful law assistant."
landl0rd · 3 months ago
"You are a long-suffering clerk speaking to a judge who's sat the same federal bench for two decades and who believes 'everything is computer' constitutes a deep technical insight."
LandoCalrissian · 3 months ago
Trying to actively circumvent the intent of a judge's order is a pretty bad idea.
Aeolun · 3 months ago
That’s not circumvention though. The intent of the order is to be able to prove that ChatGPT regurgitates NYT content, not to read the personal communications of all ChatGPT users.
girvo · 3 months ago
Deeply, deeply so. In fact, so much so that people who suggest such things show they've (luckily) not had to interact with the legal system much. Judges take an incredibly dim view of that kind of thing, haha.
bigyabai · 3 months ago
All of that does fit on a real spiffy whitepaper. Let's not fool around, though: every ChatGPT session is sent directly into an S3 bucket that some three-letter spook backs up onto their tapes every month. It's a database of candid, timestamped text interactions from a bunch of rubes that logged in with their Google account; you couldn't ask for a juicier target unless you reinvented email. Of course it's backdoored, you can't even begin to try proving me wrong.

Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me. It's about equally as reassuring as a singing telegram from Mark Zuckerberg dancing to a song about how secure WhatsApp is.

landl0rd · 3 months ago
Of course I can't even begin trying to prove you wrong. You're making an unfalsifiable statement. You're pointing to the Russell's Teapot of sigint.

It's well-established that the American IC, primarily NSA, collects a lot of metadata about internet traffic. There are some justifications for this and it's less bad in the age of ubiquitous TLS, but it generally sucks. However, legal protections against directly spying on the actual decrypted content of Americans are at least in theory stronger.

Snowden's leaks mentioned the NSA tapping inter-DC links of Google and Yahoo; if they had to tap links, I doubt there's a ton of voluntary cooperation.

I'd also point out that trying to parse the unabridged prodigious output of the SlopGenerator9000 is a really hard task unless you also use LLMs to do it.

7speter · 3 months ago
Maybe I'm wrong, and maybe this was discussed previously, but of course OpenAI keeps our data; they use it for training!
rl3 · 3 months ago
>Of course it's backdoored, you can't even begin to try proving me wrong.

On the contrary.

>Maybe I'm alone, but a pinkie-promise from Sam Altman does not confer any assurances about my data to me.

I think you're being unduly paranoid. /s

https://www.theverge.com/2024/6/13/24178079/openai-board-pau...

https://www.wsj.com/tech/ai/the-real-story-behind-sam-altman...

farts_mckensy · 3 months ago
Think of all the complete garbage interactions you'd have to sift through to find anything useful from a national security standpoint. The data is practically obfuscated by virtue of its banality.
sega_sai · 3 months ago
Strange smear against the NYT. If the NYT has a case, and the court approves it, it's bizarre to use the court order to smear the NYT. If there is no case, "Open"AI will have a chance to prove that in court.
lxgr · 3 months ago
The NYT is, in my view, exploiting a systematic weakness of the US legal system here, i.e. extremely wide-reaching discovery laws with almost no regard for the privacy of parties not involved in a given dispute, or aspects of their lives not relevant to the dispute at hand.

Of course it's out of self-serving interests, but I find it hard to disagree with OpenAI on this one.

JumpCrisscross · 3 months ago
> with almost no regard for the privacy of parties not involved in a given dispute

Third-party privacy and relevance is a constant point of contention in discovery. Exhibit A: this article.

thinkingtoilet · 3 months ago
The privacy onus is entirely on the company. If OpenAI is concerned about user privacy, then don't collect that data. End of story.
Arainach · 3 months ago
What right to privacy? There is no right to have your interactions with a company (1) remain private, nor should there be. Even if there were, you agree to let OpenAI do essentially whatever they want with your data, including handing it over to the courts in response to a subpoena.

(1) With limited, well-scoped exclusions for lawyers, medical records, etc.

visarga · 3 months ago
The NYT wants it both ways. When they were the ones putting freelancers' articles into a database to rent out, they argued against enforcing copyright, for supporting the new industry, and that it was too hard to revert their original assumptions. Now they absolutely love copyright.

https://harvardlawreview.org/blog/2024/04/nyt-v-openai-the-t...

moefh · 3 months ago
Another way of looking at it is that they lost that case over 20 years ago, and have been building their business model for 20 years accordingly.

In other words, they want everyone to be forced to follow the same rules they were forced to follow 20 years ago.

tptacek · 3 months ago
They're a party to the case! Saying it's baseless isn't a "smear". There is literally nothing else they can say (other than something synonymous with "baseless", like "without merit").
lucianbr · 3 months ago
Oh they definitely can say other things. It's just that it would be inconvenient. They might lose money.

I wonder if the laws and legal procedures are written with the general assumption that a party to a lawsuit will naturally lie if it is in their interest. And then I read articles and comments about a "trust-based society"...

mmooss · 3 months ago
They could say nothing about the merits of the case.
wyager · 3 months ago
Lots of people abuse the legal system in various ways. They don't get a free pass just because their abuse is technically legal itself.
tootie · 3 months ago
It's PR. OpenAI stole mountains of copyrighted content and is trying to make the NYT look like the bad guy. OpenAI would not be in the position of defending a lawsuit if they hadn't done something that is very likely illegal. OpenAI can also end this requirement right now by offering a settlement.
eviks · 3 months ago
And if NYT has no case, but the court approves it, is that still bizarre?
tomhow · 3 months ago
Related discussion:

OpenAI slams court order to save all ChatGPT logs, including deleted chats - https://news.ycombinator.com/item?id=44185913 - June 2025 (878 comments)

hombre_fatal · 3 months ago
You know how it's always been a meme that you'd be mortally embarrassed if your browser history ever leaked?

Imagine how much worse it is for your LLM chat history to leak.

It's even worse than your private comms with humans because it's a raw look at how you are when you think you're alone, untempered by social expectations.

vitaflo · 3 months ago
WTF are you asking LLMs and why would you expect any of it to be private?
threecheese · 3 months ago
This product is positioned as a personal copilot, and future iterations (based on leaked plans, which may or may not be true) as a wholly integrated life assistant.

Why would a customer expect this not to be private? How can one even know how it could be used against them, when they don't even know what's being collected or gleaned from collected data?

I am following these issues closely, as I am terrified that my "assistant" will some day prevent me from obtaining employment, insurance, medical care, etc. And I'm just a non-law-breaking normie.

A current-day example would be TX state authorities using third-party social/ad data to identify potentially pregnant women, along with ALPR data purchased from a third party, to identify any who attempt to have an out-of-state abortion, so they can be prosecuted. Whatever you think about that law, it is terrifying that a shift in it could find arbitrary digital signals being used against you in this way.

phendrenad2 · 3 months ago
"If you have nothing to hide, you have nothing to fear". Oh okay, send me your entire unabridged hard drive contents, chat logs, phone records, and banking records. You thought they were private? Why? They're just bits in a computer? "BuT ChAtGpT iS DiFfErEnT" literally how?
cedws · 3 months ago
Lots of people are using ChatGPT as a therapist. I tried it, but it was too sycophantic.
hombre_fatal · 3 months ago
It's not that the convos are necessarily icky.

It's that it's like watching how someone might treat a slave when they think they're alone. And how you might talk down to or up to something that looks like another person. And how pathetic you might act when it's not doing what you want. And what level of questions you outsource to an LLM. And what things you refuse to do yourself. And how petty the tasks might be, like workshopping a stupid twitter comment before you post it. And how you copied that long text from your distraught girlfriend and asked it for some response ideas. etc. etc. etc.

At the very least, I'd wager that it reveals that bit of true helpless patheticness inherent in all of us that we try so hard to hide.

Show me your LLM chat history and I will learn a lot about your personality. Nothing else compares.

ofjcihen · 3 months ago
“Write a song in the style of Slipknot about my dumb inbred dogs. I love them very much but they are…reaaaaally dumb.”

To be fair the song was intense.

conartist6 · 3 months ago
Hey OpenAI! In your "why is this happening" you left some bits out.

You make it sound like they're mad at you for no reason at all. How unreasonable of them when confronted with such honorable folks as yourselves!

delusional · 3 months ago
I have no time for this circus.

The technology anarchists in this thread need perspective. This is fundamentally a case about the legality of this product. In the extreme case, this will render the whole product category of "llm trained on copyrighted content" illegal. In that case, you will have been part of a copyright infringement on a truly massive scale. The users of these tools do NOT deserve privacy in the light of the crimes alleged.

You do not get to claim to protect the privacy of the customers of your illegal venture.