>For example, our systems sometimes failed to flag violating content when the user asked Claude to translate from one language to another. Clio, however, spotted these conversations.
Why do they even consider translation of existing content "harmful", policy-wise? The content already exists. No machine translator I know would refuse translating something based on the content. That makes their language models unpredictable in one of their major use cases.
I'm adjacent to the world of sign language translators in the US. They are legally obligated to translate EVERYTHING, regardless of whether it's legal or not, and they also have to maintain client secrecy. I personally know some who have facilitated drug deals and another who has facilitated an illegal discussion about Trump.
We decided as a society that we're not going to use translation services to catch citizens in crime. This AI situation is so much milder--we're talking about censoring stuff that is "harmful", not illegal. The content is not being published by Anthropic--it's up to the users to publish it or not.
We seriously need regulations around AI "safety" because of the enormous influence these systems have over all human discourse.
Presumably human interpreters aren't prone to hallucinating things when providing their services, right? That's probably one of the key differentiators.
I don't think I would describe a system in which a human ends up looking at your conversation if the algorithm thinks you're suspicious as "privacy-preserving". What is the non-privacy-preserving version of this system? A human browsing through every conversation?
I find this sort of thing cloying because all it does is show me they keep copies of my chats and access them at will.
I hate playing that card. I worked at Google, and for the first couple years, I was very earnest. Someone smart here pointed out to me, sure, maybe everything is behind 3 locks and keys and encrypted and audit logged, but what about the next guys?
Sort of stuck with me. I can't find a reason I'd ever build anything that did this, if only to make the world marginally easier to live in.
I thought this was true, honestly, up until I read it just now. User data is explicitly one of the three training sources[^1], with forced opt-ins like "feedback"[^2] letting them store & train on it for 10 years[^3], and tripping the safety classifier[^2] letting them store & train on it for 7 years.[^3]
"Specifically, we train our models using data from three sources:...[3.] Data that our users or crowd workers provide"..."
[^2]
For all products, we retain inputs and outputs for up to 2 years and trust and safety classification scores for up to 7 years if you submit a prompt that is flagged by our trust and safety classifiers as violating our UP.
Where you have opted in or provided some affirmative consent (e.g., submitting feedback or bug reports), we retain data associated with that submission for 10 years.
[^3]
"We will not use your Inputs or Outputs to train our models, unless: (1) your conversations are flagged for Trust & Safety review (in which case we may use or analyze them to improve our ability to detect and enforce our Usage Policy, including training models for use by our Trust and Safety team, consistent with Anthropic’s safety mission), or (2) you’ve explicitly reported the materials to us (for example via our feedback mechanisms), or (3) by otherwise explicitly opting in to training."
This is a non-starter for every company I work with as a B2B SaaS dealing with sensitive documents. This policy doesn’t make any sense. OpenAI is guilty of the same. Just freaking turn this off for business customers. They’re leaving money on the table by effectively removing themselves from a huge chunk of the market that can’t agree to this single clause.
Given the apparent technical difficulties involved in getting insight into a model’s underlying data, how would anyone ever hold them to account if they violated this policy? Real question, not a gotcha, it just seems like if corporate-backed IP holders are unable to prosecute claims against AI, it seems even more unlikely that individual paying customers would have greater success.
Even if this were true (and not hollowed out by various exceptions in Anthropic’s T&C), I would not call it “extremely strict”. How about zero retention?
They have to, the major AI companies are ads companies. Their profits demand that we accept their attempts to normalize the Spyware that networked AI represents.
Yep. More generally, I have a lot of distaste that big tech are the ones driving the privacy conversation. Why would you put the guys with such blatant ulterior motives behind the wheel? But, this seems to be the US way. Customer choice via market share above everything, always, even if that choice gradually erodes the customer's autonomy.
Not that anywhere else is brave enough to try otherwise, for fear of falling too far behind US markets.
Disclaimer: I could be much more informed on the relevant policies which enable this, but I can see the direction we're heading in... and I don't like it.
There’s absolutely nothing privacy-preserving about their system, and adding additional ways to extract and process user data doesn’t add any privacy; it weakens it further.
Until they start using NVIDIA confidential computing and doing end-to-end encryption from the client to the GPU like we are, it’s just a larp. Sorry, a few words in a privacy policy don’t cut it.
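For the curious, here is roughly what that flow looks like in miniature. This is a conceptual sketch only, not NVIDIA's or any provider's actual API: `verify_attestation` is a hypothetical stand-in for a real attestation verifier, and the rest is plain hybrid encryption so that only a key bound to an attested enclave can recover the prompt.

```python
# Conceptual sketch of "encrypt the prompt so only an attested GPU enclave
# can read it". Hypothetical names; not any vendor's real confidential
# compute API. Uses the `cryptography` package for the primitives.
import os
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import x25519
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF


def verify_attestation(report: bytes) -> x25519.X25519PublicKey:
    """Hypothetical: validate the enclave's attestation report against the
    hardware vendor's root of trust and return the public key bound to it."""
    raise NotImplementedError("stand-in for a real attestation verifier")


def encrypt_prompt(prompt: str, enclave_pub: x25519.X25519PublicKey):
    # Ephemeral ECDH so only the attested enclave can derive the session key;
    # the provider's web frontend only ever sees ciphertext.
    eph_priv = x25519.X25519PrivateKey.generate()
    shared = eph_priv.exchange(enclave_pub)
    key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
               info=b"client-to-gpu-prompt").derive(shared)
    nonce = os.urandom(12)
    ciphertext = AESGCM(key).encrypt(nonce, prompt.encode(), None)
    eph_pub = eph_priv.public_key().public_bytes(
        serialization.Encoding.Raw, serialization.PublicFormat.Raw)
    # Ship (eph_pub, nonce, ciphertext); only the enclave can decrypt.
    return eph_pub, nonce, ciphertext
```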
Of course this doesn't need to be used on "AI use" as they frame it. So far, your activity was a line in the logs somewhere, now someone is actually looking at you with three eyes, at all times.
A lot of negativity in these comments. I find this analysis of claude.ai use cases helpful — many people, myself included, are trying to figure out what real people find LLMs useful for, and now we know a little more about that.
Coding use cases making up 23.8% of usage indicates that we're still quite early on the adoption curve. I wonder if ChatGPT's numbers also skew this heavily towards devs, who make up only ~2.5% of the [American] workforce.
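Rough back-of-the-envelope on how skewed that is, treating the two percentages as comparable populations (a loose assumption, since usage share and headcount share aren't the same thing):

```python
coding_share_of_usage = 0.238    # Anthropic's reported coding share of Claude usage
dev_share_of_workforce = 0.025   # ~2.5% of the US workforce, per the figure above

# How overrepresented developers are in usage relative to their workforce share.
print(f"~{coding_share_of_usage / dev_share_of_workforce:.1f}x")  # -> ~9.5x
```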
While the highest categories are vague (web development vs. cloud development), the specific clusters shown in the language-specific examples expose nation-specific collective activity. While anonymized, it's still exposing a lot of this collection of private chats.
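To make the anonymization concern concrete: as I understand the write-up, the privacy layer amounts, roughly, to only surfacing clusters backed by enough distinct accounts. A toy sketch of that idea follows (names, data, and the threshold are made up, not Anthropic's actual pipeline); it also shows why such a threshold doesn't fully address the point above, since a cluster can clear it and still describe what one small community is collectively doing.

```python
from collections import defaultdict

# Toy illustration of cluster-level aggregation with a minimum-accounts
# threshold. All names, records, and the threshold are made up; this is
# not Anthropic's actual pipeline.
MIN_UNIQUE_ACCOUNTS = 3

def publishable_clusters(assignments, min_accounts=MIN_UNIQUE_ACCOUNTS):
    """assignments: iterable of (account_id, cluster_label) pairs produced
    upstream (e.g. by summarizing and clustering conversations).
    Returns only clusters backed by at least `min_accounts` distinct accounts."""
    accounts = defaultdict(set)
    for account_id, label in assignments:
        accounts[label].add(account_id)
    return {label: len(ids) for label, ids in accounts.items()
            if len(ids) >= min_accounts}

demo = [("a1", "web development"), ("a2", "web development"),
        ("a3", "web development"), ("a4", "niche regional topic")]
print(publishable_clusters(demo))
# {'web development': 3} -- the small cluster is suppressed, but a surfaced
# cluster can still reveal what one language community is collectively doing.
```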
Good that they tell, but they did it before telling. I really hope they delete the detailed chats afterwards.
They should, and probably won't, delete the first layer of aggregation.
Palantir even announced this officially; a partnership with Anthropic and AWS:
https://www.businesswire.com/news/home/20241107699415/en/Ant...