I try to hide my real name whenever possible, out of an abundance of caution. You can still find it if you search carefully, but in today's hostile internet I see this kind of soft pseudonymity as my digital personal space, and expect to have it respected.
When playing around in GPT-3 I tried making sentences with my username. Imagine my surprise when I saw it spit out my (globally unique, unusual) full name!
Looking around, I found a paper that says language models spitting out personal information is a problem[1], a Google blog post that says there's not much that can be done[2], and an article that says OpenAI might automatically replace phone numbers in the future but other types of PII are harder to remove[3]. But nothing on what is actually being done.
If I had found my personal information on Google search results, or Facebook, I could ask the information to be removed, but GPT-3 seems to have no such support. Are we supposed to accept that large language models may reveal private information, with no recourse?
I don't care much about my name being public, but I don't know what else it might have memorized (political affiliations? sexual preferences? posts from 13-year-old me?). In the age of GDPR this feels like an enormous regression in privacy.
EDIT: a small thank-you to everybody commenting so far for not directly linking to specific results or actually writing my name, however easy it might be.
If my request for pseudonymity sounds strange given my lax infosec:
- I'm more worried about the consequences of language models in general than my own case, and
- people have made a much bigger deal over far less identifying information[4].
[1]: https://arxiv.org/abs/2012.07805
[2]: https://ai.googleblog.com/2020/12/privacy-considerations-in-...
[3]: https://www.theregister.com/2021/03/18/openai_gpt3_data/
[4]: https://en.wikipedia.org/wiki/Slate_Star_Codex#New_York_Time...
As a purely practical matter -- again, not going into whether this is how things should be, merely how they do be -- it is futile to want the internet as a whole to have a concept of privacy, or to respect the concept of a "digital personal space". If your phone number or other PII has ever been associated with your identity, that association will be in place indefinitely and is probably available on multiple data broker sites.
The best way to be anonymous on the internet is to be anonymous, which means posting without any name or identifier at all. If that isn't practical, then using a non-meaningful pseudonym and not posting anything personally identifiable is recommended.
I learned this when setting up a Disqus ID. I wanted to comment on a blog post, and started to set up an account.
After I started the process, it came back with a list of random posts from around the Internet (some very old) and said "Are these yours? If so, would you like to associate them with your account?"
I freaked. Many of them were outright troll comments (I was not always the haloed saint that you see before you) that I had sworn were posted anonymously. They came from many different places (including DejaNews). I have no idea how Disqus found them.
Every single one of them was mine. Many were ones that I had sworn were dead and buried in a deep grave in the mountains.
Needless to say, I do not have a Disqus ID.
Being non-anonymous means that I need to behave myself online. I come across as a bit of a stuffy bore, but I suspect my IRL persona is that way as well.
That's OK.
It’s not okay to be tracked so thoroughly that people stop feeling they can explore controversy online.
That's okay, as long as you aren't a member of any persecuted minority, and as long as you don't have any interesting political views to share.
If I'd need full privacy, I'd have to add many more levels of security in my daily life that I don't find necessary. I just don't want people (or a SWAT team) to show up at my door because I triggered someone on the internet. That's why I post from multiple different accounts on different platforms. Though, I'm sure, in the future some form of AI will be able to link them all based on writing style and similarity of content of my posts. Guess I'll have to find another way to remain somewhat anonymous then.
https://www.optery.com/
It’s a YC company. My only affiliation is that I’m a customer.
I have a discount code if anyone is interested; I wasn’t sure if I could just paste it in the comments.
Good luck with that.
(1) I seem to remember a court case somewhere on the planet in the last months where lack of resistance was deemed indicative of consensual intercourse. Which is not even remotely acceptable. But I digress.
It's quite another thing for my name to be auto-completed by the most popular publicly available language model. That I'm less OK with, and I'm sure other people will find absolutely despicable.
We have GDPR and Right to Be Forgotten for a reason.
The sentences that stuck out to me are: “If your phone number or other PII has ever been associated with your identity, that association will be in place indefinitely and is probably available on multiple data broker sites.
The best way to be anonymous on the internet is to be anonymous, which means posting without any name or identifier at all. If that isn't practical, then using a non-meaningful pseudonym and not posting anything personally identifiable is recommended.”
Not for me. It took until page 3 for just my first name to appear. If somebody is looking through past GitHub commits, that's already a high enough barrier for me.
I only partially agree with your conclusion. Asking people to maintain total anonymity always, with any slips punishable by permanent publication of that PII, might be the current status quo, but is not where we as society want to head.
Another early result in DDG is a profile on deviantart, which you may not want linked to your professional identity (or maybe you do).
Your steam community page has a list of hundreds of games you own.
Fundamentally, your problem isn't so much that your GitHub account links to your name; it's that you use the same identifier across the web, one that isn't common like "neo", from "interesting" sites like deviantart to more normal ones like ubuntuforums.
You've removed your CV from your website, but it's still in the Internet Archive. And do you really want your CV hidden? You've got a good portfolio of work on the internet.
To me, the lack of separation between your names is far more of a challenge to your anonymity, especially when you call it out by posting something like this under that nom de plume. You have multiple aspects of your life that you can present in different ways; choosing a single unique nickname links those together. Is that really what you want, even if your real name weren't connected to it?
You can't "put the genie back in the bottle". It's out there, the Internet remembers forever.
A third approach is using a word that means something and thus is not unique at all.
Unique strings for usernames means lots of accurate hits. If you google mine, there will be lots of hits but none are me.
Of course, this doesn’t account for “the crazies” who could more easily harass me in my physical life simply because they’re mad I won an online game or the like. Thankfully I haven’t had to deal with such a situation, but I suspect that may be a consequence of avoiding inflammatory back-and-forths and highly political discussions, where reduced anonymity may invite those attacks.
It's also better to use a username you copied from someone else; that way, if people find links, they find someone else entirely.
Going on a tangent here but I've started seeing more "do be" used lately. However, it doesn't seem right for some reason I can't pinpoint (English is not my first language).
Is it from a dialect?
It's an African American idiom which has bled into Gen Z vernacular, from what I've seen.
https://www.optery.com/
I’m a satisfied customer
Obviously it's a little paranoid and arrogant to assume that anyone cares enough to go through my comments, but occasionally, on websites like this and Reddit, I will just outright lie about where I'm from, or what my age, gender, ethnicity, or sexuality is.
Companies are building and selling GPT-3 with 6 billion parameters, and one of those “parameters” seems to be OP’s username and his “strange” two-word last name.
If models grow bigger, they will potentially contain personal information about every one of us.
If you can get yourself removed from search indices, shouldn’t there be a way for AI models, too?
Another thought: do we need new licenses (GPL, MIT, etc.) which disallow the use for (for-profit) AI training?
The input datasets should be managed as per GDPR/CA regulations, with clear flags protecting privacy of EU citizens and CA residents. And any derived models should propagate these labels and not allow querying information violating these regulations.
If GitHub Copilot or the GPT-3/4 models were developed without these regulations in mind, these models should be retrained.
Yes, it is a hard research problem. Yet, there is no reason these models should be allowed to violate privacy in worse ways than traditional software.
If you try to remove information from a neural network model, it can still hold that information in forms you may not even think of; in language models, for example, the same thing can be described with different words.
And on the other hand, removing one thing may affect the model's performance on other, unrelated things too.
I don't think that we need new licenses, but probably open source projects need a better way to enforce them.
E.g. Copilot just ignores the licensing issues, although I can imagine a solution with a few different models that return code for different purposes. (Like one model returns everything, and that code can be used safely only for learning or hobby projects. Another model returns GPL code. And a third model returns code compatible with commercial or permissive open source projects.)
Or the model spits out the licence(s) of the code as well, though I'm not sure if this is technically possible.
The only way to be completely sure of removing information would be to re-train the model without that data.
Absolutely yes!
I would expect that it would take considerable effort to get this information removed from Google (you would have to write to them with a request under GDPR or similar and have them add a content filter) and I don't see why the same effort wouldn't allow you to get removed from GPT-3 (which is only accessible via a web API, so a similar filter could be added).
Imagine, for example, that you were falsely arrested for murder and then cleared of the crime.
It's very likely this would kill your career because employers Googling you would see the articles about your arrest.
In Europe, you would have a right to hide these articles from search engines.
I'm not going to take this in a political direction, but make of that what you will.
The first is asking a website owner to delete data they collected on you. That doesn't really apply here. The places this person's name is published are his own website that has this username as its url, his own GitHub repos, and published papers of his that were also on his website. No GDPR request is necessary to remove his name from these places because he already owns that data. As seen, he has already started to delete it himself.
The second is asking search engines to delist a result. As far as I understand, this usually has to involve information that is otherwise meant to be scrubbed from public record, like a newspaper article about a conviction that was eventually sealed. You can't ask Google to not index a scientific journal you published to or your public GitHub repos.
There are, of course, limits to this thanks to public interest exceptions. I don't believe Prince Andrew can ask Google to de-index anything associating him with Jeffrey Epstein. The public has a right to know, too.
In this guy's case, he really seems to be straddling a line. He contributed to open source projects under his real name linking to a GitHub repo with the same username he seems to reuse everywhere, including here. He also has a website where the url is that username, and it contained his CV with his real name on it, along with a publication history where every publication uses his real name. Is it reasonable to do those things and then ask Google and OpenAI not to associate the username with your real name?
At what point are you some regular Joe with a real grievance, and at what point are you Ian Murdock complaining that GPT knows you're the Ian associated with Debian?
They could:
1. Set up a content filter that strips OP's name from the output. OpenAI would still need to keep a record of the name, exposing it to leaks.
2. Remove the name from the dataset and retrain the model, which is obviously infeasible with each GDPR request.
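A minimal sketch of what option 1 could look like, as a post-processing step over generated text (the name list and function are hypothetical, and a literal-match filter ignores paraphrases and misspellings, which is exactly the weakness raised elsewhere in the thread):

```python
import re

# Hypothetical deny-list of names under removal requests. Note the
# irony from option 1 above: the filter itself must keep a record of
# the protected name, which is its own leak risk.
PROTECTED_NAMES = ["Jane Q. Public"]

def redact(model_output: str) -> str:
    """Replace protected names in generated text before returning it."""
    for name in PROTECTED_NAMES:
        # Case-insensitive literal match; a real filter would also have
        # to catch partial names, misspellings, and paraphrases.
        model_output = re.sub(re.escape(name), "[REDACTED]",
                              model_output, flags=re.IGNORECASE)
    return model_output

print(redact("This library was written by jane q. public in 2019."))
# → This library was written by [REDACTED] in 2019.
```

This is cheap enough to run on every API response, which is presumably why phone-number filtering was floated as feasible, but it only works for PII that can be enumerated as literal strings.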
I expect there are other instances where it is impractical or impossible to completely forget someone's data upon a request. Does Google send people spelunking into cold storage archives and actually destroy tapes (while migrating the data that is not supposed to be erased) every time they receive a request?
I have to say, playing with GPT-3 has been a mind-blowing experience this week, and you should all try it.
The most striking point was discovering that if I give it texts from my own chats, or copy paste in RFPs, and ask it to write lines for me, it’s better at sounding like a normal person than I am.
A stock example was “write a tagline for an ice cream shop”. We tried changing it a bit, and I’ll give you some of its punchlines.
“Write a tagline for an ice cream shop run by Bruce Wayne.” Result: “the only thing better than justice is ice cream”
“… run by an SCP”: “The SCP Ice Cream Shop: the only place where you can enjoy ice cream and fear for your life!”
“… run by Saddam Hussein”: “the best ice cream in the world, made by the worst man in the world!”
One thing to watch out for, though, is that it is not self-aware at all (at least in a practical sense) and can just make things up. For example, we tried giving it my daughter’s homework reading comprehension questions on the book “W pustyni i w puszczy” and it gave cogent, plausible, and totally wrong answers that it made up on the spot. It would seem it hadn’t been given the book, and would have got an F.
And it can’t speak for itself. I can ask it directly “have you read Tractatus”, and it will insist “no, never”, but it knows it front and back like a scholar.
So never blindly trust it ;)
If I can do this locally with some existing kit, I would love to hear your recommendation.
Right, this is why opsec is something that you must always be doing.
Anything you say can be preserved forever.
Better to use short-lived throwaway identities, and leave yourself the power of combining them later, than to start with one long-lived identity and find yourself unable to split it up.
It's inconvenient in real life that I'm expected to use my legal identity for everything. If I go to group therapy for an embarrassing personal problem, someone there can look me up because everyone is using real names. I don't like it.
If we created an identity that is completely different from our real identity when we were 13, great.
If not, that becomes a problem without an actual solution especially in the age of Internet archives.
I joke with them that if they googled my name (somewhat unique) you'd find 3-5 other people - none of whom look at all like me. Any hits I have are far far below the fold.
> Exercising Your Rights: California residents can exercise the above privacy rights by emailing us at: support@openai.com.
If you happen to be in California (or even if you are not) it might be worth trying to go through their support channel.
I'm also not a California resident, but I am under GDPR, which I understand is similarly strong. I'll try emailing them and see where it goes.
[1] https://openai.com/privacy/
> I don't care much about my name being public, but I don't know what else it might have memorized (political affiliations? Sexual preferences? Posts from 13-year old me?).
Combine this with
https://news.ycombinator.com/item?id=28216733
https://news.ycombinator.com/item?id=27622100
Google fuck-ups are much, much more impactful than you'd expect because people have come to trust the information Google provides so automatically. This example is being invoked as comedy, but I see people do it regularly:
https://youtu.be/iO8la7BZUlA?t=178
So a bigger problem isn't what GPT-3 can memorize, but what associations it may decide to toss out there that people will treat as true facts.
Now think about the amount of work it takes to discover problems. It's wild that you have to Google your own name every once in a while to see what's being turned up and make sure you're not being misrepresented, but that's not too much work. GPT-3 output, on the other hand, is elicited very contextually. It's not hard to imagine that <There is a Hristo Georgiev who sold Centroida and moved to Zurich> and <There is a Hristo Georgiev who murdered five women> pop up as <Hristo Georgiev, who sold Centroida and moved to Zurich, had murdered five women.> only under certain circumstances that you can't hope to be able to exhaustively discover.
From a personal angle: My birth name is also the pen name of an erotic fiction author. Hazy associations popping up in generated text could go quite poorly for me.
I didn’t anticipate the use case of GPT being used by debt collection agencies to tirelessly track down targets.
It will be a new type of debtors’ prison, where any leak of enough personally identifying facets to the internet will string together a mosaic of the target, such that the AI sends them calls, SMS, Tinder DMs, etc. until they pay and are released from the digital targeting system.