The story going around on social media (which I only know because Claude sometimes refused to translate it) was that a particular developer was modifying weights in other developers' models and crashing their training runs so that the developer's own work looked better in comparison.
How is this any different from starting new projects at Google and leaving them half-baked because that leads to a faster promotion? Incentives align behavior.
What we in the West often think of as insider threat is just another Tuesday in Chinese business. I've seen a lot of this in the video game industry. This kind of industrial sabotage and theft is a very real part of getting ahead, even among companies owned by the same parent company (e.g., studios partly owned by Tencent).
10,000 people is as many as some entire towns. I don't think society would hold together very long if that were true.
100,000 supposes that there are... hmm... about eighty thousand non-evil people in the world, and (odds are) exactly none of them are Marshallese and about 2 are Samoan, to give a sense of how silly this is.
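A quick sketch of the arithmetic behind this, assuming a world population of about 8 billion and a rate of 1 in 100,000. The country populations are rounded figures assumed here (roughly the 2021 census counts), country population is used as a stand-in for "Marshallese" and "Samoan", and the Poisson approximation for "exactly none" is an added simplification, not something from the comment.

```python
import math

# Assumed round figures (approximate, not taken from the thread):
WORLD_POPULATION = 8_000_000_000  # ~8 billion people
MARSHALL_ISLANDS = 42_000         # roughly the 2021 census count, rounded
SAMOA = 205_000                   # roughly the 2021 census count, rounded

RATE = 1 / 100_000  # "1 in 100,000 people is non-evil"


def expected_count(population: int, rate: float = RATE) -> float:
    """Expected number of 'non-evil' people in a population at the given rate."""
    return population * rate


def prob_none(expected: float) -> float:
    """Poisson approximation of the probability that there are zero such people."""
    return math.exp(-expected)


if __name__ == "__main__":
    marshall = expected_count(MARSHALL_ISLANDS)
    print(f"world:    {expected_count(WORLD_POPULATION):,.0f}")
    print(f"Marshall: {marshall:.2f} (P(none) ~ {prob_none(marshall):.2f})")
    print(f"Samoa:    {expected_count(SAMOA):.2f}")
```

With these assumptions the world total comes out to 80,000, the Marshall Islands to well under one person (with roughly a two-in-three chance of nobody at all), and Samoa to about two.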
There is probably a high percentage of people tearing others down, but I doubt it's that extreme.
I think maybe 1 in 100k is actually anything special, but odds are you aren't special; you just noticed that 20% of the population is as gifted/motivated/constructive as you are (statistically speaking, assuming a bell curve).
And of those, yes, some small percentage will still feel "special" and affronted that other people have the same ideas/goals/desires as them.
The world does not work like that. Sure, for every person there may be 100,000 who do not share their ideals. But even 10 in 100,000 would seem ridiculously high as the share of people who actively try to destroy and cannibalize others' work to showcase their own. Another commenter said it here: it's easier to destroy than to create. By my vibe-based estimates, it's at least 1,000 times easier to destroy than create, in aggregate.
I'm reminded of a time an intern took down us-east-1 on AWS by modifying a configuration file they shouldn't have had access to in the first place. Amazon (somehow) did the correct thing and didn't fire them; instead, they used the experience to fix the security hole.
If the intern "had no experience with the AI lab", is it the right thing to do to fire them, instead of admitting that there is a security/access fault internally? Can other employees (intentionally, or unintentionally) cause that same amount of "damage"?
From what I've seen in Amazon it's pretty consistent that they do not blame the messenger which is what they consider the person who messed up. Usually that person is the last in a long series of decisions that could have prevented the issue, so why blame them? The exceptions are when the person is a) acting with malice or b) has repeatedly shown a pattern of willful ignorance. IIRC, when one person took down S3 with a manual command overriding the safeguards, the response was not to fire them but to figure out why it was still a manual process without sign-off. Say what you will about Amazon culture, the ability to make mistakes or call them out is pretty consistently protected.
> when one person took down S3 with a manual command overriding the safeguards
It didn't override safeguards, but they sure wanted you to think that something unusual was done as part of the incident. What they executed was a standard operational command. The problem was, the components that that command interacted with had been creaking at the edges for years by that point. It was literally a case of "when", and not "if". All that happened was the command tipped it over the edge in combination with everything else happening as part of normal operational state.
Engineering leadership had repeatedly raised the risk further up the chain, and no one was willing to put headcount toward actually mitigating the problem. If blame was to be applied anywhere, it wasn't on the engineer following the runbook, which gave them a standard operational command to execute with standard values. They did exactly what they were supposed to.
Credit where it's due: my understanding from folks still in that space is that S3 leadership started turning things around after that incident and began taking these risks and the operational state seriously.
> From what I've seen in Amazon it's pretty consistent that they do not blame the messenger which is what they consider the person who messed up
Interesting that my experience has been the exact opposite.
Whenever I’ve participated in COE discussions (incident analysis), questions have been focused on highlighting who made the mistake or who didn’t take the right precautions.
Precisely: if you ship it, you own it. So ownership lies not with the individual but with the team and the company. Blaming one human for an error that at least one other engineer likely code-reviewed, that a team probably discussed prioritizing, and that eventually led to degradation is a poor way to prevent it from happening again.
There is a huge difference between someone making a mistake and someone intentionally sabotaging.
You're not firing the person because they broke stuff; you're firing them because they tried to break stuff. If the attempt failed and caused no harm, you would still fire them. It's not about the damage they caused, it's that they wanted to cause damage.
But for deliberately damaging company assets, firing is only the first step.
I don't see any mention of other legal action, and the article is shallow.
It might have been that someone in the chain of command called it “malicious” to cover up their own mistakes. I think that was the parent poster's point in telling the Amazon story.
I worked at AWS for 13 years. I did “aws call leader” for 7 years, and worked in the reliability org when we rebuilt the COE tool. I've personally blown up a service or two, and know other PEs who've done the same or larger.
I've never heard of an individual being terminated or meaningfully punished for making an earnest mistake, regardless of impact. I do know of people who were rapid term’d for malicious or similar actions, like sharing internal information or (attempting to) subvert security controls.
On the whole I did see Amazon “do the right thing” around improving process and tools; people are a fallible _part_ of a system, accountability requires authority, incremental improvements today over a hypothetical tomorrow.
The PAM debacle (17Q4) in Device Econ is a counterexample.
And that wasn’t even a mistake the SDEs made: they were punished for the economists' recklessness and then bullied out of the company, despite having tried to raise the alarm the whole time.
I think this is an important distinction, and the answer is that the two are hard to tell apart. People often bring up the Simple Sabotage Field Manual [0] in situations like these, and I think something is often missed: the techniques in it are effective precisely because they are difficult to differentiate from normal behavior. This creates plausible deniability for the saboteur. Acting too hastily could mean losing someone valuable over a genuine mistake. So I agree with the Amazon example. (You can also use saboteurs to your advantage if you recognize that they are hunting down and exploiting inefficiencies, but that's a whole other conversation.)
But my understanding of this case is that the actions do not look like simple, easy-to-make mistakes. As I understand it, the claim was that the intern was modifying the weights in checkpoints of other people's training runs to make his own work look better. Mucking about in a checkpoint is not a very common thing to do, so it should have made someone suspicious in the first place. On top of this, it appears he was exploiting weaknesses and injecting code to mess with people's optimizers, and doing things that have no reasonable explanation.
So as far as I can tell, not only was he touching files he shouldn't have been touching (and yes, shouldn't have had access to), he was taking steps to bypass the blocks that were in place and was messing with them in ways that are very difficult to explain away with "I thought this might be a good idea" (things that explicitly look like a bad idea). If that is what in fact happened, I don't think it's a reach to claim intentional sabotage. Because if it wasn't, then the actions represent such a level of incompetence that he is a huge liability to anyone within reach.
It was one of the STEP interns who took down Google prod: they modified a config file by feeding something erroneous into an automated tool. Everyone at the company was locked out, and someone had to physically access machines in a datacenter to recover.
Malicious intent, to be precise. Well-intentioned attempts to demonstrate issues so they can be fixed should generally not be punished, unless the fallout is wider than expected and can be attributed to negligence.
> If the intern "had no experience with the AI lab", is it the right thing to do to fire them, instead of admitting that there is a security/access fault internally?
This wasn’t an accident, though. The intern had malicious intent and was intentionally trying to undermine other people’s work.
This isn’t a case where blameless post-mortems apply. When someone is deliberately sabotaging other people’s work, they must be evicted from the company.
AFAIK this was intentional: they stopped other employees' training runs, changed the parameters of those runs, and even joined the debugging group trying to solve the "issues".
It's a Chinese company, saving face is far more important for them than "teaching lessons" to anyone, particularly employees who are probably considered expendable.
I always laugh when I see these predictable comments about "face" in discussions of Asian companies, as if they are so beholden to their culture that they can't make individual judgments.
I wonder if we applied this culture talk to Western companies how funny it would sound.
The reason Facebook is firing so many people is because individualism "is far more important for them than 'teaching lessons' to anyone, particularly employees who are probably considered expendable."
For better or worse, when you have more time to learn how the real world works and make the right connections with the right people, you get much more leeway in what you can get away with.
Naturally, older people had more time to do that than younger people. This is why most young people get their shins blasted while older people just get a slap on the wrist, if they're found out.
It can give you the experience to know how careful you need to be in doing that, if only because you've lived long enough to see many people scuppered for failing to do it well enough.
(via https://news.ycombinator.com/item?id=41906970, but we merged that thread hither)
https://twitter.com/YouJiacheng/status/1847420973580243092
https://www.cbsnews.com/news/xu-yao-death-sentence-poisoning...
It's a rat race and it's not your fault.
[0] https://www.cia.gov/static/5c875f3ec660e092cf893f60b4a288df/...
Did the employee intend to cause damage? If so, just fire them.
Summary:
10/18:
Translation of the provided text:
Title: Urgent Warning
The “reputation washing” behavior of Tian Keyu has been extremely harmful
For the past two months, Tian Keyu has maliciously attacked the cluster code, causing significant harm to nearly 30 employees of various levels, wasting nearly a quarter’s worth of work by his colleagues. All records and audits clearly confirm these undeniable facts:
1. Modified the PyTorch source code of the cluster, including random seeds, optimizers, and data loaders.
2. Randomly killed multi-machine experiment processes, causing significant experiment delays.
3. Opened login backdoors through checkpoints, automatically initiating random process terminations.
4. Participated in daily troubleshooting meetings for cluster faults, continuing to modify his attack code based on colleagues’ troubleshooting ideas.
5. Altered colleagues’ model weights, rendering experimental results unreproducible.
It’s unimaginable how Tian Keyu could continue his attacks with such malice: seeing colleagues’ experiments inexplicably interrupted or failing, hearing their debugging strategies and modifying his attack code specifically in response, and watching colleagues work overnight with no progress. After being dismissed by the company, he received no penalties from his school or advisors and even began to whitewash his actions on various social media platforms. Does this mean the school and advisors tolerate Tian Keyu’s behavior? We hope this disclosure of evidence draws the attention of the relevant parties and that definitive penalties are imposed on Tian Keyu, reflecting the social responsibility of higher education institutions to educate and nurture.
We cannot allow someone who has committed such serious offenses to continue evading justice, even beginning to distort facts and whitewash his wrongdoing! Therefore, we have decided to stand up on behalf of all justice advocates and reveal the evidence of Tian Keyu’s malicious cluster attack!
Tian Keyu, if you deny any part of these malicious attack behaviors, or think the content here smears you, please present credible evidence! We are willing to disclose more evidence as the situation develops, along with your shameless ongoing attempts to whitewash. We guarantee the authenticity and accuracy of all evidence and are legally responsible for the content of the evidence. If necessary, we are willing to disclose our identities and confront Tian Keyu face-to-face.
Thanks to those justice advocates, you do not need to apologize; you are heroes who dare to speak out.
Link to the inquiry recording of Tian Keyu: https://www.youtube.com/watch?v=nEYbYW--qN8
Personal homepage of Tian Keyu: https://scholar.google.com/citations?user=6FdkbygAAAAJ&hl=en
GitHub homepage of Tian Keyu: https://github.com/keyu-tian
10/19:
Clarification Regarding the “Intern Sabotaging Large Model Training” Incident
Recently, some media reported that “ByteDance’s large model training was attacked by an intern.” After internal verification by the company, it was confirmed that an intern from the commercial technology team committed a serious disciplinary violation and has been dismissed. However, the related reports also contain some exaggerations and inaccuracies, which are clarified as follows:
1. The intern involved maliciously interfered with the model training tasks of the commercial technology team’s research project, but this did not affect the official commercial projects or online operations, nor did it involve ByteDance’s large model or other businesses.
2. Rumors on the internet about “involving over 8,000 cards and losses of millions of dollars” are greatly exaggerated.
3. Upon verification, it was confirmed that the individual in question had been interning in the commercial technology team, and had no experience interning at AI Lab. Their social media bio and some media reports are incorrect.
The intern was dismissed by the company in August. The company has also reported their behavior to the industry alliance and the school they attend, leaving further actions to be handled by the school.
> ByteDance also denied reports that the incident caused more than $10m of damage
It makes clear what ByteDance's official position is, while pretty clearly hinting that it might not be true.