Clearly, companies view the context fed to these tools as valuable. And it certainly has value in the abstract, as information about how they're being used or could be improved.
But is it really useful as training data? Sure, some new codebases might be fed in... but after that, given the way context works and the way people are "vibe coding", 95% of the novelty being input is just the output of previous LLMs.
While the utility of synthetic data proves that model collapse is not inevitable, it does seem to be a real concern... and I can say definitively based on my own experience that the _median_ quality of LLM-generated code is much worse than the _median_ quality of human-generated code, especially since the LLM-generated set would include all the code that was rejected during the development process.
Without substantial post-processing to filter out the bad input code, I question how valuable the context from coding agents is for training data. Again, it's probably quite useful for other things.
It reminds me of ex-Soviet chess players. The emigration of so many good grandmaster-level players diluted the market, and unless you were in the absolute upper echelons (like Kramnik, Karpov, or Kasparov), you pretty much had to supplement your income by teaching on the side.
I'm usually on the side of empowering workers, but I believe sometimes the companies do have business saying this.
One reason is that much of the software industry has become a batpoop-insane slimefest of privacy (IP) invasion, as well as grossly negligent security.
Another reason is that the company may be held liable for license terms of the software.
Another reason is that the company may be held liable for illegal behavior of the software (e.g., if the software violates some IP of another party).
Every piece of software might expose the company to these risks. And maybe disproportionately so, if software is being introduced by the "I'm gettin' it done!" employee, rather than by someone who sees vetting for the risks as part of their job.
Yes, of course. You/that person may be the best & nicest on the planet, but we 'have decided' that 'China is the enemy and cannot be trusted'. So of course your CV will be discarded.
Also, if you pull something (criminal/damaging) off, where will they find you and how will they hold you accountable? China will never extradite you to any country to be imprisoned. Is this a joke? Doesn't the person realize this at all? Is this person naive/5yo or just saying shit for the clicks and the LOLs?
I think previously these sorts of offshore people were picked up by big bodyshop contractors, who could reliably place someone (and afford to have someone on the bench for a few weeks if needed). Since a massive bunch of government contracts were cancelled over the last few years, this route has dried up.
However, I have noticed that oftentimes devs are using queues where Workflow Engines would be a better fit.
If your message processing time is in the tens of seconds – talk to your local Workflow Engine professional (:
In many cases, the conglomerates aren't even making money from them. How much do you think the movie company (and all the various middlemen) are making from some obscure movie from the 80s that they don't even make available on DVD or streaming anywhere? They're just griefing the public by withholding it and not even making any money.