When I install an LTS version with a Universe package like ffmpeg, does everything continue getting security patches for the full five-year LTS life?
Or do I now need Ubuntu Pro to get the full five years?
People keep sharing these kinds of conversations. The training cutoff date isn't an absolute date after which no new data was allowed into training.
Instead there are bits and pieces of newer information captured in the updated models, but it's not a meaningful enough amount to ever rely on.
It's not going to reliably understand your new libraries, and more importantly, if you convince it that it knows what happened in April 2023, it can start hallucinating so deeply that the conversation becomes useless until you edit it and remove the part where you convinced it of that.
It's not a question of whether they are "allowed" to train on new data; the question is whether they have trained it on data containing information about current events. If you know they've implemented a Continuous Integration (CI) system for this, you should link to a source. However, I don't think this is true, as there would be no reason for a cutoff date otherwise.
> Instead there are bits and pieces of newer information captured in the updated models, but it's not a meaningful enough amount to ever rely on.
This seems more like an opinion of the technology's limitations in general, rather than an assessment of the likelihood that new information will be incorporated into its weights and biases.
Is it going to remain academic? I can easily imagine the spammy content farm / listicle business model evolving to be fully automated, creating an input loop.
It's also worth noting that when OpenAI created Whisper, they had to heuristically remove many transcripts from poor ASR systems, and they definitely didn't catch them all.
(1) Real content is not generated via a synthetic loop: Humans use generative AI in complex ways, intermixing human-generated and AI-generated content. Imagine a person who writes the first draft of an essay, then uses ChatGPT to rewrite parts of it. There are certainly many human additions, modifications, and stylistic flourishes.
(2) The most dramatic effects of model collapse were seen when training multiple generations of AI agents on content generated by the previous agent. This is a very academic scenario.
(3) There is already a lot of junk consumed by these models. RLHF is aimed at eliminating these junk responses. I am not aware of any research that explores how the full training cycle is affected when RLHF is employed.
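The multi-generation setup in (2) is easy to simulate with a toy sketch. The Gaussian-refit loop below is my own illustration, not the setup from any specific paper: each "generation" fits a distribution to samples drawn from the previous generation's fitted distribution, and with a finite sample size the fitted variance tends to wander toward zero over many generations.

```python
# Toy sketch of model-collapse dynamics (illustrative only): each
# generation fits a Gaussian to samples drawn from the previous
# generation's fitted Gaussian. With finite samples, the estimated
# parameters drift, and the fitted spread tends to shrink over time.
import random
import statistics

def run_generations(n_samples=50, n_generations=200, seed=0):
    """Return the fitted standard deviation at each generation."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0          # the "real data" distribution
    history = [sigma]
    for _ in range(n_generations):
        # Draw training data from the previous generation's model...
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        # ...then refit the model to that purely synthetic data.
        mu = statistics.fmean(samples)
        sigma = statistics.stdev(samples)
        history.append(sigma)
    return history

history = run_generations()
print(f"generation 0 std:   {history[0]:.3f}")
print(f"generation 200 std: {history[-1]:.3f}")
```

The point of the sketch is only that the purely synthetic loop is the degenerate case; the mixed human/AI pipeline described in (1) breaks the loop, which is why the dramatic collapse results are, as noted, a very academic scenario.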
Also, there is a lot of training material out there that was not used by the original GPT-3 model. The primary limitation is hardware.
I've been tempted to go back to Arch, and I think this could be a good motivator.