This feels like another area where piracy will end up superior if things like this land on the disallowed side of regulation. A model trained on all available data will outperform a model trained on only a legal subset of it. Whether or not you use it to produce potentially infringing content is a separate question. Performance will likely improve from having references to copyrighted material, and people able to get hold of the unrestricted model, myself included, would probably prefer to interact with it rather than the limited one. Perhaps it's time to update the laws, or at least move liability from the creator of the model to the user. No one is going after pencil makers, but I can draw a pretty good Mickey Mouse with access to one. It feels like me generating C-3PO and claiming ownership is my problem, not OpenAI's.