Cases like AI image generators are obvious. It doesn’t matter what the AI was trained on. What matters is that anything you publish must be different enough from previously copyrighted works that it does not infringe on their copyrights.
It gets tricky when the AI itself is what people are paying for and its output is used privately by those customers. In that case, any time the AI spits out text that is nearly identical to part of a Reuters article, it is violating Reuters’ copyright. Therefore, any AI where people pay to see the output cannot be trained on copyrighted work without permission.
That covers all of the major LLMs (ChatGPT, Grok, Gemini). Clegg is right about those. They cannot exist without violating copyrights.
I am waiting for open source software projects with restrictive licenses to start taking down companies that copied their code using an LLM coding assistant.