Archive: https://archive.today/SqPCL
From the post:
>You can divide the recent history of LLM data scraping into a few phases. There was for years an experimental period, when ethical and legal considerations about where and how to acquire training data for hungry experimental models were treated as afterthoughts. Once apps like ChatGPT became popular and companies started commercializing models, the matter of training data became instantly and extremely contentious.