Dutch foundation takes down dataset illegally used for training AI
Copyright foundation Stichting BREIN has taken offline a Dutch dataset that was intended for training artificial intelligence (AI). It is a first for the Netherlands, the foundation said on Tuesday.
According to BREIN, the dataset was “enormous,” containing illegal copies of tens of thousands of books, millions of lines from news articles on websites like NU.nl, and subtitles of countless films and TV series from illegal sources. It was compressed so that AI models such as large language models could use it easily, the foundation said.
“We searched the dataset for the literal text: ‘Nothing from this publication may be reproduced,’ and this yielded more than 10,000 results. Each of these concerned illegally copied books,” BREIN director Bastiaan van Ramshorst said. “The news articles were also copied from websites with copyright reservations. This clearly shows that copyrights have not been respected. We call that a red-handed act.”
BREIN identified the person who made the dataset. That person promised the foundation in writing not to use it anymore and told the foundation to whom they had provided it. BREIN is investigating which AI models have used the dataset so that the parties involved can be held accountable.