Cutting-edge artificial intelligence systems can help you escape a parking fine, write an academic essay, or fool you into believing Pope Francis is a fashionista. But the virtual libraries behind this breathtaking technology are vast – and there are concerns they are operating in breach of personal data and copyright laws.
The enormous datasets used to train the latest generation of these AI systems, like those behind ChatGPT and Stable Diffusion, are likely to contain billions of images scraped from the internet, millions of pirated ebooks, the entire proceedings of 16 years of the European parliament and the whole of English-language Wikipedia.
But the industry's voracious appetite for big data is starting to cause problems, as regulators and courts around the world crack down on researchers hoovering up content without consent or notice. In response, AI labs are fighting to keep their datasets secret, or even daring regulators to push the issue.
From The Guardian (U.K.)
View Full Article
No entries found