While the large language models (LLMs) that power ChatGPT and Google Bard were trained on data from the open web, DarkBERT was trained exclusively on data from the dark web. Yes, you read that correctly: this new AI model was trained on data from hackers, cybercriminals and other scammers.
A team of South Korean researchers has released a paper detailing how they built DarkBERT using data from the Tor network, which is often used to access the dark web. By crawling the dark web and then filtering the raw data, they created a dark web database that they used to train DarkBERT.
From Tom's Guide