Communications of the ACM

ACM TechNews

Synthetic Data Could Be Better Than Real Data

Machine-generated data sets have the potential to improve privacy and representation in artificial intelligence, if researchers can find the right balance between accuracy and fakery.

Credit: Janelle Barone

Some researchers envision synthetic data not only as a stand-in that is close enough to actual data to be useful while preserving privacy, but also as a way to produce better data.

Synthetic data generation involves a computer analyzing real datasets to infer their statistical relationships, then creating a new dataset with different data points but the same relationships.
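The idea can be illustrated with a minimal Python sketch. It is not the method of any researcher named here: it assumes a toy two-column dataset (the height and weight figures are invented for illustration) and assumes the only relationships worth preserving are the means, spreads, and correlation of a bivariate Gaussian. The helper names `fit` and `generate` are hypothetical.

```python
import math
import random


def fit(rows):
    """Infer simple statistical relationships from two-column data."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in rows) / (n - 1)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "sd_x": sd_x,
        "sd_y": sd_y,
        "corr": cov / (sd_x * sd_y),
    }


def generate(params, n, rng):
    """Draw new points from a bivariate Gaussian with the fitted statistics."""
    rho = params["corr"]
    residual = math.sqrt(max(0.0, 1.0 - rho * rho))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = params["mean_x"] + params["sd_x"] * z1
        # Mixing z1 into y reproduces the fitted correlation between columns.
        y = params["mean_y"] + params["sd_y"] * (rho * z1 + residual * z2)
        rows.append((x, y))
    return rows


rng = random.Random(0)

# Stand-in for a "real" dataset: invented (height_cm, weight_kg) pairs.
real_rows = []
for _ in range(2000):
    height = rng.gauss(170.0, 10.0)
    weight = 0.9 * height - 90.0 + rng.gauss(0.0, 5.0)
    real_rows.append((height, weight))

real_params = fit(real_rows)
synthetic_rows = generate(real_params, 2000, rng)
synthetic_params = fit(synthetic_rows)

print("real correlation:     ", round(real_params["corr"], 3))
print("synthetic correlation:", round(synthetic_params["corr"], 3))
```

The synthetic rows share no data points with the originals, yet their fitted statistics land close to those of the real rows. Production generators typically rely on far richer models than a single Gaussian, and matching summary statistics does not by itself guarantee privacy.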

Advocates claim synthetic data can circumvent issues such as production and maintenance costs, a scarcity of real-world training data, and social and other biases, because missing information can be added to datasets faster and more affordably than through real-world collection.

Thomas Strohmer at the University of California, Davis, believes synthetic data could democratize artificial intelligence research by addressing the imbalance caused by a few large companies owning a great deal of data.

From Nature


Abstracts Copyright © 2023 SmithBucklin, Washington, D.C., USA

