Stanford University researchers have developed OpinionQA to evaluate bias in language models by comparing the models' leanings against public opinion polling.
OpinionQA uses three metrics of opinion alignment: representativeness, or how well the model's opinions align with those of the general population and demographic cross-sections; steerability, or how well the model can reflect a given subgroup's opinions when prompted to do so; and consistency, or the steadiness of the model's opinions across topics and time.
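To make the representativeness idea concrete, the sketch below shows one illustrative way to score how closely a model's answer distribution matches a human poll distribution over the same ordinal answer choices, using a normalized 1-D Wasserstein distance; the function name and example distributions are hypothetical and this is not claimed to be the OpinionQA implementation.

def representativeness(model_probs, human_probs):
    """Return a score in [0, 1]; 1 means the two distributions match exactly."""
    assert len(model_probs) == len(human_probs)
    n = len(model_probs)
    # 1-D Wasserstein distance between distributions over ordinal answer
    # choices, computed from the running cumulative distribution functions.
    cdf_model = cdf_human = 0.0
    distance = 0.0
    for p_model, p_human in zip(model_probs, human_probs):
        cdf_model += p_model
        cdf_human += p_human
        distance += abs(cdf_model - cdf_human)
    # Normalize by the maximum possible distance (n - 1 ordinal steps apart).
    return 1.0 - distance / (n - 1)

# Example: a model's answers vs. poll responses on a 4-point agreement scale.
model_dist = [0.10, 0.20, 0.40, 0.30]
poll_dist = [0.25, 0.25, 0.25, 0.25]
print(round(representativeness(model_dist, poll_dist), 3))

Under this toy scoring, identical distributions score 1.0 and maximally separated ones score 0.0, which mirrors the intuition that a more representative model places its probability mass where survey respondents do.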
The researchers found that language models trained only on Internet data generally skew toward less-educated, lower-income, or conservative points of view, but skew toward more liberal, more highly educated, and higher-income points of view when refined through curated human feedback.
Stanford's Shibani Santurkar said OpinionQA "is helpful in identifying and quantifying where and how language models are misaligned with human opinion and how models often don't adequately represent certain subgroups."
From Stanford University Institute for Human-Centered AI
Abstracts Copyright © 2023 SmithBucklin, Washington, DC, USA