
Communications of the ACM

ACM News

Artificial Intelligence Confronts a 'Reproducibility' Crisis

Facebook researchers said they found it "very difficult, if not impossible" to reproduce DeepMind's AlphaGo program.

Machine learning systems are black boxes even to the researchers who build them.

Credit: Getty Images

A few years ago, Joelle Pineau, a computer science professor at McGill, was helping her students design a new algorithm when they fell into a rut. Her lab studies reinforcement learning, a type of artificial intelligence that's used, among other things, to help virtual characters ("half cheetah" and "ant" are popular) teach themselves how to move about in virtual worlds. It's a prerequisite to building autonomous robots and cars. Pineau's students hoped to improve on another lab's system. But first they had to rebuild it, and their design, for reasons unknown, was falling short of its promised results. Until, that is, the students tried some "creative manipulations" that didn't appear in the other lab's paper.

Lo and behold, the system began performing as advertised. The lucky break was a symptom of a troubling trend, according to Pineau. Neural networks, the technique that's given us Go-mastering bots and text generators that craft classical Chinese poetry, are often called black boxes because of the mysteries of how they work. Getting them to perform well can be something of an art, involving subtle tweaks that go unreported in publications. The networks also are growing larger and more complex, with huge data sets and massive computing arrays that make replicating and studying those models expensive, if not impossible, for all but the best-funded labs.

"Is that even research anymore?" asks Anna Rogers, a machine-learning researcher at the University of Massachusetts. "It's not clear if you're demonstrating the superiority of your model or your budget."

Pineau is trying to change the standards. She's the reproducibility chair for NeurIPS, a premier artificial intelligence conference. Under her watch, the conference now asks researchers to submit a "reproducibility checklist" including items often omitted from papers, like the number of models trained before the "best" one was selected, the computing power used, and links to code and datasets. That's a change for a field where prestige rests on leaderboards (rankings that determine whose system is the "state of the art" for a particular task), which gives researchers a strong incentive to gloss over the tribulations that led to those spectacular results.

The idea, Pineau says, is to encourage researchers to offer a road map for others to replicate their work. It's one thing to marvel at the eloquence of a new text generator or the "superhuman" agility of a videogame-playing bot. But even the most sophisticated researchers have little sense of how these systems work. Replicating those AI models is important not just for identifying new avenues of research, but also as a way to investigate algorithms as they augment, and in some cases supplant, human decision-making on everything from who stays in jail and for how long to who is approved for a mortgage.


From Wired