Whether computers can actually "think" and "feel" is a question that has long fascinated society. Alan M. Turing introduced a test for gauging machine intelligence as early as 1950. Movies such as 2001: A Space Odyssey and Star Wars have only served to fuel these thoughts, but while the concept was once confined to science fiction, it is rapidly emerging as a serious topic of discussion.
Thanks to enormous advances in artificial intelligence (AI)—and particularly deep learning—computers can now converse with humans in seemingly realistic ways. In a few cases, the dialog has become so convincing that people have deemed machines sentient. A recent example involves former Google data scientist Blake Lemoine, who published human-to-machine discussions with an AI system called LaMDA.a
Lemoine's declaration received ample press attention, along with a robust backlash from the computer science and artificial intelligence communities. For example, Stanford University researcher John Etchemendy, co-director of the Stanford Institute for Human-Centered AI (HAI), wrote: "LaMDA is not sentient for the simple reason that it does not have the physiology to have sensations and feelings."b
Yet, the mere fact that a computer could convince people it is sentient is not something to dismiss. Computational linguistics is advancing at a furious rate and natural language AI is taking shape. Systems such as Google's LaMDAc and OpenAI's GPT-3d offer a glimpse at what is on the AI horizon. Already, machines are writing basic news articlese and tackling highly specialized chat functions. Future capabilities could revolutionize the way we go about our daily lives.
Avoiding the inevitable hype is critical. "There is this mad rush to claim that human-level AI is happening now," says Marjorie McShane, a professor of cognitive science at Rensselaer Polytechnic Institute. "Meaning and truly explainable AI still require a lot of serious work, with an uncertain payoff." Adds Julia Hirschberg, Percy K. and Vida L. W. Hudson Professor of Computer Science at Columbia University, "Although computers can generate human-like speech and text, there are still many issues to be resolved" to reach actual human-level interaction.
Machine consciousness is a topic that has long fascinated researchers, media, venture capitalists, and the public. The Turing Test—originally called The Imitation Game—may have been the first attempt to analyze natural speech interactions generated by machines, but the path to the present has been paved with a string of advancements that have periodically sparked questions about whether computers can actually think and feel.
For example, in 1964, Joseph Weizenbaum at the Massachusetts Institute of Technology's Artificial Intelligence Lab introduced a program called ELIZA. Using a then-sophisticated pattern matching system, it interacted with humans in a contextual manner, albeit superficially. The program used words to trigger scripts that loosely followed rules of conversation. While Weizenbaum introduced ELIZA to demonstrate the fallibility of machine-human interaction, some observers concluded that the system was sentient. In other words, they believed that it had human-like feelings.
That might seem absurd by today's standards. However, Lemoine reignited the long-simmering discussion when he published an interview with the natural language system LaMDA. The chatbot clearly demonstrated a convincing command of language. Lemoine then took things further by claiming the system was actually sentient. He explored the idea of seeking legal representation and personhood for LaMDA. About a month later, Google unceremoniously fired him.
"Today's systems are very entertaining to talk to, but they don't build any real model of the world," says Gary Marcus, co-author of Rebooting AI and founder and CEO of Geometric Intelligence, a firm acquired by Uber. Adds McShane, "ML systems can now generate text that, in many cases, sounds natural to people—except, of course, when they fail—sometimes with eye-popping incongruities."
The problem, McShane says, is that when faced with natural-sounding text, "People can't help but assume that the entity that created it understands it, but this couldn't be further from the truth." While deep learning and machine learning have advanced remarkably, and systems can spit out natural-sounding text, "Generating meaning-free language has nothing to do with actual intelligence in machines. Fluency is irrelevant when there's no meaning behind the text. These systems have no idea what they are talking about."
In fact, one group of researchers, including former Google AI scientist Timnit Gebru (now with Black in AI), labeled today's AI speech systems "stochastic parrots" in a 2021 academic paper.f In 2020, a separate team of researchers from the University of Washington noted, "Pretrained neural language models (LMs) are prone to generating racist, sexist, or otherwise toxic language, which hinders their safe deployment."g
"Sentience is an enormous logical leap," argues Stuart M. Shieber, James O. Welch Jr. and Virginia B. Welch Professor of Computer Science at Harvard University. "It doesn't follow that even if the Turing Test is a reasonable, if controversial, test for attributing intelligence, that it also serves as a reasonable test for attributing consciousness or sentience. People are deceived by the appearance of realistic behaviors."
Sentience aside, computational linguistics and natural speech are beginning to take root in the real world. Chatbots are getting better at mimicking human interaction. Autocomplete technology, used by Gmail and other applications, is changing the way people compose messages. Meanwhile, AI companies are introducing robots capable of carrying on full-fledged conversations with humans. One such firm, Embodied, offers a talking doll called Moxie that aims to promote social and emotional skill development for children between the ages of 5 and 10.
OpenAI, an organization created by Elon Musk, Greg Brockman, and other heavyweights in the artificial intelligence field, is pushing the boundaries of natural speech. In 2020, OpenAI introduced Generative Pre-Trained Transformer 3, or GPT-3. The giant neural net's natural language skills are nothing short of remarkable; it can write articles and aid in software development, answer research questions using findings from academic papers, and handle a variety of text-generation and classification tasks. Popular language app Duolingo, for example, uses GPT-3 for French grammar corrections.
GPT-3, housed within a supercomputer complex in Iowa, was built using about 700 gigabytes of data originating from Wikipedia, digitized books, and websites. The supercomputer includes approximately 285,000 CPU cores. Yet, while GPT-3 boasts remarkable writing ability by machine standards (it can generate movie scripts and compose nonfiction in the style of famous authors), it remains a work in progress. At times it generates glaring errors and, when fed certain types of input, it is prone to spit out hollow prose.
Meanwhile, Google, DeepMind, and Meta have all developed their own large language models (LLMs). Others, including Microsoft, Apple, Amazon, IBM, Nvidia, and Baidu, are also continuing to advance machine linguistics and natural speech to address various requirements, whether the language skills are built into a digital assistant, a search engine, or an interface to operate machinery.
Several methods exist for training natural language models, typically using deep neural networks.
One common approach relies on so-called word vectors, which attempt to represent the meaning of a word mathematically, in order to highlight relationships between words and phrases.
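To make the idea of word vectors concrete, here is a minimal sketch. The three-dimensional vectors are hypothetical values picked by hand for illustration, not output from any real model; production systems learn vectors with hundreds of dimensions from large text collections. The sketch only shows how a relationship between words can be read off as a number, here the cosine of the angle between two vectors.

```python
import math

# Hypothetical, hand-picked vectors for illustration only.
# Real word vectors are learned from text and are much longer.
WORD_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(word, vectors=WORD_VECTORS):
    """Return the other word whose vector points in the closest direction."""
    target = vectors[word]
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


if __name__ == "__main__":
    print(most_similar("king"))
```

With these toy values, "king" sits far closer to "queen" than to "apple", which is the kind of relationship such vectors are meant to expose.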
Another approach, Recognizing Textual Entailment (RTE), classifies the relationships of words and sentences to each other through the lens of entailment vs. contradiction vs. neutrality. For example, the premise "A book has pages" entails "pages have text," but contradicts "words and images in a book move," while remaining neutral to a statement like "all books are good." As the system runs through millions of word combinations, it learns how to present and deliver statements accurately in context.
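The three-way labeling described above can be sketched as a small data structure. The example pairs are the ones given in the text; the class names, the `accuracy` helper, and the `always_neutral` placeholder are assumptions introduced here for illustration. The placeholder is deliberately not a real entailment model. It only shows where a trained classifier would plug in and how its predictions would be scored against labeled pairs.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    ENTAILMENT = "entailment"
    CONTRADICTION = "contradiction"
    NEUTRAL = "neutral"


@dataclass(frozen=True)
class Pair:
    """One labeled premise/hypothesis pair."""
    premise: str
    hypothesis: str
    label: Label


PREMISE = "A book has pages"

# The three examples from the text, each with its label.
EXAMPLES = [
    Pair(PREMISE, "pages have text", Label.ENTAILMENT),
    Pair(PREMISE, "words and images in a book move", Label.CONTRADICTION),
    Pair(PREMISE, "all books are good", Label.NEUTRAL),
]


def always_neutral(premise, hypothesis):
    """Hypothetical stand-in for a trained model; it ignores its input."""
    return Label.NEUTRAL


def accuracy(predict, examples):
    """Fraction of pairs for which predict() returns the labeled relation."""
    correct = sum(predict(p.premise, p.hypothesis) == p.label for p in examples)
    return correct / len(examples)


if __name__ == "__main__":
    print(accuracy(always_neutral, EXAMPLES))
```

Because the placeholder always answers "neutral", it gets only one of the three pairs right. A learned model would replace `always_neutral` and be judged by the same score over far more pairs.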
The goal, of course, is to develop systems that allow humans to interact with machines on human terms. Over the coming years, this could translate into new social media models for the metaverse and far more advanced sales and support functions. An AI system might, for example, provide highly interactive instructions for how to assemble furniture or manage a retirement account. Rather than reading cryptic instructions and struggling with basic YouTube videos, a person would simply ask for help and receive step-by-step contextual information including text, videos, and animations.
Of course, building systems capable of such real-world conversations is difficult, for a couple of reasons. First, there's the inherent complexity of language and the nearly infinite number of word combinations possible. No amount of training can prepare a system for every conceptual possibility. Second, computers lack any ability to discern red flags. A problem, Hirschberg points out, is that typical language samples fed into LLM systems lack "bad language" and "undesirable content."
In the real world, Hirschberg adds, this inherent complexity presents problems for "conversational agents and chatbots that must deal with a great many different topics appropriately, from shopping to travel to reservations to just answering user questions on a variety of topics."
Although computational linguistics capabilities almost certainly will advance by an order of magnitude in the coming years, what is not clear is how this will impact society. OpenAI has made it a point to focus on ethical AI, even publishing a paper featuring ways to battle the adverse use of its LLM.h Gebru and a group of other prominent researchers have established an organization named DAIR, which stands for Distributed Artificial Intelligence Research. Others, including Google, DeepMind, and Meta, are actively studying various aspects of ethics.
Yet major questions remain, and it is not clear whether well-intentioned AI ethical frameworks or even extremely accurate systems will solve the underlying problems. It is no secret that today's LLMs largely remain black boxes. There is limited understanding of how the models arrive at their output. This makes it difficult to guarantee they will not cause damage or be commandeered for nefarious purposes, such as manipulating human behavior.
McShane and others believe one possible solution is developing agents that orient around meaning. Such agents extract the significance of language utterances; they reason, learn, and make decisions based on meaning, and they generate language based on the meaning they want to convey. So-called Language-Endowed Intelligent Agents (LEIAs)i rely on knowledge-based methods and cognitive modeling while incorporating the results of machine learning when and where it is applicable. A primary benefit of orienting a language system around meaning is that these agents will be able to explain their behavior in ways that people can understand.
Yet, even if total model transparency is achieved, there are no guarantees a system will work exactly as billed. Even if a broader set of people design and manage the technology—and government regulation is in place—things could still go haywire. "We're often seduced into thinking that even well-intentioned systems are good, but we really don't understand the full impact of the technology. While the performance of these systems has improved, our structured understanding of their behavior hasn't commensurately improved," Shieber says.
Concludes Marcus, "A lot of people want to think that more data will solve the problem, but that's what they have been telling us for a decade with driverless cars. So far, this approach hasn't worked, and it isn't clear that it will ever work. Both in language and driving, there is a constant stream of 'edge cases'—things that programmers didn't anticipate, and we're not equipped to deal with."
Bender, E.M., Gebru, T., McMillan-Major, A., and Shmitchell, S. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? FAccT '21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, March 2021, pp. 610–623. https://doi.org/10.1145/3442188.3445922
Shieber, S.M. There Can Be No Turing-Test-Passing Memorizing Machines. Michigan Publishing, June 2014, Volume 14, No. 16, pp. 1–13. https://dash.harvard.edu/handle/1/11684156
Gehman, S., Gururangan, S., Sap, M., Choi, Y., and Smith, N.A. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models. https://aclanthology.org/2020.findings-emnlp.301.pdf
McShane, M. and Nirenburg, S. Linguistics for the Age of AI (MIT Press, 2022). https://direct.mit.edu/books/book/5042/Linguistics-for-the-Age-of-AI
a. Is LaMDA Sentient?—an Interview, https://bit.ly/3hxz8t1
b. Is Google's AI sentient? Stanford AI experts say that's 'pure clickbait', The Stanford Daily.
c. LaMDA: our breakthrough conversation technology, https://blog.google/technology/ai/lamda/
d. GPT-3 Powers the Next Generation of Apps, https://openai.com/blog/gpt-3-apps/
e. Robo-journalism: computer-generated stories may be inevitable, but it's not all bad news, https://bit.ly/3Fqu9T1
f. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big, https://dl.acm.org/doi/10.1145/3442188.3445922
g. RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models, https://aclanthology.org/2020.findings-emnlp.301.pdf
h. Improving Language Model Behavior by Training on a Curated Dataset, https://openai.com/blog/improving-language-model-behavior/
i. Linguistics for the Age of AI (MIT Press, 2022), https://direct.mit.edu/books/book/5042/Linguistics-for-the-Age-of-AI
©2023 ACM 0001-0782/23/02