Communications of the ACM

ACM TechNews

Google's PaLM-E Generalist Robot Brain Takes Commands

A robotic arm controlled by PaLM-E reaches for a bag of chips.

The PaLM-E multimodal embodied visual-language model can generate a plan of action for a mobile robot platform with an arm and execute the actions by itself.

Credit: Google Research

Researchers at Google and Germany's Technical University of Berlin debuted PaLM-E, described as the largest visual-language model (VLM) ever created.

The multimodal embodied VLM contains 562 billion parameters and combines vision and language for robotic control; Google claims it can formulate a plan of action for high-level commands and execute it on a mobile robot platform equipped with an arm.

PaLM-E analyzes data from the robot's camera without requiring pre-processed scene representations, eliminating human data pre-processing or annotation.

Integrating the VLM into the control loop also makes the robot resistant to interruptions during tasks.

PaLM-E encodes continuous observations into a sequence of vectors with the same dimensionality as language-token embeddings, so it can "understand" sensor data in the same way it processes language.
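The idea above can be sketched in a few lines: project visual features into the same embedding width used for text tokens, then interleave the two into one sequence a single transformer could attend over. This is a minimal illustration with made-up dimensions and random projections, not PaLM-E's actual encoders.

```python
import numpy as np

# Hypothetical dimensions for illustration only; PaLM-E's real encoders
# and sizes are described in the paper, not reproduced here.
EMBED_DIM = 512  # shared width of language-token embeddings
rng = np.random.default_rng(0)

def encode_image(image):
    """Stand-in visual encoder: split the image into 4 'patches' and
    linearly project each into a vector as wide as a text-token embedding."""
    patches = image.reshape(4, -1)                       # (4, H*W*C / 4)
    proj = rng.standard_normal((patches.shape[1], EMBED_DIM))
    return patches @ proj                                # (4, EMBED_DIM)

def embed_text_tokens(token_ids, vocab_size=1000):
    """Stand-in token-embedding lookup table."""
    table = rng.standard_normal((vocab_size, EMBED_DIM))
    return table[token_ids]                              # (len, EMBED_DIM)

# A camera frame and a text prompt become one interleaved sequence.
frame = rng.standard_normal((8, 8, 3))
vision_seq = encode_image(frame)
text_seq = embed_text_tokens(np.array([5, 42, 7]))
multimodal_seq = np.concatenate([vision_seq, text_seq], axis=0)
print(multimodal_seq.shape)  # (7, 512): 4 image vectors + 3 token vectors
```

Because the image vectors match the token-embedding width, the language model needs no separate vision pathway: sensor data simply becomes more positions in its input sequence.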

From Ars Technica
View Full Article


Abstracts Copyright © 2023 SmithBucklin, Washington, D.C., USA

