PaLM-E
Google · March 2023
● active · Closed · multimodal
Parameters: 562B
Why It Matters
Demonstrated that scaling up embodied language models enables transfer of knowledge across different robot embodiments and tasks, showing positive transfer from web-scale language and vision data.
Description
Google's 562-billion-parameter embodied multimodal model, which combines PaLM's language understanding with visual and sensor inputs for robotic planning. It was the largest vision-language model at the time of its release, and it can understand scenes and generate plans for robots to execute.
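The PaLM-E paper describes how visual and sensor inputs are combined with the language model. Each continuous observation is encoded into vectors in the language model's token-embedding space. Those vectors are then interleaved with ordinary text token embeddings to form a "multimodal sentence," which the decoder-only LM processes like any other input sequence.

The following is a minimal sketch of only that splicing step, under toy assumptions:

* The names `project` and `build_multimodal_sequence` are hypothetical.
* So are the `<obs>` placeholder token, the tiny embedding table, and the projection weights.
* A single linear map stands in for the learned encoders. The real model uses much larger learned components, for example a ViT for images.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def project(obs: Sequence[float], weight: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical stand-in for a learned encoder: a bias-free linear map
    computing weight @ obs, turning a d_obs-dim observation into a d_model-dim vector."""
    return [sum(w * x for w, x in zip(row, obs)) for row in weight]


def build_multimodal_sequence(
    tokens: Sequence[str],
    observations: Sequence[Sequence[float]],
    embed: Dict[str, Vector],
    weight: Sequence[Sequence[float]],
    placeholder: str = "<obs>",
) -> List[Vector]:
    """Replace each placeholder token, in order, with the projected embedding of the
    next observation; every other token is looked up in the word-embedding table."""
    n_slots = sum(1 for tok in tokens if tok == placeholder)
    if n_slots != len(observations):
        raise ValueError(f"{n_slots} placeholders but {len(observations)} observations")
    obs_iter = iter(observations)
    seq: List[Vector] = []
    for tok in tokens:
        if tok == placeholder:
            seq.append(project(next(obs_iter), weight))  # observation -> embedding space
        else:
            seq.append(embed[tok])  # ordinary text token embedding
    return seq


# Toy usage (hypothetical values): d_model = 3, d_obs = 2.
embed = {
    "Q:": [1.0, 0.0, 0.0],
    "what": [0.0, 1.0, 0.0],
    "is": [0.0, 0.0, 1.0],
    "in": [0.5, 0.5, 0.0],
    "?": [0.0, 0.5, 0.5],
}
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 x 2 projection matrix
tokens = ["Q:", "what", "is", "in", "<obs>", "?"]
seq = build_multimodal_sequence(tokens, [[2.0, 3.0]], embed, weight)
# seq is the sequence of vectors that would be fed to the language model.
```

Because the observation ends up as a vector in the same space as word embeddings, a text-only LM architecture can consume it without structural changes. This is the design choice that lets PaLM-E reuse PaLM's pretrained weights.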
Key Innovations
Multimodal: Processing multiple types of input (text, images, audio, video) in a single model.
robotics
embodied