PaLM-E

Google · March 6, 2023

active · Closed · encoder-decoder · multimodal

Parameters: 562B

Context Window: N/A

Description

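Embodied multimodal language model that projects continuous sensor inputs, such as images and robot state estimates, into the language embedding space of PaLM. A single model can then answer vision-language questions and generate text plans that lower-level robot policies turn into actions.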
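The projection idea in the description can be illustrated with a toy sketch. It is not PaLM-E's implementation: the dimensions, the random weights, the `<obs>` placeholder, and every function name below are assumptions made for illustration. In the real model the sensor features would come from a learned encoder and the projection would be trained. The sketch only shows how a sensor feature vector could be linearly mapped to the same width as token embeddings and placed in the input sequence alongside text.

```python
import random

EMBED_DIM = 8    # hypothetical width of the language model's token embeddings
SENSOR_DIM = 4   # hypothetical width of the sensor feature vector


def make_matrix(rows, cols, rng):
    """Random weights standing in for a learned projection matrix."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(features, weight):
    """Linear map from sensor-feature space to the token-embedding width."""
    return [sum(w * x for w, x in zip(row, features)) for row in weight]


def build_input_sequence(tokens, embed_table, sensor_features, weight,
                         placeholder="<obs>"):
    """Replace each placeholder token with the projected sensor vector.

    Ordinary tokens are looked up in the embedding table, so the result is
    one sequence of same-width vectors mixing text and sensor input.
    """
    sequence = []
    for tok in tokens:
        if tok == placeholder:
            sequence.append(project(sensor_features, weight))
        else:
            sequence.append(embed_table[tok])
    return sequence


rng = random.Random(0)
vocab = ["what", "is", "in", "the", "scene", "?"]
embed_table = {tok: [rng.uniform(-1, 1) for _ in range(EMBED_DIM)]
               for tok in vocab}
weight = make_matrix(EMBED_DIM, SENSOR_DIM, rng)

sensor_features = [0.5, -1.0, 0.25, 2.0]   # made-up encoder output
prompt = ["what", "is", "in", "the", "<obs>", "?"]
sequence = build_input_sequence(prompt, embed_table, sensor_features, weight)
```

Every entry of `sequence` has the same width, so a language model could consume the projected sensor vector in the same way as an ordinary token embedding.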

Key Innovations

Multimodal: Processing multiple types of input (text, images, audio, video) in a single model.

robotics

embodied

Family Tree

Built On

PaLM

Lineage

PaLM → PaLM-E

Successors (1)

RT-2

More from Robotics / Embodied

RT-2 · 2023-07-28 · 55B
