Despite encouraging progress in embodied learning over the past two decades, a large gap remains between the perception of embodied agents and that of humans. Humans have a remarkable ability to combine their multisensory inputs. To close this gap, embodied agents should likewise be able to see, hear, touch, and interact with their surroundings in order to select appropriate actions. However, today's learning algorithms primarily operate on a single modality. For Artificial Intelligence to make progress in understanding the world around us, it must be able to interpret such multimodal signals jointly. The goal of this workshop is to share recent progress and discuss current challenges in embodied learning with multiple modalities.

The EML workshop will bring together researchers in different subareas of embodied multimodal learning including computer vision, robotics, machine learning, natural language processing, and cognitive science to examine the challenges and opportunities emerging from the design of embodied agents that unify their multisensory inputs. We will review the current state and identify the research infrastructure needed to enable a stronger collaboration between researchers working on different modalities.

Call for Papers

We invite submissions of 2-4 page extended abstracts on topics related to (but not limited to):
  • audio-visual embodied learning
  • touch sensing and embodied learning
  • language and embodied learning
  • speech and embodied learning
  • self-supervised/semi-supervised learning with multiple modalities
  • multimodal reinforcement learning
  • meta-learning with multiple modalities
  • novel multimodal datasets/simulators/tasks for embodied agents
  • combining multisensory inputs for robot perception
  • bio-inspired approaches for multimodal perception
A submission should take the form of an extended abstract (2-4 pages long, excluding references) in PDF format using the ICLR style. We will accept submissions of (1) papers that have not been previously published or accepted for publication in substantially similar form; (2) papers that have been published or accepted for publication in recent venues, including journals, conferences, workshops, and arXiv; and (3) research proposals for future work with a focus on well-defined concepts and ideas. All submissions will be reviewed under a single-blind policy. Accepted extended abstracts will not appear in the ICLR proceedings, and hence will not affect future publication of the work. We will publish all accepted extended abstracts on the workshop webpage.

CMT submissions website: https://cmt3.research.microsoft.com/EML2021

Key Dates:

  • Extended abstract submission deadline: March 5th, 2021 (11:59 PM PST)
  • Late submission deadline: March 22nd, 2021 (11:59 PM PST)
  • Notification to authors: March 26th, 2021
  • Workshop date: May 7th, 2021

Program Committee:

Unnat Jain (UIUC), Michelle Lee (Stanford), Paul Pu Liang (CMU), Senthil Purushwalkam (CMU), Santhosh Kumar Ramakrishnan (UT Austin), Mohit Shridhar (UW), Tianmin Shu (MIT), Shaoxiong Wang (MIT)


Schedule:

07:55 am - 08:00 am (PDT) Introduction and Opening Remarks
08:00 am - 08:30 am (PDT) Invited Talk Katherine Kuchenbecker
08:30 am - 09:00 am (PDT) Invited Talk Danica Kragic
09:00 am - 09:30 am (PDT) Paper Session A A1 - A5
09:30 am - 09:40 am (PDT) Paper Session A Q&A
09:40 am - 10:00 am (PDT) Break
10:00 am - 10:30 am (PDT) Invited Talk Linda Smith (Indiana University)
10:30 am - 11:00 am (PDT) Invited Talk Felix Hill
11:00 am - 12:00 pm (PDT) Panel Discussion Kristen Grauman, Felix Hill, Katherine Kuchenbecker, Sergey Levine, Jitendra Malik, Linda Smith
12:00 pm - 12:30 pm (PDT) Break
12:30 pm - 01:00 pm (PDT) Invited Talk Abhinav Gupta
01:00 pm - 01:30 pm (PDT) Invited Talk Sergey Levine (UC Berkeley & Google)
01:30 pm - 02:00 pm (PDT) Paper Session B B1 - B4
02:00 pm - 02:10 pm (PDT) Paper Session B Q&A
02:10 pm - 02:30 pm (PDT) Break
02:30 pm - 03:00 pm (PDT) Invited Talk Jitendra Malik (UC Berkeley & FAIR)
03:00 pm - 03:30 pm (PDT) Invited Talk Claudia Pérez D'Arpino (Stanford University)
03:30 pm - 03:35 pm (PDT) Closing Remarks

Accepted Papers

Title | Authors | Paper Session
ABC Problem: An Investigation of Offline RL for Vision-Based Dynamic Manipulation | Kamyar Ghasemipour, Igor Mordatch, Shixiang Shane Gu | A1
Language Acquisition is Embodied, Interactive, Emotive: a Research Proposal | Casey Kennington | A2
Ask & Explore: Grounded Question Answering for Curiosity-Driven Exploration | Jivat Neet Kaur, Yiding Jiang, Paul Pu Liang | A3
Towards Teaching Machines with Language: Interactive Learning From Only Language Descriptions of Activities | Khanh Nguyen, Dipendra Misra, Robert Schapire, Miroslav Dudik, Patrick Shafto | A4
YouRefIt: Embodied Reference Understanding with Language and Gesture | Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Tao Gao, Yixin Zhu, Song-Chun Zhu, Siyuan Huang | A5
Learning to Set Waypoints for Audio-Visual Navigation | Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh K. Ramakrishnan, Kristen Grauman | B1
Semantic Audio-Visual Navigation | Changan Chen, Ziad Al-Halah, Kristen Grauman | B2
Attentive Feature Reuse for Multi Task Meta learning | Kiran Lekkala, Laurent Itti | B3
SeLaVi: self-labelling videos without any annotations from scratch | Yuki Asano, Mandela Patrick, Christian Rupprecht, Andrea Vedaldi | B4