Embodied Multimodal Learning Workshop | ICLR 2021

Abstract

Despite encouraging progress in embodied learning over the past two decades, there is still a large gap between embodied agents' perception and human perception. Humans are remarkably capable of combining all of their multisensory inputs. To close the gap, embodied agents should also be able to see, hear, touch, and interact with their surroundings in order to select appropriate actions. However, today's learning algorithms primarily operate on a single modality. For Artificial Intelligence to make progress in understanding the world around us, it needs to interpret such multimodal signals jointly. The goal of this workshop is to share recent progress and discuss current challenges in embodied learning with multiple modalities.

The EML workshop will bring together researchers in different subareas of embodied multimodal learning, including computer vision, robotics, machine learning, natural language processing, and cognitive science, to examine the challenges and opportunities emerging from the design of embodied agents that unify their multisensory inputs. We will review the current state of the field and identify the research infrastructure needed to enable stronger collaboration between researchers working on different modalities.

Invited Speakers

Claudia Pérez D'Arpino
(Stanford)

Abhinav Gupta
(CMU & FAIR)

Felix Hill
(DeepMind)

Danica Kragic
(KTH)

Katherine Kuchenbecker
(MPI-IS)

Sergey Levine
(UC Berkeley & Google)

Jitendra Malik
(UC Berkeley & FAIR)

Linda Smith
(Indiana University)

Call for Papers

We invite submissions of 2-4 page extended abstracts on topics related to (but not limited to):

audio-visual embodied learning

touch sensing and embodied learning

language and embodied learning

speech and embodied learning

self-supervised/semi-supervised learning with multiple modalities

multimodal reinforcement learning

meta-learning with multiple modalities

novel multimodal datasets/simulators/tasks for embodied agents

combining multisensory inputs for robot perception

bio-inspired approaches for multimodal perception

A submission should take the form of an extended abstract (2-4 pages long, excluding references) in PDF format using the ICLR style. We will accept submissions of (1) papers that have not been previously published or accepted for publication in substantially similar form; (2) papers that have been published or accepted for publication in recent venues, including journals, conferences, workshops, and arXiv; and (3) research proposals for future work with a focus on well-defined concepts and ideas. All submissions will be reviewed under a single-blind policy. Accepted extended abstracts will not appear in the ICLR proceedings and hence will not affect future publication of the work. We will publish all accepted extended abstracts on the workshop webpage.

CMT submissions website: https://cmt3.research.microsoft.com/EML2021

Key Dates:

Extended abstract submission deadline: March 5th, 2021 (11:59 PM PST)

Late submission deadline: March 22nd, 2021 (11:59 PM PST)

Notification to authors: March 26th, 2021

Workshop date: May 7th, 2021

Program Committee:

Unnat Jain (UIUC), Michelle Lee (Stanford), Paul Pu Liang (CMU), Senthil Purushwalkam (CMU), Santhosh Kumar Ramakrishnan (UT Austin), Mohit Shridhar (UW), Tianmin Shu (MIT), Shaoxiong Wang (MIT)

Schedule

07:55 am - 08:00 am (PDT)	Introduction and Opening Remarks
08:00 am - 08:30 am (PDT)	Invited Talk	Katherine Kuchenbecker (MPI-IS)
08:30 am - 09:00 am (PDT)	Invited Talk	Danica Kragic (KTH)
09:00 am - 09:30 am (PDT)	Paper Session A	A1 - A5
09:30 am - 09:40 am (PDT)	Paper Session A Q&A
09:40 am - 10:00 am (PDT)	Break
10:00 am - 10:30 am (PDT)	Invited Talk	Linda Smith (Indiana University)
10:30 am - 11:00 am (PDT)	Invited Talk	Felix Hill (DeepMind)
11:00 am - 12:00 pm (PDT)	Panel Discussion	Kristen Grauman, Felix Hill, Katherine Kuchenbecker, Sergey Levine, Jitendra Malik, Linda Smith
12:00 pm - 12:30 pm (PDT)	Break
12:30 pm - 01:00 pm (PDT)	Invited Talk	Abhinav Gupta (CMU & FAIR)
01:00 pm - 01:30 pm (PDT)	Invited Talk	Sergey Levine (UC Berkeley & Google)
01:30 pm - 02:00 pm (PDT)	Paper Session B	B1 - B4
02:00 pm - 02:10 pm (PDT)	Paper Session B Q&A
02:10 pm - 02:30 pm (PDT)	Break
02:30 pm - 03:00 pm (PDT)	Invited Talk	Jitendra Malik (UC Berkeley & FAIR)
03:00 pm - 03:30 pm (PDT)	Invited Talk	Claudia Pérez D'Arpino (Stanford)
03:30 pm - 03:35 pm (PDT)	Closing Remarks

Accepted Papers

Title	Authors	Paper Session
ABC Problem: An Investigation of Offline RL for Vision-Based Dynamic Manipulation	Kamyar Ghasemipour, Igor Mordatch, Shixiang Shane Gu	A1
Language Acquisition is Embodied, Interactive, Emotive: a Research Proposal	Casey Kennington	A2
Ask & Explore: Grounded Question Answering for Curiosity-Driven Exploration	Jivat Neet Kaur, Yiding Jiang, Paul Pu Liang	A3
Towards Teaching Machines with Language: Interactive Learning From Only Language Descriptions of Activities	Khanh Nguyen, Dipendra Misra, Robert Schapire, Miroslav Dudik, Patrick Shafto	A4
YouRefIt: Embodied Reference Understanding with Language and Gesture	Yixin Chen, Qing Li, Deqian Kong, Yik Lun Kei, Tao Gao, Yixin Zhu, Song-Chun Zhu, Siyuan Huang	A5
Learning to Set Waypoints for Audio-Visual Navigation	Changan Chen, Sagnik Majumder, Ziad Al-Halah, Ruohan Gao, Santhosh K. Ramakrishnan, Kristen Grauman	B1
Semantic Audio-Visual Navigation	Changan Chen, Ziad Al-Halah, Kristen Grauman	B2
Attentive Feature Reuse for Multi Task Meta learning	Kiran Lekkala, Laurent Itti	B3
SeLaVi: self-labelling videos without any annotations from scratch	Yuki Asano, Mandela Patrick, Christian Rupprecht, Andrea Vedaldi	B4

Organizers

Ruohan Gao
(Stanford)

Andrew Owens
(UMich)

Dinesh Jayaraman
(UPenn)

Yuke Zhu
(UT Austin & Nvidia)

Jiajun Wu
(Stanford)

Kristen Grauman
(UT Austin & FAIR)