Reconstructing and Recognizing Human Actions in Videos

Professor Jitendra Malik
Arthur J. Chick Professor of EECS, University of California, Berkeley, and VP, Robotics Research at FAIR, Meta Inc.

Description

WILLIAM MONG DISTINGUISHED LECTURES: RPG SHARING SERIES:

Humans are social animals. Perhaps this is why we so enjoy watching movies, TV shows and YouTube videos, all of which show people in action. A central problem for artificial intelligence, therefore, is to develop techniques for analyzing and understanding human behavior from images and videos.

I present recent results from our research group towards this grand challenge. We have developed highly accurate techniques for reconstructing 3D meshes of human bodies from single images using transformer neural networks. Given video input, we link these reconstructions over time by 3D tracking, thus producing “Humans in 4D” (3D in space + 1D in time). As a fun application, we can use this capability to transfer the 3D motion of one person to another, e.g., to generate a video of you performing Michael Jackson’s moonwalk or Michelle Kwan’s skating routine.

The ability to reconstruct hands in 4D provides a source of data for imitation learning in robotics, and we show examples of reconstructing human-object interactions. In addition to 4D reconstruction, we are also now able to recognize actions by attaching semantic labels such as “standing”, “running”, or “jumping”. However, long-range video understanding, such as the ability to follow characters’ activities and understand movie plots over periods of minutes and hours, is still quite a challenge, and even the largest vision-language models struggle on such tasks. There has been substantial progress, but much remains to be done.

About the Speaker

Jitendra Malik is the Arthur J. Chick Professor of EECS at UC Berkeley, and VP, Robotics Research at FAIR, Meta Inc. His group has conducted research on many different topics in computer vision, computer graphics, machine learning and robotics, resulting in concepts such as anisotropic diffusion, high dynamic range imaging, normalized cuts, R-CNN and rapid motor adaptation. His publications have received eleven best paper awards, including six test-of-time awards: the Longuet-Higgins Prize for papers published at CVPR (three times) and the Helmholtz Prize for papers published at ICCV (three times). He has mentored more than 80 PhD students and postdoctoral fellows, many of whom have gone on to become leading researchers at places like MIT, Berkeley, CMU, Caltech, Cornell, UIUC, UPenn, Michigan, UT Austin, Google and Meta.

Jitendra received the 2016 ACM/AAAI Allen Newell Award, the 2018 IJCAI Award for Research Excellence in AI, and the 2019 IEEE Computer Society’s Computer Pioneer Award for “leading role in developing Computer Vision into a thriving discipline through pioneering research, leadership, and mentorship”. He is a member of the US National Academy of Sciences and the National Academy of Engineering, and a Fellow of the American Academy of Arts and Sciences.