Joint Learning for 3D Perception and Action
Many challenges remain in applying machine learning to domains where obtaining massive annotated
data is difficult. We discuss approaches that aim to reduce supervision load for learning
algorithms in the visual and geometric domains by leveraging correlations among data as well
as among learning tasks -- what we call joint learning. The basic notion is that inference
problems do not occur in isolation but rather in a "social context" that can be exploited
to provide self-supervision by enforcing consistency, thus improving performance and increasing
sample efficiency. An example is voting mechanisms where multiple "experts" must collaborate
on predicting a particular outcome, such as an object detection. This is especially challenging
across different modalities, such as when mixing point clouds with image data, or geometry with
language data. Another example is the use of cross-task consistency constraints, as in the case
of inferring depth and normals from an image, which are obviously correlated. The talk will present
a number of examples of joint learning and of methods that facilitate information aggregation,
including the above as well as 3D object pose estimation and spatio-temporal data consolidation.
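As a rough illustration of the depth-normals consistency idea mentioned above, a minimal sketch (with hypothetical helper names, not the talk's actual method) can derive the normals implied by a predicted depth map via finite differences and penalize disagreement with separately predicted normals:

```python
import numpy as np

def normals_from_depth(depth):
    """Per-pixel surface normals implied by a depth map, via finite differences."""
    dz_dy, dz_dx = np.gradient(depth)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def consistency_loss(pred_normals, pred_depth):
    """Cross-task self-supervision: 1 - cosine similarity between predicted
    normals and the normals implied by the predicted depth."""
    implied = normals_from_depth(pred_depth)
    cos = np.sum(pred_normals * implied, axis=-1)
    return float(np.mean(1.0 - cos))
```

A flat depth map implies normals pointing along the z-axis, so a normal predictor that agrees incurs zero loss; in joint training, such a term supplies a label-free gradient to both prediction heads at once.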
Professor Guibas heads the Geometric Computation group in the Computer Science Department of
Stanford University. He is acting director of the Artificial Intelligence Laboratory and member
of the Computer Graphics Laboratory, the Institute for Computational and Mathematical Engineering
(ICME), and the Bio-X program. His research centers on algorithms for sensing, modeling, reasoning,
rendering, and acting on the physical world. Professor Guibas' interests span computational
geometry, geometric modeling, computer graphics, computer vision, sensor networks, robotics,
and discrete algorithms --- all areas in which he has published and lectured extensively.
Leonidas Guibas obtained his Ph.D. from Stanford in 1976, under the supervision of Donald Knuth.
His main subsequent employers were Xerox PARC, MIT, and DEC/SRC. He has been at Stanford since
1984 as Professor of Computer Science. Professor Guibas has graduated 41 Ph.D. students and has
supervised 29 postdoctoral fellows, many of whom are well-known in computational geometry, in
computer graphics, in computer vision, in theoretical computer science, and in ad hoc and sensor
networks. At Stanford he has developed new courses in algorithms and data structures, geometric
modeling, geometric algorithms, computational biology, and sensor networks. Professor Guibas is
a member of the US National Academy of Engineering and the American Academy of Arts and Sciences,
an ACM Fellow, an IEEE Fellow and winner of the ACM Allen Newell award, the ICCV Helmholtz prize,
and a DoD Vannevar Bush Faculty Fellowship.
The automotive and transportation industry will undergo a tectonic shift over the next decade
with the advent of Connectivity, Automation, Sharing, and Electrification (CASE). Autonomous
driving presents a historic opportunity to transform the academic, technological, and industrial
landscape with advanced sensing and actuation, high-definition mapping, new machine learning
algorithms, smart planning and control, increasing computing power, and new infrastructure with
5G, cloud and edge computing. Indeed, we have witnessed unprecedented innovation and activities
in the past five years in R&D, investment, joint ventures, road tests and commercial trials,
from auto makers, tier-ones, and new forces from the internet and high-tech industries.
In this talk, I will speak about this historic opportunity and its challenges from technological,
industrial, and policy perspectives. I will address some of the core controversial and critical
issues in the advancement of autonomous driving: open vs. closed, lidar vs. cameras, progressive
L2-L3-L4 vs. new L4, autonomous capabilities vs. V2X, robotaxi vs. vertical, China vs. global,
automakers vs. new players, and the evolutionary path and end game.
Dr. Ya-Qin Zhang is Chair Professor of AI Science at Tsinghua University, and Dean of the Institute
for AI Industry Research of Tsinghua University (AIR). He served as the President of Baidu Inc.
for 5 years (2014.09-2019.10), responsible for Autonomous Driving, Cloud Computing, Emerging
Business, and Technology divisions. Prior to Baidu, Zhang was a key executive at Microsoft for
16 years, serving as Corporate Vice President and Chair/Cofounder of Microsoft Asia-Pacific
Research and Development Group.
Zhang was inducted into the American Academy of Arts and Sciences in 2019. He joined the Australian
Academy of Technology and Engineering (ATSE) as its only foreign Fellow in 2017. He became an IEEE
Fellow in 1997 at age 31, at the time the youngest Fellow in the organization’s 100+ year history.
He is one of the top scientists in digital video and multimedia, with 558 papers in leading
international conferences and journals, 62 granted US patents, and 11 books. He has received many
prestigious academic, technological, and industrial awards, including the IEEE Centennial Medal,
the IEEE Industrial Pioneer Award, the IEEE Richard Merwin Medal, and over a dozen best paper
awards from various IEEE transactions and journals.
Zhang has been named one of the Top 10 CEOs in Asia, among the 50 Global Shapers, Executive of
the Year, and IT Innovator Leader by outlets including IT Times, Business Week, CNBC, Global
Business, and Vision magazine.
He served on the Board of Directors of five public companies. He is on the industry board of the
United Nations Development Programme (UNDP), and received the UN’s special award for sustainable
development in 2016. He is the Chairman of the world’s largest open autonomous driving platform
alliance, “Apollo”, with over 160 global partners. He has been an active speaker at global forums
including APEC, Davos World Economic Forum, United Nations, and Bo’Ao Asia Forum.
Over the last few years, applications of AI (artificial intelligence) technology have gained
growing acceptance in society, as more and more people warm up to the idea of including AI in
their everyday lives. At the same time, with the popularity of IoT (Internet of Things), the
amount of data people need to analyze keeps growing, and its analysis relies increasingly on AI
technology. As a bridge between the physical world and the digital world, IoT also provides new
opportunities for applying AI technology.
As a result, the birth of AIoT (artificial intelligence of things), the combination of AI and
IoT, is inevitable. AIoT has quietly penetrated every aspect of human life, from small mobile
applications and smart homes to large-scale group analysis, city management, and policymaking.
However, AIoT, like any other technological invention, brings human beings not only crucial new
opportunities but also challenging obstacles.
So, is AIoT our new Pandora’s box?
Yunhao Liu is an ACM Fellow, an IEEE Fellow, and a Professor at Tsinghua University. He received
his B.S. degree from the Department of Automation at Tsinghua University in 1995, and his M.S.
and Ph.D. degrees in Computer Science and Engineering from Michigan State University in 2003 and
2004, respectively. He is now Editor-in-Chief of ACM Transactions on Sensor Networks and of CCCF.
Convergence of GANs Training: A Stochastic Approximation and Control Approach
Despite the popularity of Generative Adversarial Networks (GANs), there are well-recognized and
documented issues in GANs training. In this talk, we will first introduce a stochastic differential
equation approximation approach for GANs training. We will next demonstrate the
connection of this SDE approach with the classical Newton’s method, and then show how this approach
will enable studies of the convergence of GANs training, as well as analysis
of hyperparameter training for GANs within a stochastic control framework.
Time permitting, we will also discuss how the minimax games of GANs, such as Wasserstein
GANs, can be reformulated in the framework of optimal transport (OT) via the duality representation.
Professor Xin Guo is currently the Coleman Fung Chair Professor in the College of Engineering at
UC Berkeley, and an Amazon Scholar. Her research interests include the theory of stochastic
control and games, and machine learning.
Fundamental Limits of Differentiable Learning
We consider the statistical learning paradigm consisting of training general neural networks (NN)
with (S)GD. What is the class of functions that can be learned with this paradigm? How does it
compare to known classes of learning such as statistical query (SQ) learning or PAC? Is
depth/overparametrization needed to learn certain classes? We show that SGD on NN is equivalent
to PAC while GD on NN is equivalent to SQ, obtaining a separation between the classes learned by
SGD and GD. We further give a strong separation with kernel methods, exhibiting function classes
that kernels cannot learn with non-trivial edge but that GD on a differentiable model can learn
with perfect edge. Based on joint works with C. Sandon, E. Malach, P. Kamath, and N. Srebro.
Emmanuel Abbe obtained his M.Sc. in Mathematics at EPFL and his Ph.D. in EECS at MIT. He was
an assistant professor and associate professor at Princeton University until 2018 and he is now
a professor of Mathematics and Computer Science at EPFL, where he holds the Chair of Mathematical
Data Science. His research interests are in information theory, machine learning and related
mathematical fields. He is the recipient of the Foundation Latsis International Prize, the Bell
Labs Prize, the von Neumann Fellowship, and the IEEE Information Theory Society Paper Award. He
is also a co-PI of the NSF-Simons Collaboration on the Theoretical Foundations of Deep Learning.
Reconstructing clear target images from incomplete and low signal-to-noise ratio observation
data is fundamental in computational imaging, with applications including autonomous driving,
surveillance, remote sensing, astronomical observation, bio-optics, and diagnostic medicine,
to name a few. The quality of the reconstruction largely depends on the accuracy of the adopted
noise model and image prior. This talk will introduce a physics-based noise model as well as a
class of general image reconstruction algorithms. Starting from the physical characteristics of
photosensors, the noise sources involved in the electronic imaging process, from photons to
digital numbers, are precisely modeled, and a tuning-free plug-and-play proximal optimization
algorithm is proposed. This breaks the limitations of traditional imaging systems, yielding
promising results on extreme low-light imaging, compressive-sensing MRI, sparse-view CT, and
other tasks.
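To make the plug-and-play idea concrete, here is a minimal proximal-gradient sketch (not the authors' algorithm; the names and step size are illustrative) that alternates a data-fidelity gradient step with an arbitrary plugged-in denoiser standing in for the image prior:

```python
import numpy as np

def pnp_pgd(y, A, denoiser, step=0.5, iters=50):
    """Plug-and-play proximal gradient: fit the observation model y ~ A x
    while the prior is enforced implicitly by an off-the-shelf denoiser."""
    x = A.T @ y                                 # crude initialization
    for _ in range(iters):
        x = x - step * (A.T @ (A @ x - y))      # gradient of ||Ax - y||^2 / 2
        x = denoiser(x)                         # prior step: plug in any denoiser
    return x
```

With `A` the identity and an identity "denoiser", the iteration simply fixes `x = y`; in practice `A` models the sensor (e.g., a CT projection operator) and the denoiser encodes the image prior.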
Ying Fu received the B.S. degree in Electronic Engineering from Xidian University in 2009,
the M.S. degree in Automation from Tsinghua University in 2012, and the Ph.D. degree in
Information Science and Technology from the University of Tokyo in 2015. She is currently a
Professor with the School of Computer Science and Technology, Beijing Institute of Technology.
Her research interests include physics-based vision, image/video processing, and computational
photography. She received the Outstanding Paper Awards from ICML'20 and PRCV’19.
Softmax Deep Double Deterministic Policy Gradients
A widely-used actor-critic reinforcement learning algorithm for continuous control, Deep
Deterministic Policy Gradients (DDPG), suffers from the overestimation problem, which can
negatively affect the performance. Although the state-of-the-art Twin Delayed Deep Deterministic
Policy Gradient algorithm (TD3) mitigates the overestimation issue, it can lead to a large
underestimation bias. In this paper, we propose the use of the Boltzmann softmax operator
for value function estimation in continuous control. We first theoretically analyze the softmax
operator in continuous action space. Then, we uncover an important property of the softmax
operator in actor-critic algorithms, i.e., it helps to smooth the optimization landscape, which
sheds new light on the benefits of the operator. We also design two new algorithms, Softmax
Deep Deterministic Policy Gradients (SD2) and Softmax Deep Double Deterministic Policy Gradient
(SD3), by building the softmax operator upon single and double estimators. We show that doing
so enables better value estimation, which effectively mitigates the overestimation and
underestimation biases. We conduct extensive experiments on challenging continuous control
tasks, and the results show that SD3 outperforms state-of-the-art methods.
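As a rough illustration of the Boltzmann softmax operator underlying SD2/SD3 (a sketch over sampled action values, not the paper's full actor-critic update):

```python
import numpy as np

def softmax_value_estimate(q_values, beta=1.0):
    """Boltzmann softmax operator over sampled Q-values:
    a weighted average with weights proportional to exp(beta * Q)."""
    q = np.asarray(q_values, dtype=float)
    w = np.exp(beta * (q - q.max()))  # shift by the max for numerical stability
    w /= w.sum()
    return float(np.dot(w, q))
```

As `beta -> 0` the operator returns the mean of the sampled values, and as `beta -> inf` it approaches the max, so it interpolates between the pessimistic and optimistic extremes that drive under- and over-estimation in DDPG-style critics.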
Dr. Longbo Huang is an associate professor (with tenure) at the Institute for Interdisciplinary
Information Sciences (IIIS) at Tsinghua University, Beijing, China. He received his Ph.D. in EE
from the University of Southern California, and then worked as a postdoctoral researcher in the
EECS dept. at University of California at Berkeley before joining IIIS. Dr. Huang serves/served
on the editorial board for IEEE Transactions on Communications (TCOM), ACM Transactions on Modeling
and Performance Evaluation of Computing Systems (ToMPECS), IEEE Journal on Selected Areas in
Communications (JSAC-guest editor) and IEEE/ACM Transactions on Networking (ToN). He is a senior
member of IEEE and a member of ACM.
Dr. Huang has held visiting positions at the LIDS lab at MIT, the Chinese University of Hong Kong,
Bell-labs France, and Microsoft Research Asia (MSRA). He was a visiting scientist at the Simons
Institute for the Theory of Computing at UC Berkeley in Fall 2016. Dr. Huang received the
Outstanding Teaching Award from Tsinghua university in 2014. He received the Google Research
Award and the Microsoft Research Asia Collaborative Research Award in 2014, and was selected
into the MSRA StarTrack Program in 2015. Dr. Huang won the ACM SIGMETRICS Rising Star Research
Award in 2018.
Dr. Huang’s current research interests are in the areas of stochastic modeling and analysis,
reinforcement learning and control, optimization and machine learning, and big data analytics.
Learning Resource Allocation in Wireless Networks
Judicious allocation of limited radio resources (frequency spectrum, power, etc.) is a central
issue in wireless network design. Despite its sequential nature, the traditional approach is to
break down the problem into isolated optimizations at each time step, with little consideration
for the long-term effects. In this talk, we shall take a brand-new look at the age-old wireless
resource allocation problem by recognizing its sequential nature and its goal of maximizing
expected performance across some time span, both of which are typical of reinforcement learning
problems. We will show that this new interpretation helps address service requirements that have
been deemed hard traditionally due to the difficulty to even model them in a mathematically exact
form. We propose to use multi-agent reinforcement learning for spectrum access in vehicular
networks and demonstrate that the agents successfully learn to cooperate in a distributed way
to simultaneously improve the conflicting design objectives of different vehicular links.
Le Liang is a Professor in the School of Information Science and Engineering, Southeast University,
Nanjing, China. His main research interests are in wireless communications, signal processing,
and machine learning. The current focus is on the theory and application of intelligent wireless
systems, including the design of intelligent wireless transceivers, distributed machine learning,
and networked multi-agent systems. He obtained his Ph.D. degree in electrical and computer
engineering from the Georgia Institute of Technology, Atlanta, GA, USA in 2018. From 2019 to 2021,
he was a Research Scientist at Intel Labs, Hillsboro, OR, USA. Dr. Liang currently serves as an
Editor for the IEEE Communications Letters and as an Associate Editor for the IEEE Journal on
Selected Areas in Communications Series on Machine Learning in Communications and Networks. He
received the Best Paper Award of IEEE/CIC ICCC in 2014.
Structures as Sensors: Physics-guided Model Transfer for Indirectly Monitoring Humans and
Surroundings with Ambient Structural Responses
Smart structures are designed to sense, understand, and respond to the structure itself, the humans
within, and the surrounding environment. However, traditional monitoring approaches using
dedicated sensors often result in dense sensing systems that are difficult to install and
maintain in large-scale structures. This talk introduces the “structures as sensors” approach,
which utilizes the structure itself as a sensing medium to indirectly infer multiple types
of information (e.g., occupant activity, surrounding infrastructure states) through their
influence on the physical response of the structure. This is possible because the conditions
of the structure itself, the surrounding environment, and the activities of users within all
have a direct impact on the structure’s physical responses. Challenges lie, however, in creating
robust inference models for analyzing noisy structural response data collected from multiple
structures. To this end, we developed physics-guided data analytics approaches that combine
statistical signal processing and machine learning with physical principles. Specifically, I
will present model transfer approaches for monitoring humans and surroundings across multiple
structures. These methods are evaluated with real-world experiments, including our 6-year
railway and eldercare center deployments.
Hae Young Noh is an Associate Professor in the Department of Civil and Environmental Engineering
at Stanford University. Her research focuses on indirect sensing and physics-guided data analytics
to enable low-cost non-intrusive monitoring of cyber-physical-human systems. She is particularly
interested in developing structures to be self-, user-, and surrounding-aware to improve users’
quality of life and provide safe and sustainable built environments. The results of her work have
been deployed in a number of real-world applications from trains, to the Amish community, to
eldercare centers, to pig farms. Before joining Stanford, she was a faculty member at Carnegie
Mellon University. She received her Ph.D. and M.S. degrees in Civil and Environmental Engineering
and the second M.S. degree in Electrical Engineering at Stanford University. She earned her B.S.
degree in Mechanical and Aerospace Engineering at Cornell University. She received several awards,
including the Google Faculty Research Awards (2013, 2016), the Dean’s Early Career Fellowship
(2018), and the NSF CAREER Award (2017).
Towards a Better Global Landscape of GANs: How a 2-Line Code Change Makes a Difference
In this talk, we present a global landscape analysis of GANs (generative adversarial nets).
We prove that a class of Separable-GANs (SepGANs), including the popular JS-GAN and hinge-GAN,
has exponentially many bad strict local minima; moreover, these local minima correspond to
mode-collapse patterns. To our knowledge, this is the first formal optimization characterization
of mode collapse. In addition, we prove that relativistic pairing GANs (RpGANs) have no bad
basins. RpGAN can be viewed as an unconstrained variant of W-GAN: RpGAN keeps the "pairing" idea
of W-GAN, but adds an upper-bounded shell function (e.g., the logistic loss in JS-GAN). The
empirical benefit of RpGANs
has been demonstrated by practitioners in, e.g., ESRGAN and realnessGAN. We demonstrate that
the better landscape of RpGANs makes a difference in practice, with a 2-line code change.
We predict that RpGANs have a bigger advantage over SepGANs for high-resolution data (e.g.
LSUN 256*256), imbalanced data, and narrower nets (1/2 or 1/4 width), and our experiments
on real data verify these predictions. This work has appeared at NeurIPS 2020 as an oral
paper (1.1% of 9500+ submissions).
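The "2-line code change" contrast can be sketched as follows (a hypothetical NumPy rendering with a logistic shell function; the actual training code differs):

```python
import numpy as np

def softplus(x):
    return np.logaddexp(0.0, x)  # log(1 + e^x), numerically stable

def sepgan_d_loss(d_real, d_fake):
    # separable (JS-style) discriminator loss: samples scored independently
    return softplus(-d_real).mean() + softplus(d_fake).mean()

def rpgan_d_loss(d_real, d_fake):
    # relativistic pairing: apply the shell function to paired differences
    return softplus(-(d_real - d_fake)).mean()
```

Only the argument of the shell function changes: the pairing `d_real - d_fake` keeps W-GAN's comparison idea, while the logistic shell upper-bounds the loss.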
Ruoyu Sun is an assistant professor in the Department of Industrial and Enterprise Systems
Engineering (ISE) and affiliated with the Department of Electrical and Computer Engineering
(ECE) and Coordinated Science Lab (CSL), the University of Illinois at Urbana-Champaign (UIUC).
Before joining UIUC, he was a visiting research scientist at Facebook AI Research (FAIR) and
was a postdoctoral researcher at Stanford University. He obtained his Ph.D. in electrical
engineering from the University of Minnesota, and a B.S. in mathematics from Peking University.
He won second place in the INFORMS George Nicholson student paper competition and an honorable
mention in the INFORMS Optimization Society student paper competition. He has served as an area
chair for
machine learning conferences such as NeurIPS, ICML, and ICLR. His current research interests
lie in optimization and machine learning, especially deep learning and large-scale optimization.
A Brain-inspired Architecture towards More General AI
At the Neuro-Symbolic Lab of Alibaba DAMO Academy, we explore various directions to help current
AI systems develop towards a more general and human-like form. Approaches such as neuromorphic
computing, brain-inspired architectures, neuro-symbolic mechanisms, and large-scale multi-modal
pretrained models are explored and fused to build AI systems with abilities such as few-shot
learning, compositional generalization, commonsense reasoning, and explainability. In this talk,
we discuss a few such attempts and showcase the system we built, the Photon System.
Dr. Fangbo Tao is currently the Founding Director of the Neuro-Symbolic Lab at Alibaba, where he
leads a research team of 20+ scientists and engineers with backgrounds in CS, neuroscience, and
math to build more general AI systems. Prior to joining Alibaba, Dr. Tao helped Facebook design
and build its first content understanding platform in News Feed as a senior research scientist
and tech lead. Dr. Tao obtained his Ph.D. degree under Prof. Jiawei Han's supervision at UIUC,
and earned his Bachelor's degree at Tsinghua University.
Demystifying (Deep) Reinforcement Learning with Optimism and Pessimism
Coupled with powerful function approximators such as deep neural networks, reinforcement
learning (RL) achieves tremendous empirical successes. However, its theoretical understanding
lags behind. In particular, it remains unclear how to provably attain the optimal policy with a
finite regret or sample complexity. In this talk, we will present the two sides of the same coin,
which demonstrates an intriguing duality between optimism and pessimism.
– In the online setting, we aim to learn the optimal policy by actively interacting with the
environment. To strike a balance between exploration and exploitation, we propose an optimistic
least-squares value iteration algorithm, which achieves a √T regret in the presence of
linear, kernel, and neural function approximators.
– In the offline setting, we aim to learn the optimal policy based on a dataset collected a
priori. Due to a lack of active interactions with the environment, we suffer from the insufficient
coverage of the dataset. To maximally exploit the dataset, we propose a pessimistic least-squares
value iteration algorithm, which achieves a minimax-optimal sample complexity.
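In the linear-function-approximation case, this duality amounts to flipping the sign of an elliptical bonus in least-squares value iteration. A stripped-down sketch (illustrative names, a single regression step rather than the full algorithm):

```python
import numpy as np

def lsvi_q_estimate(Phi, targets, phi_query, beta, lam=1.0, pessimistic=False):
    """One least-squares value-iteration step with linear features.

    Phi: (n, d) features of observed state-action pairs
    targets: (n,) regression targets (reward + value of next state)
    phi_query: (d,) features of the state-action pair to evaluate
    beta: bonus scale; the sign flips between optimism and pessimism
    """
    d = Phi.shape[1]
    Lam = Phi.T @ Phi + lam * np.eye(d)            # regularized Gram matrix
    w = np.linalg.solve(Lam, Phi.T @ targets)      # ridge regression weights
    bonus = beta * np.sqrt(phi_query @ np.linalg.solve(Lam, phi_query))
    q = phi_query @ w
    # online: add the bonus (optimism); offline: subtract it (pessimism)
    return q - bonus if pessimistic else q + bonus
```

Online, the added bonus makes poorly covered actions look attractive, encouraging exploration; offline, subtracting it makes the estimate conservative exactly where the dataset's coverage is thin.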
Zhaoran Wang is an assistant professor in the Departments of Industrial Engineering & Management
Sciences and Computer Science (by courtesy) at Northwestern University (since 2018). He is
affiliated with the Centers for Deep Learning and Optimization & Statistical Learning. The
long-term goal of his research is to develop a new generation of data-driven decision-making
methods, theory, and systems, which tailor artificial intelligence towards addressing pressing
societal challenges. To this end, his research aims at: making deep reinforcement learning more
efficient, both computationally and statistically, in a principled manner to enable its
applications in critical domains; scaling deep reinforcement learning to design and optimize
societal-scale multi-agent systems, especially those involving cooperation and/or competition
among humans and/or robots. With this aim in mind, his research interests span across machine
learning, optimization, statistics, game theory, and information theory.
Contrastive Multimodal Fusion with PairInfoNCE and TupleInfoNCE
Self-supervised representation learning is a critical problem in computer vision, as it
provides a way to pretrain feature extractors on large unlabeled datasets that can be used
as an initialization for more efficient and effective training on downstream tasks.
A promising approach is to use contrastive learning to learn a latent space where features
are close for similar data samples and far apart for dissimilar ones. This approach has
demonstrated tremendous success for pre-training single modality (e.g., image, point cloud,
speech) feature extractors. However, using contrastive losses to learn good representations
which could properly fuse multimodal data (e.g., RGB-D scans, video with subtitles) remains
an open question. A traditional approach is to contrast different modalities to learn the
information shared among them. However, that approach could fail to learn the complementary
synergies between modalities that might be useful for downstream tasks. Another approach is
to concatenate all the modalities into a tuple and then contrast positive and negative tuple
correspondences. However, that approach could consider only the stronger modalities while
ignoring the weaker ones. To address these issues, I will present novel contrastive learning
objectives including PairInfoNCE and TupleInfoNCE. The key is to introduce challenging
negatives while contrasting multimodal tuples, which encourages the learning model to examine
the multimodal correspondences and ensures that weak modalities are not ignored. I will
show how to properly design and optimize these objectives, along with a theoretical
justification for the design. In addition, I will provide extensive experimental evaluations
on a wide range of multimodal learning benchmarks to show the efficacy of our method.
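The generic InfoNCE loss these objectives build on can be sketched as follows (a minimal single-pair version; PairInfoNCE and TupleInfoNCE extend it with the challenging multimodal negatives described above):

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE: each anchor's matched positive is contrasted against all
    other samples in the batch, which serve as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -float(np.mean(np.diag(log_probs)))     # positives on the diagonal
```

When each anchor is far more similar to its own positive than to the negatives (e.g., near-orthogonal embeddings), the loss approaches zero; the multimodal variants differ in how the anchor/positive tuples and negatives are constructed across modalities.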
Li Yi is a tenure-track assistant professor in the Institute for Interdisciplinary Information
Sciences (IIIS) at Tsinghua University. He received his Ph.D. from Stanford University, advised
by Professor Leonidas J. Guibas, and was previously a Research Scientist at Google, working
closely with Professor Thomas Funkhouser. He also spent time at the USC IRIS computer vision lab,
Adobe Research, and Baidu Research. Before joining Stanford, he received his B.E. in Electronic
Engineering
from Tsinghua University. His research interests span across 3D computer vision, computer graphics,
and robot learning, and his mission is to equip robotic agents with the ability of understanding
and interacting with the 3D world. He has published papers at CVPR, ICCV, ECCV, NeurIPS,
SIGGRAPH, SIGGRAPH Asia, etc. His representative work includes ShapeNet, spectral graph CNNs,
PointNet++. He will serve as an area chair for CVPR 2022.