Meta-learning and Automated Machine Learning:
Approaches and Applications

IJCAI 2020, Yokohama, Japan

Speakers

Xin Wang Tsinghua University, China

Xin Wang is currently an Assistant Professor at the Department of Computer Science and Technology, Tsinghua University. He received both his Ph.D. and B.E. degrees in Computer Science and Technology from Zhejiang University, China. He also holds a second Ph.D. degree in Computing Science from Simon Fraser University, Canada. His research interests include cross-modal multimedia intelligence and recommendation in social media. He has published several high-quality research papers in top conferences including ICML, KDD, WWW, SIGIR, AAAI, IJCAI, and CIKM.

Huaxiu Yao Pennsylvania State University, USA

Huaxiu Yao is currently a Ph.D. candidate in the College of Information Sciences and Technology at the Pennsylvania State University. He received his B.Eng. degree from the University of Electronic Science and Technology of China. His research focuses on developing cost-efficient machine learning algorithms with applications in social good. He has published over 10 papers in top conferences and journals such as ICML, ICLR, KDD, AAAI, WWW, and TIST. He has served as a program committee member for major machine learning and data mining conferences such as ICML, ICLR, KDD, AAAI, and IJCAI.

Ying Wei Tencent AI Lab, China

Ying Wei is a Senior Researcher at Tencent AI Lab. She works on machine learning and is especially interested in solving challenges in transfer and meta-learning by pushing the boundaries of both theory and applications. She received her Ph.D. degree from the Department of Computer Science and Engineering, Hong Kong University of Science and Technology in 2017. Before this, she completed her Bachelor's degree at Huazhong University of Science and Technology in 2012, with first class honors. She has served as a (senior) program committee member for many top venues in artificial intelligence including ICML, NeurIPS, AAAI, IJCAI, ICLR, UAI, and WWW. She is also a reviewer for the journals PAMI and ML.

Zhenhui (Jessie) Li Pennsylvania State University, USA

Zhenhui Li is a tenured associate professor of Information Sciences and Technology at the Pennsylvania State University. She is the Haile family early career endowed professor. Prior to joining Penn State, she received her Ph.D. degree in Computer Science from the University of Illinois Urbana-Champaign in 2012, where she was a member of the data mining research group. Her research focuses on mining spatial-temporal data with applications in transportation, ecology, environment, social science, and urban computing. She is a passionate interdisciplinary researcher and actively collaborates with cross-domain researchers. She has served on the organizing committee or senior program committee of many conferences including KDD, ICDM, SDM, CIKM, and SIGSPATIAL. She has regularly offered classes on data organizing and data mining since 2012, and her classes have consistently received high student ratings. She has received the NSF CAREER award, the junior faculty excellence in research award, and the George J. McMurtry junior faculty excellence in teaching and learning award.

Wenwu Zhu Tsinghua University, China

Wenwu Zhu is currently a Professor and the Vice Chair of the Department of Computer Science and Technology at Tsinghua University, the Vice Dean of the National Research Center for Information Science and Technology, and the Vice Director of the Tsinghua Center for Big Data. Prior to his current post, he was a Senior Researcher and Research Manager at Microsoft Research Asia. He was the Chief Scientist and Director at Intel Research China from 2004 to 2008. He worked at Bell Labs New Jersey as a Member of Technical Staff from 1996 to 1999. He received his Ph.D. degree in Electrical and Computer Engineering from New York University in 1996.

Wenwu Zhu is an AAAS Fellow, IEEE Fellow, SPIE Fellow, and a member of The Academy of Europe (Academia Europaea). He has published over 300 refereed papers in the areas of multimedia computing, communications and networking, and big data. He is the inventor or co-inventor of over 50 patents. He has received eight Best Paper Awards, including ACM Multimedia 2012 and IEEE Transactions on Circuits and Systems for Video Technology in 2001. His current research interests are in the areas of Cyber-Physical-Human big data computing and cross-media big data and intelligence.

Wenwu Zhu served as EiC for IEEE Transactions on Multimedia from January 2017 to December 2019. He has also served as Guest Editor for the Proceedings of the IEEE, IEEE Journal on Selected Areas in Communications, ACM Transactions on Intelligent Systems and Technology, etc., and as Associate Editor for IEEE Transactions on Mobile Computing, ACM Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Big Data, etc. He served on the steering committees of IEEE Transactions on Multimedia (2015-2016) and IEEE Transactions on Mobile Computing (2007-2010). He served as TPC Co-chair for ACM Multimedia 2014 and IEEE ISCAS 2013, and as General Co-Chair for ACM Multimedia 2018 and ACM CIKM 2019.

Wenpeng Zhang

Wenpeng Zhang obtained his Ph.D. degree in machine learning at Tsinghua University, China in 2018. He has published several papers in top-tier conferences and journals, including ICML, WWW, KDD, and TKDE. His current research interests lie in online learning, meta-learning and AutoML. He led the team Meta_Learners that won second place in the NeurIPS 2018 AutoML Competition.

Tutorial Description

Continuous and fast adaptation to novel tasks is one of the key characteristics of human intelligence, yet it remains a daunting challenge for artificial intelligence. To address this challenge, learning to learn (a.k.a. meta-learning) and automated machine learning (a.k.a. AutoML), two hot topics in both academia and industry, both aim to adapt to different tasks and different data by leveraging various kinds of knowledge. Notably, they have achieved great success in diverse applications, such as image classification, question answering systems, and spatiotemporal prediction. In this tutorial, we will disseminate and promote recent research achievements in meta-learning and AutoML as well as their applications, an exciting and fast-growing research direction in the general field of machine learning. We will also advocate novel, high-quality research findings and innovative solutions to the challenging problems in meta-learning and AutoML. This tutorial consists of seven parts. We first give a brief introduction to the research and industrial motivation, followed by discussions of learning to learn and learning to transfer, two key problems in meta-learning. We then turn to hyperparameter optimization and neural architecture search, as well as their applications. We will also talk about some recent advances in full architecture search and auto graph learning. Finally, we share some of our insights on trends in meta-learning and AutoML. All the slides, along with our experience in winning second place in the NeurIPS 2018 AutoML Competition and the ACML 2019 AutoWSL Competition, will be shared with the audience.


Tutorial Outline

The tutorial is scheduled for a half day and organized into seven sections.

  • The research and industrial motivation
  • Learning to learn
  • Learning to transfer
  • Hyperparameter optimization
  • Neural architecture search
  • Full architecture search, auto graph learning and other advances in AutoML
  • Discussions and future directions

    Target Audience and Prerequisites

    This tutorial will be highly accessible to the whole AI community, including researchers, students and practitioners who are interested in AutoML, meta-learning and their applications in AI-related tasks. The tutorial will be self-contained and designed for introductory and intermediate audiences. Attendees are expected to have basic knowledge of machine learning, linear algebra, and calculus. In particular, attendees who have worked on related topics (e.g., deep learning, reinforcement learning, transfer learning, multi-task learning, optimization) are welcome to join the Q&A during the tutorial.


    Motivation, Relevance and Rationale

    This tutorial aims to disseminate and promote recent research achievements in meta-learning and AutoML as well as their applications, an exciting and fast-growing research direction in the general field of machine learning. We will advocate novel, high-quality research findings and innovative solutions to the challenging problems in meta-learning and AutoML. This topic is at the core of the scope of IJCAI and is attractive to the IJCAI audience from both academia and industry. The objective to "Motivate and explain a topic of emerging importance for AI" will be well served by this tutorial.


    Tutorial Overview

    We introduce the most recent updates and advances in meta-learning and AutoML from the past few years. All the slides, along with our experience in winning second place in the NeurIPS 2018 AutoML Competition and the ACML 2019 AutoWSL Competition, will be shared with the audience.

    Meta-learning

    Meta-learning refers to utilizing past experience from solving related tasks to facilitate the task being solved. In meta-learning, meta-data is collected to describe previous tasks and models. The meta-data is then utilized to guide the search for optimal models for new tasks. We will discuss meta-learning from two aspects: learning to learn and learning to transfer.

    Learning to Learn

    We discuss recent progress in learning to learn (a.k.a. meta-learning). We present three influential categories of methods: black-box amortized, gradient-based, and non-parametric meta-learning methods. We then focus on how existing works tackle the challenging problem of task heterogeneity. Finally, various application examples of learning to learn are demonstrated.

  • Black-box amortized meta-learning methods train the meta-learner itself by using a neural network.
  • Gradient-based meta-learning methods formulate the meta-learner as the initializations for model parameters and/or the optimization trace.
  • As a representative example, model-agnostic meta-learning and its variants learn well-generalized model parameter initializations from which the target task of interest quickly converges.
  • Non-parametric methods inject a learnable meta-learner into the base lazy learner, where the meta-learner learns from previous tasks the metric space mapping function of the lazy learner.
  • Heterogeneous meta-learning, where tasks are sampled from different distributions, is challenging yet much more grounded than traditional meta-learning methods, which assume that all tasks come from the same underlying distribution.
  • The applications of meta-learning we discuss here range from computer vision, to natural language processing, to graph representation learning, to spatiotemporal prediction.
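To make the gradient-based idea concrete, the following is a minimal sketch and not the method of any particular paper. It assumes a toy setting in which each task is a scalar target a with loss (w - a)**2, so the gradient through the single inner step can be written by hand. The names task_loss_grad, adapt and maml_train are hypothetical and chosen only for illustration.

```python
def task_loss_grad(w, a):
    """Gradient of the toy task loss L_a(w) = (w - a)**2 with respect to w."""
    return 2.0 * (w - a)


def adapt(w, a, inner_lr=0.1):
    """Inner loop: one gradient step from the shared initialization w on task a."""
    return w - inner_lr * task_loss_grad(w, a)


def maml_train(task_targets, inner_lr=0.1, meta_lr=0.05, meta_steps=500):
    """Outer loop: learn an initialization that does well after one inner step."""
    w = 0.0
    for _ in range(meta_steps):
        meta_grad = 0.0
        for a in task_targets:
            w_adapted = adapt(w, a, inner_lr)
            # Chain rule through the inner step: d(w_adapted)/dw = 1 - 2 * inner_lr.
            meta_grad += task_loss_grad(w_adapted, a) * (1.0 - 2.0 * inner_lr)
        w -= meta_lr * meta_grad / len(task_targets)
    return w


init = maml_train([1.0, 3.0])
print(round(init, 3))
```

In this toy setting the learned initialization settles at the mean of the task targets (2.0 for the two tasks above), the point from which one inner step reaches either task with the same post-adaptation loss. Real models would rely on automatic differentiation instead of the hand-derived factor.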
    Learning to Transfer

    Besides learning to learn a typical task, an artificial intelligence agent can also be empowered to learn to quickly transfer knowledge from previous tasks by identifying what to transfer. The idea stems from the widely accepted principle that human beings improve their transfer learning skills of deciding what to transfer through meta-cognitive reflection on inductive transfer learning practices. We will introduce the learning-to-transfer framework, along with the promising techniques and applications of learning to transfer.

  • The framework of learning to transfer opens up a new direction of optimizing knowledge transfer, by borrowing transfer learning skills from previous transfer learning experiences.
  • We next detail both the non-neural and neural instantiations of the learning to transfer framework.
  • Three major applications of learning to transfer, i.e., computer vision, natural language processing and healthcare, will be introduced.
    Hyperparameter Optimization

    Every machine learning system has hyperparameters. The choice of hyperparameters significantly affects the effectiveness of the learning system. Especially in deep learning systems, there can be tens of thousands of hyperparameters regarding neural architecture, regularization and optimization. Finding suitable hyperparameters often requires expert knowledge and sufficient experience with the ML system, which limits the feasibility and efficiency of ML systems in real-world applications. Hyperparameter Optimization (HPO) aims to automatically select the optimal configuration of hyperparameters. It can reduce human effort and improve the performance and reproducibility of machine learning systems.

    Model-free Methods

    The most basic model-free HPO methods include grid search and random search. Population-based methods, such as genetic algorithms and evolutionary algorithms, are another important branch of model-free HPO methods. Model-free methods are usually simple to implement and free from specific assumptions about the machine learning system being optimized.
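As an illustration of how little machinery a model-free method needs, here is a minimal random search sketch. The function names, the search space bounds and the toy objective are assumptions made for this example; in practice the objective would train a model and return its validation loss.

```python
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample configurations uniformly at random and keep the best one seen."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        score = objective(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_objective(config):
    # Hypothetical stand-in for a validation loss, minimized at lr=0.1, dropout=0.3.
    return (config["lr"] - 0.1) ** 2 + (config["dropout"] - 0.3) ** 2


space = {"lr": (0.0, 1.0), "dropout": (0.0, 1.0)}
best_config, best_score = random_search(toy_objective, space)
print(best_config, best_score)
```

Each trial is independent of the others, so the loop can be run in parallel without any change to the logic.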

    Bayesian Optimization

    Bayesian optimization (BO) is a state-of-the-art framework for optimizing black-box functions. It consists of a probabilistic model that is fitted to the observations and an acquisition function that uses the probabilistic model to determine which candidate hyperparameters to evaluate next. Commonly used probabilistic models include Gaussian processes (GP) and the Tree-structured Parzen Estimator (TPE). The most common choice of acquisition function is the expected improvement (EI). BO forms the basis of several prominent AutoML frameworks, such as Auto-WEKA and Auto-sklearn.
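The role of the acquisition function can be illustrated with the closed form of expected improvement under a Gaussian posterior. The sketch below assumes a minimization problem and takes the posterior mean and standard deviation as given; the candidate names and their posterior values are hypothetical and do not come from a fitted model.

```python
import math


def expected_improvement(mu, sigma, best):
    """EI for minimization, given a Gaussian posterior N(mu, sigma**2) at a
    candidate and the best (lowest) objective value observed so far."""
    if sigma <= 0.0:
        return max(best - mu, 0.0)
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


# Hypothetical posterior (mean, standard deviation) at two candidates.
candidates = {"A": (0.20, 0.01), "B": (0.25, 0.10)}
best_so_far = 0.22
chosen = max(candidates, key=lambda k: expected_improvement(*candidates[k], best_so_far))
print(chosen)
```

Candidate B has a worse predicted mean than A but a much larger uncertainty, and under these assumed numbers EI prefers it. This is how the acquisition function trades off exploitation against exploration.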

    Bandit-based Methods

    In black-box HPO methods such as BO, performance evaluation can be very expensive, especially for large datasets and complex models, because each evaluation requires a full run of the machine learning system being optimized. To tackle this problem, bandit-based algorithms have been proposed to cut off the less promising configurations early and focus on configurations that are expected to be optimal. Bandit-based HPO methods build on the assumption that a small subset of the data is sufficient to indicate the final performance of the candidate configurations. A simple yet powerful method is successive halving (SH). Hyperband further considers different trade-offs between the total budget and the number of candidate configurations to improve the effectiveness of SH. BOHB combines Bayesian optimization and Hyperband to improve anytime performance and final performance simultaneously.
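The early-stopping idea behind successive halving can be sketched in a few lines. This is a simplified illustration under stated assumptions: evaluate(config, budget) is a hypothetical function returning a loss after training with the given budget, and the toy evaluation below is deterministic, whereas real low-budget evaluations are noisy.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Repeatedly evaluate the surviving configurations, keep the best 1/eta of
    them, and multiply the per-configuration budget by eta."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        survivors = ranked[: max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]


def toy_evaluate(config, budget):
    # Hypothetical loss: distance of the config from 0.3, plus a term that
    # shrinks as more budget (e.g. data or epochs) is spent.
    return (config - 0.3) ** 2 + 1.0 / budget


configs = [i / 10 for i in range(8)]
winner = successive_halving(configs, toy_evaluate)
print(winner)
```

With eight configurations and eta=2, the rounds keep 4, 2 and then 1 configuration while the budget grows from 1 to 2 to 4, so most of the budget goes to the candidates that looked promising early.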

    Neural and Full Architecture Search

    Deep learning methods have been very successful in tasks such as machine translation, image and speech recognition, and reinforcement learning. Neural Architecture Search (NAS), the process of automating architecture engineering, is an important step in automating machine learning. NAS can be seen as a subfield of AutoML and has significant overlap with hyperparameter optimization and meta-learning. The methods for NAS can be categorized along three dimensions: search space, search strategy (optimization method), and performance estimation strategy. The NAS part of this tutorial is structured with respect to these three dimensions.

    Search Space

    The search space defines which architectures can be represented in principle. As the optimization problem of NAS is often non-continuous and high-dimensional, it can be largely simplified by a suitable choice of the search space, since the size of the search space can be reduced by incorporating prior knowledge on the specific task.
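A small example shows how prior knowledge shrinks the search space. The sketch assumes a chain-structured space in which each layer picks one operation from a fixed list; the operation names and the constraint that the first layer must be a convolution are assumptions made purely for illustration.

```python
from itertools import product

# Hypothetical candidate operations for each layer.
OPERATIONS = ["conv3x3", "conv5x5", "maxpool", "identity"]


def chain_search_space(num_layers, operations):
    """Enumerate every chain-structured architecture as a tuple of operations."""
    return list(product(operations, repeat=num_layers))


full_space = chain_search_space(3, OPERATIONS)
# Assumed prior knowledge: the first layer should always be a convolution.
restricted_space = [arch for arch in full_space if arch[0].startswith("conv")]

print(len(full_space), len(restricted_space))
```

The unrestricted space has 4**3 = 64 architectures and the constraint halves it to 32. The space grows exponentially with depth, which is why such restrictions matter for realistic networks.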

    Search Strategy

    The search strategy determines how the search space is explored. Several search strategies have been proposed, such as reinforcement learning, one-shot architecture search and evolutionary algorithms.
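As one example of a search strategy, the sketch below runs a very small evolutionary search with tournament selection and single-layer mutation. It is a toy under stated assumptions: the operation list is hypothetical, and proxy_score is a made-up stand-in for validation accuracy, which in practice would require training each architecture.

```python
import random

OPS = ["conv3x3", "conv5x5", "maxpool", "identity"]  # hypothetical operations


def proxy_score(arch):
    # Made-up stand-in for validation accuracy: counts conv3x3 layers.
    return sum(op == "conv3x3" for op in arch)


def mutate(arch, rng):
    """Replace the operation of one randomly chosen layer."""
    child = list(arch)
    child[rng.randrange(len(child))] = rng.choice(OPS)
    return tuple(child)


def evolutionary_search(num_layers=4, population_size=8, generations=30, seed=0):
    rng = random.Random(seed)
    population = [tuple(rng.choice(OPS) for _ in range(num_layers))
                  for _ in range(population_size)]
    for _ in range(generations):
        # Tournament selection: the best of three random members becomes the parent.
        parent = max(rng.sample(population, 3), key=proxy_score)
        population.append(mutate(parent, rng))
        # Remove the lowest-scoring member to keep the population size fixed.
        population.remove(min(population, key=proxy_score))
    return max(population, key=proxy_score)


best_arch = evolutionary_search()
print(best_arch, proxy_score(best_arch))
```

Because the lowest-scoring member is the one removed in each generation, the best score in the population never decreases over time.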

    Performance Estimation Strategy

    To guide the search process, a performance estimation strategy is used to estimate the performance of a given architecture. The simplest way is to perform standard training and validation of the architecture on the data. However, training every candidate architecture from scratch is computationally expensive and can consume thousands of GPU days. Hence, many methods have recently been proposed to reduce the cost of performance estimation.