Meta-learning and AutoML Tutorial

Tutorial Description

Continuous and fast adaptation to novel tasks is one of the key characteristics of human intelligence, yet it remains a daunting challenge for artificial intelligence. To address this challenge, learning to learn (a.k.a. meta-learning) and automated machine learning (a.k.a. AutoML), two hot topics in both academia and industry, aim to adapt to different tasks and different data by leveraging various kinds of knowledge. Notably, they have achieved great success in diverse applications such as image classification, question answering systems, and spatiotemporal prediction. In this tutorial, we will disseminate and promote recent research achievements in meta-learning and AutoML as well as their applications, an exciting and fast-growing research direction in the general field of machine learning. We will also advocate novel, high-quality research findings and innovative solutions to the challenging problems in meta-learning and AutoML. The tutorial consists of seven parts. We first give a brief introduction to the research and industrial motivation, followed by discussions on learning to learn and learning to transfer, two key problems in meta-learning. We then turn to hyperparameter optimization and neural architecture search, as well as their applications. We will also talk about recent advances in full architecture search and auto graph learning. Finally, we share some of our insights on trends in meta-learning and AutoML. All slides, along with our experience in winning second place in the NeurIPS 2018 AutoML Competition and the ACML 2019 AutoWSL Competition, will be shared with the audience.

Tutorial Outline

The tutorial is scheduled for a half day and is organized into seven sections.

The research and industrial motivation

Learning to learn

Learning to transfer

Hyperparameter optimization

Neural architecture search

Full architecture search, auto graph learning, and other advances in AutoML

Discussions and future directions

Target Audience and Prerequisites

This tutorial will be highly accessible to the whole AI community, including researchers, students, and practitioners who are interested in AutoML, meta-learning, and their applications in AI-related tasks. The tutorial will be self-contained and is designed for introductory and intermediate audiences. Attendees are expected to have basic knowledge of machine learning, linear algebra, and calculus. In particular, attendees who have worked on related topics (e.g., deep learning, reinforcement learning, transfer learning, multi-task learning, optimization) are welcome to join the Q&A interaction during the tutorial.

Motivation, Relevance and Rationale

This tutorial aims to disseminate and promote recent research achievements in meta-learning and AutoML as well as their applications, an exciting and fast-growing research direction in the general field of machine learning. We will advocate novel, high-quality research findings and innovative solutions to the challenging problems in meta-learning and AutoML. This topic is at the core of the scope of IJCAI and is attractive to the IJCAI audience from both academia and industry. The objective "Motivate and explain a topic of emerging importance for AI" will be well served by this tutorial.

Tutorial Overview

We introduce the most recent updates and advances in meta-learning and AutoML from the past few years. All slides, along with our experience in winning second place in the NeurIPS 2018 AutoML Competition and the ACML 2019 AutoWSL Competition, will be shared with the audience.

Meta-learning

Meta-learning refers to using past experience from solving related tasks to facilitate the task currently being solved. In meta-learning, meta-data is collected to describe previous tasks and models. The meta-data is then used to guide the search for optimal models for new tasks. We will discuss meta-learning from two aspects: learning to learn and learning to transfer.

Learning to Learn

We discuss recent progress in learning to learn (a.k.a. meta-learning). We present three influential categories of methods: black-box amortized, gradient-based, and non-parametric meta-learning methods. We also focus on how existing works tackle the challenging problem of task heterogeneity. Finally, we demonstrate various application examples of learning to learn.

Black-box amortized meta-learning methods implement the meta-learner itself as a neural network and train it across tasks.

Gradient-based meta-learning methods formulate the meta-learner as the initializations for model parameters and/or the optimization trace.

As a representative example, model-agnostic meta-learning and its variants learn a well-generalized initialization of the model parameters, from which the model converges quickly on the target task of interest.

Non-parametric methods inject a learnable meta-learner into a lazy base learner, where the meta-learner learns from previous tasks the function that maps inputs into the metric space used by the lazy learner.

Heterogeneous meta-learning, where tasks are sampled from different distributions, is challenging but much more realistic than traditional meta-learning, which assumes that all tasks come from the same underlying distribution.

The applications of meta-learning we discuss here range from computer vision, to natural language processing, to graph representation learning, to spatiotemporal prediction.
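To make the gradient-based category above more concrete, the following is a minimal sketch of a MAML-style loop on a toy problem. It is not taken from the tutorial material: the 1-D regression tasks (y = slope * x), the scalar model (y = w * x), the step sizes, and all function names are assumptions chosen for illustration. Because the model is a single scalar, the second-order term of the meta-gradient can be written out by hand.

```python
import random

def mse(w, xs, slope):
    """Mean squared error of the scalar model y = w * x on a task y = slope * x."""
    return sum(((w - slope) * x) ** 2 for x in xs) / len(xs)

def mse_grad(w, xs, slope):
    """Derivative of mse with respect to w."""
    return sum(2.0 * (w - slope) * x * x for x in xs) / len(xs)

def adapt(w, xs, slope, inner_lr):
    """Inner loop: one gradient step on the task's support set."""
    return w - inner_lr * mse_grad(w, xs, slope)

def maml_train(task_slopes, meta_steps=500, inner_lr=0.1, meta_lr=0.05, seed=0):
    """Outer loop: learn an initialization w that adapts well after one step."""
    rng = random.Random(seed)
    w = 0.0  # the meta-learned initialization
    for _ in range(meta_steps):
        slope = rng.choice(task_slopes)                     # sample a task
        support = [rng.uniform(-1, 1) for _ in range(10)]   # data for adaptation
        query = [rng.uniform(-1, 1) for _ in range(10)]     # data for the meta-loss
        w_adapted = adapt(w, support, slope, inner_lr)
        # Chain rule through the inner step:
        # d(w_adapted)/dw = 1 - inner_lr * (second derivative of the support loss).
        curvature = sum(2.0 * x * x for x in support) / len(support)
        meta_grad = mse_grad(w_adapted, query, slope) * (1.0 - inner_lr * curvature)
        w -= meta_lr * meta_grad
    return w

if __name__ == "__main__":
    w0 = maml_train([1.0, 3.0])
    print("meta-learned initialization:", w0)
```

In this toy setting the learned initialization should settle somewhere between the task slopes, so that a single inner step moves it toward whichever task is presented. Real implementations apply the same two-level structure to neural networks and rely on automatic differentiation for the meta-gradient.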

Learning to Transfer

Besides learning to learn a typical task, an artificial intelligence agent can also be empowered to learn to transfer knowledge quickly from previous tasks by identifying what to transfer. The idea stems from the widely accepted principle that human beings improve their transfer learning skills, i.e., deciding what to transfer, through meta-cognitive reflection on inductive transfer learning practices. We will introduce the learning-to-transfer framework as well as promising techniques and applications of learning to transfer.

The framework of learning to transfer opens up a new direction of optimizing knowledge transfer, by borrowing transfer learning skills from previous transfer learning experiences.

We next detail both the non-neural and neural instantiations of the learning to transfer framework.

Three major applications of learning to transfer, i.e., computer vision, natural language processing and healthcare, will be introduced.

Hyperparameter Optimization

Every machine learning system has hyperparameters, and the choice of hyperparameters significantly affects the effectiveness of the learning system. In deep learning systems especially, there can be tens of thousands of hyperparameters governing the neural architecture, regularization, and optimization. Finding suitable hyperparameters often requires expert knowledge and substantial experience with the ML system, which limits the feasibility and efficiency of ML systems in real-world applications. Hyperparameter optimization (HPO) aims to select the optimal hyperparameter configuration automatically. It can reduce human effort and improve the performance and reproducibility of machine learning systems.

Model-free Methods

The most basic model-free HPO methods are grid search and random search. Population-based methods, such as genetic algorithms and evolutionary algorithms, are another important branch of model-free HPO. Model-free methods are usually simple to implement and make no specific assumptions about the machine learning system being optimized.
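As an illustration of the two basic model-free methods, the sketch below implements grid search and random search over a small configuration space. The objective function, the hyperparameter names ("lr", "reg"), and the value ranges are hypothetical stand-ins for a real validation loss; lower scores are assumed to be better.

```python
import itertools
import random

def grid_search(objective, grid):
    """Evaluate every combination in `grid` (dict: name -> list of values)."""
    names = list(grid)
    best_cfg, best_score = None, float("inf")
    for values in itertools.product(*(grid[n] for n in names)):
        cfg = dict(zip(names, values))
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def random_search(objective, space, n_trials, seed=0):
    """Sample each hyperparameter uniformly from `space` (dict: name -> (low, high))."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        score = objective(cfg)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

def toy_objective(cfg):
    """Hypothetical stand-in for a validation loss; minimized at lr=0.1, reg=0.01."""
    return (cfg["lr"] - 0.1) ** 2 + (cfg["reg"] - 0.01) ** 2

if __name__ == "__main__":
    print(grid_search(toy_objective, {"lr": [0.01, 0.1, 1.0], "reg": [0.0, 0.01, 0.1]}))
    print(random_search(toy_objective, {"lr": (0.0, 1.0), "reg": (0.0, 0.1)}, n_trials=50))
```

Neither function assumes anything about the objective beyond being able to call it, which is the sense in which these methods are model-free. The cost of grid search grows multiplicatively with each added hyperparameter, whereas the budget of random search is set directly by the number of trials.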

Bayesian Optimization

Bayesian optimization (BO) is a state-of-the-art framework for optimizing black-box functions. It consists of a probabilistic model that is fitted to the observations collected so far and an acquisition function that uses the probabilistic model to decide which candidate hyperparameters to evaluate next. Commonly used probabilistic models include Gaussian processes (GP) and the Tree-structured Parzen Estimator (TPE). The most common choice of acquisition function is expected improvement (EI). BO forms the basis of several prominent AutoML frameworks, such as Auto-WEKA and Auto-sklearn.
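The expected improvement acquisition function has a closed form when the probabilistic model gives a Gaussian prediction at each candidate. The sketch below computes it for a minimization problem. It assumes that a surrogate (for example a GP, not implemented here) has already produced a predictive mean mu and standard deviation sigma for a candidate; the function and argument names are illustrative.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, best_so_far):
    """EI for minimization: E[max(best_so_far - f, 0)] with f ~ N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No predictive uncertainty: the improvement is deterministic.
        return max(best_so_far - mu, 0.0)
    z = (best_so_far - mu) / sigma
    return (best_so_far - mu) * normal_cdf(z) + sigma * normal_pdf(z)

def pick_next(candidates, best_so_far):
    """Choose the candidate with the largest EI.

    `candidates` is a list of (config, mu, sigma) tuples, where mu and sigma
    are assumed to come from the fitted surrogate model.
    """
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], best_so_far))[0]
```

The first term of the formula rewards candidates whose predicted mean already beats the best observed value, and the second rewards candidates with high predictive uncertainty, which is how EI balances exploitation and exploration.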

Bandit-based Methods

In black-box HPO methods such as BO, performance evaluation can be very expensive, especially for large datasets and complex models, because each evaluation requires a complete training run of the machine learning system being optimized. To tackle this problem, bandit-based algorithms cut off less promising configurations early and focus on configurations that are expected to perform best. Bandit-based HPO methods are built on the assumption that a small budget, such as a small subset of the data, is sufficient to indicate the final performance of the candidate configurations. A simple yet powerful method is successive halving (SH). HyperBand further considers different trade-offs between the total budget and the number of candidate configurations to improve the effectiveness of SH. BOHB combines Bayesian optimization and HyperBand to improve anytime performance and final performance simultaneously.
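Successive halving can be sketched in a few lines. In the version below, every surviving configuration is evaluated at the current budget, the best 1/eta fraction is kept, and the budget is multiplied by eta for the next round. The evaluate(config, budget) callback, the toy loss, and the default values of min_budget and eta are assumptions for illustration; in practice the budget might be a number of epochs or a data subset size.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Return the configuration that survives all elimination rounds.

    evaluate(config, budget) must return a loss (lower is better) measured
    with the given budget.
    """
    survivors = list(configs)
    if not survivors:
        raise ValueError("need at least one configuration")
    budget = min_budget
    while len(survivors) > 1:
        # Rank all survivors using the current (cheap) budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        # Keep the top 1/eta fraction, but never fewer than one configuration.
        survivors = ranked[: max(1, len(ranked) // eta)]
        # Spend more budget per configuration in the next round.
        budget *= eta
    return survivors[0]

def toy_evaluate(config, budget):
    """Hypothetical loss: distance to an optimum at 0.3 plus a term that shrinks with budget."""
    return abs(config - 0.3) + 1.0 / budget

if __name__ == "__main__":
    candidates = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
    print(successive_halving(candidates, toy_evaluate))
```

With eight candidates and eta = 2, the rounds evaluate 8, 4, and 2 configurations at budgets 1, 2, and 4, so each round costs the same total budget. HyperBand can be viewed as running several such brackets with different starting numbers of configurations and starting budgets.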

Neural and Full Architecture Search

Deep learning methods have been very successful in tasks such as machine translation, image and speech recognition, and reinforcement learning. Neural Architecture Search (NAS), the process of automating architecture engineering, is an important part of automating machine learning. NAS can be seen as a subfield of AutoML and has significant overlap with hyperparameter optimization and meta-learning. The methods for NAS can be categorized along three dimensions: search space, search strategy, and performance estimation strategy. The NAS part of this tutorial is structured according to these three dimensions.

Search Space

The search space defines which architectures can be represented in principle. As the optimization problem of NAS is often non-continuous and high-dimensional, it can be largely simplified by a suitable choice of the search space, since the size of the search space can be reduced by incorporating prior knowledge on the specific task.

Search Strategy

The search strategy determines how the search space is explored. Several different search strategies have been proposed, such as reinforcement learning, one-shot architecture search, and evolutionary algorithms.
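The interplay between a search space and a search strategy can be illustrated with a small evolutionary search. The sketch below is an assumption-laden toy: the three-dimensional search space, the mutation rule, the aging scheme (the oldest individual is discarded each generation), and the proxy score are all hypothetical, and the score stands in for a real performance estimate such as validation accuracy.

```python
import random

# Hypothetical search space: each architecture is one choice per dimension.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4],
    "width": [16, 32, 64],
    "op": ["conv3x3", "conv5x5", "sep_conv"],
}

def sample_architecture(rng):
    """Draw a random architecture from the search space."""
    return {name: rng.choice(choices) for name, choices in SEARCH_SPACE.items()}

def mutate(arch, rng):
    """Copy the parent and resample one randomly chosen dimension."""
    child = dict(arch)
    name = rng.choice(list(SEARCH_SPACE))
    child[name] = rng.choice(SEARCH_SPACE[name])
    return child

def evolutionary_search(score_fn, population_size=8, generations=40,
                        sample_size=3, seed=0):
    """Tournament selection plus mutation; higher score is better."""
    rng = random.Random(seed)
    population = [sample_architecture(rng) for _ in range(population_size)]
    best = max(population, key=score_fn)
    for _ in range(generations):
        # Pick the best of a small random tournament as the parent.
        parent = max(rng.sample(population, sample_size), key=score_fn)
        child = mutate(parent, rng)
        population.pop(0)          # discard the oldest individual
        population.append(child)
        if score_fn(child) > score_fn(best):
            best = child
    return best

def toy_score(arch):
    """Hypothetical proxy for validation accuracy of an architecture."""
    bonus = 1.0 if arch["op"] == "sep_conv" else 0.0
    return bonus - abs(arch["num_layers"] - 3) - abs(arch["width"] - 32) / 16.0

if __name__ == "__main__":
    print(evolutionary_search(toy_score))
```

In a real NAS system the call to score_fn is by far the most expensive step, since it involves training the candidate architecture.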

Performance Estimation Strategy

To guide the search process, a performance estimation strategy is used to estimate the performance of a given architecture. The simplest approach is to perform standard training and validation of the architecture on data. However, training every candidate architecture from scratch is computationally expensive and can consume thousands of GPU days. Hence, many methods have recently been proposed to reduce the cost of performance estimation.

Meta-learning and Automated Machine Learning:
Approaches and Applications

Speakers

Xin Wang, Tsinghua University, China

Huaxiu Yao, Pennsylvania State University, USA

Ying Wei, City University of Hong Kong, China

Zhenhui (Jessie) Li, Pennsylvania State University, USA

Wenwu Zhu, Tsinghua University, China

Updated in March 2020
