Tutorial Description
Continuous and fast adaptation to novel tasks is one of the key characteristics of human intelligence, yet it remains a daunting challenge for artificial intelligence. To address this challenge, learning to learn (a.k.a. meta-learning) and automated machine learning (a.k.a. AutoML), two hot topics in both academia and industry, aim to adapt to different tasks and different data by leveraging various kinds of knowledge. Notably, they have achieved great success in diverse applications, such as image classification, question answering systems, and spatiotemporal prediction. In this tutorial, we will disseminate and promote the recent research achievements on meta-learning and AutoML as well as their applications, an exciting and fast-growing research direction in the general field of machine learning. We will also advocate novel, high-quality research findings and innovative solutions to the challenging problems in meta-learning and AutoML. This tutorial consists of seven parts. We first give a brief introduction to the research and industrial motivation, followed by discussions on learning to learn and learning to transfer, two key problems in meta-learning. We then turn to hyperparameter optimization and neural architecture search as well as their applications. We will also cover some recent advances on full architecture search and auto graph learning. Finally, we share our insights on the trends in meta-learning and AutoML. All the slides, together with our experience of winning second place in the NeurIPS 2018 AutoML Competition and the ACML 2019 AutoWSL Competition, will be shared with the audience.
Tutorial Outline
The tutorial is scheduled for half a day and organized into seven sections.
Target Audience and Prerequisites
This tutorial will be highly accessible to the whole AI community, including researchers, students, and practitioners who are interested in AutoML, meta-learning, and their applications in AI-related tasks. The tutorial will be self-contained and designed for introductory and intermediate audiences. Attendees are expected to have basic knowledge of machine learning, linear algebra, and calculus. In particular, attendees who have worked on related topics (e.g., deep learning, reinforcement learning, transfer learning, multi-task learning, optimization) are welcome to engage in Q&A interaction during the tutorial.
Motivation, Relevance and Rationale
This tutorial aims to disseminate and promote the recent research achievements on meta-learning and AutoML as well as their applications, an exciting and fast-growing research direction in the general field of machine learning. We will advocate novel, high-quality research findings and innovative solutions to the challenging problems in meta-learning and AutoML. This topic is at the core of the scope of IJCAI and is attractive to the IJCAI audience from both academia and industry. The objective of "Motivate and explain a topic of emerging importance for AI" will be well served by this tutorial.
Tutorial Overview
We introduce the most recent updates and advances in meta-learning and AutoML from the past few years. All the slides, together with our experience of winning second place in the NeurIPS 2018 AutoML Competition and the ACML 2019 AutoWSL Competition, will be shared with the audience.
Meta-learning
Meta-learning refers to utilizing past experience from solving related tasks to facilitate solving the task at hand. In meta-learning, meta-data is collected to describe previous tasks and models. This meta-data is then utilized to guide the search for optimal models for new tasks. We will discuss meta-learning from two aspects: learning to learn and learning to transfer.
Learning to Learn
We discuss recent progress in learning to learn (a.k.a. meta-learning). Here we present three influential categories of methods: black-box amortized, gradient-based, and non-parametric meta-learning methods. We also focus on how existing works tackle the challenging problem of task heterogeneity. Finally, various application examples of learning to learn are demonstrated.
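To make the gradient-based category concrete, the following is a minimal first-order MAML-style sketch on toy 1-D linear-regression tasks; the task distribution, model, and step sizes are illustrative assumptions rather than settings from any specific paper.

    # A minimal first-order MAML sketch on toy 1-D linear-regression tasks.
    # The task distribution and learning rates below are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)

    def sample_task():
        # Each task is y = a*x + b with task-specific (a, b).
        a, b = rng.uniform(-2, 2), rng.uniform(-1, 1)
        x = rng.uniform(-5, 5, size=20)
        return x, a * x + b

    def loss_and_grad(params, x, y):
        w, c = params
        err = w * x + c - y
        loss = np.mean(err ** 2)
        grad = np.array([np.mean(2 * err * x), np.mean(2 * err)])
        return loss, grad

    meta_params = np.zeros(2)          # meta-initialization shared across tasks
    inner_lr, outer_lr = 0.01, 0.001

    for step in range(2000):
        x, y = sample_task()
        x_s, y_s, x_q, y_q = x[:10], y[:10], x[10:], y[10:]   # support / query split
        # Inner loop: adapt the meta-initialization to the task with one gradient step.
        _, g_s = loss_and_grad(meta_params, x_s, y_s)
        adapted = meta_params - inner_lr * g_s
        # Outer loop (first-order approximation): update the meta-initialization
        # with the query-set gradient evaluated at the adapted parameters.
        _, g_q = loss_and_grad(adapted, x_q, y_q)
        meta_params -= outer_lr * g_q

    print("meta-initialization:", meta_params)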
Learning to Transfer
Beyond learning to learn a typical task, an artificial intelligence agent can also be empowered to learn to quickly transfer knowledge from previous tasks by identifying what to transfer. The idea stems from the widely accepted principle that human beings improve their transfer learning skills, i.e., deciding what to transfer, through meta-cognitive reflection on inductive transfer learning practices. We will introduce the learning-to-transfer framework, as well as promising techniques and applications of learning to transfer.
Hyperparameter Optimization
Every machine learning system has hyperparameters. The choice of hyperparameters significantly affects the effectiveness of the learning system. Especially in deep learning systems, there can be a large number of hyperparameters regarding the neural architecture, regularization, and optimization. Finding suitable hyperparameters often requires expert knowledge and sufficient experience with the ML system, which limits the feasibility and efficiency of ML systems in real-world applications. Hyperparameter optimization (HPO) aims to automatically select optimal configurations of hyperparameters. It can reduce human effort and improve the performance and reproducibility of machine learning systems.
Model-free Methods
The most basic model-free HPO methods are grid search and random search. Population-based methods are another important branch of model-free HPO, including genetic algorithms, evolutionary algorithms, etc. Model-free methods are usually simple to implement and free from specific assumptions about the machine learning system being optimized.
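As an illustration of the model-free family, the snippet below sketches plain random search over a small hyperparameter space for a scikit-learn classifier; the search ranges and the budget of 20 evaluations are arbitrary choices made for the example.

    # A minimal random-search sketch over a toy hyperparameter space (illustrative only).
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X, y = load_digits(return_X_y=True)

    def sample_config():
        # Each hyperparameter is drawn independently from a hand-chosen range.
        return {
            "n_estimators": int(rng.integers(10, 200)),
            "max_depth": int(rng.integers(2, 20)),
            "max_features": float(rng.uniform(0.1, 1.0)),
        }

    best_score, best_config = -np.inf, None
    for _ in range(20):                       # evaluation budget of 20 configurations
        config = sample_config()
        score = cross_val_score(RandomForestClassifier(**config, random_state=0),
                                X, y, cv=3).mean()
        if score > best_score:
            best_score, best_config = score, config

    print(best_score, best_config)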
Bayesian Optimization
Bayesian optimization (BO) is a state-of-the-art framework for optimizing black-box functions. It consists of a probabilistic model fitted to the observations and an acquisition function that uses the model to determine the next candidate hyperparameters. Commonly used probabilistic models include Gaussian processes (GP) and the Tree-structured Parzen Estimator (TPE). The most common choice of acquisition function is expected improvement (EI). BO forms the basis of several prominent AutoML frameworks, such as Auto-WEKA and Auto-sklearn.
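The following is a minimal sketch of GP-based BO with the expected-improvement acquisition on a 1-D toy objective; the objective function, kernel choice, and candidate grid are illustrative assumptions standing in for an expensive black-box evaluation.

    # A minimal GP + expected-improvement BO sketch on a 1-D toy objective (illustrative).
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def objective(x):                       # stand-in for an expensive black-box evaluation
        return np.sin(3 * x) + 0.1 * x ** 2

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(3, 1))     # a few initial observations
    y = objective(X).ravel()
    candidates = np.linspace(-3, 3, 500).reshape(-1, 1)

    for _ in range(15):
        # Probabilistic model: a GP fitted to all observations so far.
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True).fit(X, y)
        mu, sigma = gp.predict(candidates, return_std=True)
        best = y.min()                      # minimization: incumbent best observation
        # Acquisition: expected improvement over the incumbent.
        imp = best - mu
        z = imp / np.maximum(sigma, 1e-9)
        ei = imp * norm.cdf(z) + sigma * norm.pdf(z)
        x_next = candidates[np.argmax(ei)].reshape(1, -1)
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next).ravel())

    print("best x:", X[np.argmin(y)], "best value:", y.min())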
Bandit-based Methods
In black-box HPO methods such as BO, performance evaluation can be very expensive, especially with large datasets and complex models, since each evaluation requires a complete run of the machine learning system being optimized. To tackle this problem, bandit-based algorithms have been proposed to cut off less promising configurations early and focus the budget on configurations that are expected to be optimal. Bandit-based HPO methods are built on the assumption that a small subset of the data (or a small training budget) is sufficient to indicate the final performance of a candidate configuration. A simple yet powerful method is successive halving (SH). Hyperband further considers different trade-offs between the total budget and the number of candidate configurations to improve the effectiveness of SH. BOHB combines Bayesian optimization and Hyperband to improve anytime performance and final performance simultaneously.
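To illustrate the bandit-based idea, here is a minimal successive-halving sketch in which the budget plays the role of training epochs and the evaluation function is a placeholder assumption; real implementations such as Hyperband additionally vary the initial number of configurations.

    # A minimal successive-halving sketch (illustrative); `evaluate` is a placeholder
    # for training a configuration with the given budget and returning a validation score.
    import numpy as np

    rng = np.random.default_rng(0)

    def evaluate(config, budget):
        # Placeholder: a noisy score that improves with budget and depends on the config.
        return -(config["lr"] - 0.1) ** 2 + 0.01 * np.log(budget) + rng.normal(0, 0.01)

    def successive_halving(configs, min_budget=1, eta=3, rounds=4):
        budget = min_budget
        for _ in range(rounds):
            scores = [evaluate(c, budget) for c in configs]
            # Keep only the top 1/eta fraction of configurations and raise the budget.
            keep = max(1, len(configs) // eta)
            order = np.argsort(scores)[::-1][:keep]
            configs = [configs[i] for i in order]
            budget *= eta
            if len(configs) == 1:
                break
        return configs[0]

    configs = [{"lr": float(lr)} for lr in rng.uniform(1e-3, 1.0, size=27)]
    print(successive_halving(configs))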
Neural and Full Architecture Search
Deep learning methods have been very successful in tasks such as machine translation, image and speech recognition, and reinforcement learning. Neural Architecture Search (NAS), the process of automating architecture engineering, is therefore an important part of automating machine learning. NAS can be seen as a subfield of AutoML and has significant overlap with hyperparameter optimization and meta-learning. Methods for NAS can be categorized along three dimensions: search space, search strategy, and performance estimation strategy. The NAS part of this tutorial is structured according to these three dimensions.
Search Space
The search space defines which architectures can be represented in principle. Since the optimization problem of NAS is often discrete and high-dimensional, it can be greatly simplified by a suitable choice of the search space: incorporating prior knowledge about the task reduces the size of the space that has to be explored.
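As a concrete illustration of how prior knowledge shrinks the search space, the snippet below encodes a toy sequence-style space in which an architecture is a fixed-length list of layers, each choosing one operation from a small candidate set; the operation list and depth limit are assumptions made purely for the example.

    # A toy sequence-style NAS search space (illustrative assumptions throughout):
    # an architecture is a fixed-length list of operations from a small candidate set.
    import itertools
    import random

    CANDIDATE_OPS = ["conv3x3", "conv5x5", "maxpool3x3", "identity"]  # prior knowledge
    NUM_LAYERS = 4                                                     # fixed depth

    def sample_architecture():
        return [random.choice(CANDIDATE_OPS) for _ in range(NUM_LAYERS)]

    def search_space_size():
        return len(CANDIDATE_OPS) ** NUM_LAYERS

    print(sample_architecture())        # e.g. ['conv3x3', 'identity', ...]
    print(search_space_size())          # 4**4 = 256 architectures in this restricted space
    # Enumerating the whole space is feasible only because it was deliberately restricted:
    all_archs = list(itertools.product(CANDIDATE_OPS, repeat=NUM_LAYERS))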
Search Strategy
The search strategy defines how the search space is explored. Several different strategies have been proposed, such as reinforcement learning, evolutionary algorithms, and one-shot architecture search.
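As one example of a search strategy, here is a minimal mutation-only evolutionary search over a toy operation-sequence space like the one sketched above; the fitness function is a placeholder assumption standing in for training and validating an architecture.

    # A minimal evolutionary search strategy over a toy operation-sequence space (illustrative).
    import random

    random.seed(0)
    CANDIDATE_OPS = ["conv3x3", "conv5x5", "maxpool3x3", "identity"]
    NUM_LAYERS = 4

    def fitness(arch):
        # Placeholder for "train the architecture and return validation accuracy".
        return sum(1.0 if op == "conv3x3" else 0.5 for op in arch) + random.gauss(0, 0.1)

    def mutate(arch):
        # Change the operation at one randomly chosen position.
        child = list(arch)
        child[random.randrange(NUM_LAYERS)] = random.choice(CANDIDATE_OPS)
        return child

    population = [[random.choice(CANDIDATE_OPS) for _ in range(NUM_LAYERS)]
                  for _ in range(10)]
    for _ in range(50):
        parent = max(random.sample(population, 3), key=fitness)   # tournament selection
        population.append(mutate(parent))
        population.pop(0)                                          # age-based removal

    print(max(population, key=fitness))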
Performance Estimation Strategy
To guide the search process, a performance estimation strategy is used to estimate the performance of a given architecture. The simplest approach is to perform standard training and validation of the architecture on the data. However, training every candidate architecture from scratch is computationally expensive and can consume thousands of GPU days. Hence, many methods have recently been proposed to reduce the cost of performance estimation.
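One common low-cost strategy is low-fidelity estimation: train for only a few epochs or on a data subset and use the result as a proxy for final performance. The sketch below illustrates this with a scikit-learn MLP, where the proxy budget and the candidate widths are assumptions made for the example.

    # A minimal low-fidelity performance-estimation sketch (illustrative): rank candidate
    # configurations by a short training run on a data subset instead of full training.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = load_digits(return_X_y=True)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    def cheap_estimate(hidden_units, subset=300, epochs=5):
        # Low fidelity: a small data subset and few epochs stand in for full training.
        model = MLPClassifier(hidden_layer_sizes=(hidden_units,), max_iter=epochs,
                              random_state=0)
        model.fit(X_tr[:subset], y_tr[:subset])
        return model.score(X_val, y_val)

    # Rank candidate "architectures" (here just hidden-layer widths) by the cheap proxy.
    candidates = [16, 32, 64, 128]
    ranked = sorted(candidates, key=cheap_estimate, reverse=True)
    print(ranked)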