Automated Machine Learning on Graph

KDD 2021, Singapore


Xin Wang Tsinghua University, China

Xin Wang is currently an Assistant Professor at the Department of Computer Science and Technology, Tsinghua University. He received both his Ph.D. and B.E. degrees in Computer Science and Technology from Zhejiang University, China. He also holds a second Ph.D. degree in Computing Science from Simon Fraser University, Canada. His research interests include cross-modal multimedia intelligence and recommendation in social media. He has published several high-quality research papers in top conferences including ICML, KDD, WWW, SIGIR, AAAI, IJCAI, and CIKM.

Ziwei Zhang Tsinghua University, China

Ziwei Zhang is currently a Ph.D. candidate in the Department of Computer Science and Technology, Tsinghua University. He received his B.S. from the Department of Physics, Tsinghua University, in 2016. His research interests focus on network embedding (a.k.a. network representation learning) and machine learning on graph data, especially developing scalable algorithms for large-scale networks. He has published several papers in prestigious conferences and journals, including KDD, AAAI, IJCAI, and TKDE.

Wenwu Zhu Tsinghua University, China

Wenwu Zhu is currently a Professor and the Vice Chair of the Department of Computer Science and Technology at Tsinghua University, the Vice Dean of the National Research Center for Information Science and Technology, and the Vice Director of the Tsinghua Center for Big Data. Prior to his current post, he was a Senior Researcher and Research Manager at Microsoft Research Asia. He was the Chief Scientist and Director at Intel Research China from 2004 to 2008. He worked at Bell Labs New Jersey as a Member of Technical Staff from 1996 to 1999. He received his Ph.D. degree in Electrical and Computer Engineering from New York University in 1996.

Wenwu Zhu is an AAAS Fellow, IEEE Fellow, SPIE Fellow, and a member of The Academy of Europe (Academia Europaea). He has published over 300 refereed papers in the areas of multimedia computing, communications and networking, and big data. He is inventor or co-inventor of over 50 patents. He has received eight Best Paper Awards, including ACM Multimedia 2012 and IEEE Transactions on Circuits and Systems for Video Technology in 2001. His current research interests are in the areas of Cyber-Physical-Human big data computing and cross-media big data and intelligence.

Wenwu Zhu served as Editor-in-Chief for IEEE Transactions on Multimedia from January 2017 to December 2019. He has also served as Guest Editor for the Proceedings of the IEEE, IEEE Journal on Selected Areas in Communications, ACM Transactions on Intelligent Systems and Technology, and others, and as Associate Editor for IEEE Transactions on Mobile Computing, ACM Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, and IEEE Transactions on Big Data. He served on the steering committees of IEEE Transactions on Multimedia (2015-2016) and IEEE Transactions on Mobile Computing (2007-2010), as TPC Co-Chair for ACM Multimedia 2014 and IEEE ISCAS 2013, and as General Co-Chair for ACM Multimedia 2018 and ACM CIKM 2019.

Tutorial Description

Machine learning on graphs has been extensively studied in both academia and industry. However, as the literature on graph learning booms with a vast number of emerging methods and techniques, it becomes increasingly difficult to manually design the optimal machine learning algorithm for different graph-related tasks. To address this critical challenge, automated machine learning (AutoML) on graphs, which combines the strengths of graph machine learning and AutoML, is gaining attention from the research community. In this tutorial, we discuss AutoML on graphs, primarily focusing on hyperparameter optimization (HPO) and neural architecture search (NAS) for graph machine learning. We further overview libraries related to automated graph machine learning and discuss in depth AutoGL, the first dedicated open-source library for AutoML on graphs. Finally, we share our insights on future research directions for automated graph machine learning. To the best of our knowledge, this tutorial is the first to systematically and comprehensively review automated machine learning on graphs, and it has great potential to draw broad interest from the community.

Tutorial Outline

The tutorial is planned for half a day and is organized into the following six sections.

  • The research and industrial motivation
  • Machine learning on graph and automated machine learning
  • Hyperparameter optimization for graph machine learning
  • Neural architecture search for graph machine learning
  • Automated graph machine learning libraries
  • Discussions and future directions

    Target Audience and Prerequisites

    This tutorial will be highly accessible to the whole data mining community, including researchers, students, and practitioners who are interested in automated machine learning and its applications in graph-related tasks. The tutorial will be self-contained and designed for introductory and intermediate audiences. No special prerequisite knowledge is required to attend this tutorial.

    Motivation, Relevance and Rationale

    This tutorial aims to disseminate and promote recent research achievements in automated machine learning on graphs, an exciting and fast-growing research direction in the general field of machine learning and data mining. We will advocate novel, high-quality research findings and innovative solutions to the challenging problems in automated machine learning on graphs. This topic is at the core of the scope of KDD and is attractive to the KDD audience from both academia and industry.

    Tutorial Overview

    Automated machine learning on graphs, which non-trivially combines the strengths of AutoML and graph machine learning, faces the following challenges.

  • The uniqueness of graph machine learning: Unlike audio, images, or text, which have a grid structure, graph data lies in a non-Euclidean space. Thus, graph machine learning usually has unique architectures and designs. For example, typical NAS methods focus on search spaces of convolution and recurrent operations, which are distinct from the building blocks of GNNs.
  • Complexity and diversity of graph tasks: As mentioned above, graph tasks per se are complex and diverse, ranging from node-level to graph-level problems, and with different settings, objectives, and constraints. Imposing a proper inductive bias and integrating domain knowledge into graph AutoML methods is therefore indispensable.
  • Scalability: Many real graphs such as social networks or the Web are incredibly large-scale with billions of nodes and edges. Besides, the nodes in the graph are interconnected and cannot be treated as independent samples. Designing scalable AutoML algorithms for graphs poses significant challenges since both graph machine learning and AutoML are already notorious for being compute-intensive.
    The approaches with HPO or NAS for graph machine learning discussed in this tutorial target at least one of these three challenges.
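    To make the first challenge concrete, the basic building block of GNNs is a message-passing step over irregular neighborhoods, which has no analogue in grid-structured data. A minimal sketch in plain Python (the toy graph, features, and mean aggregation are illustrative choices, not a specific method from the tutorial):

```python
# Hedged sketch: one message-passing (neighbor-averaging) step, the kind
# of building block that makes GNN search spaces different from
# convolutional/recurrent ones. Toy graph and features are illustrative.

def message_passing_step(adj, features):
    """One round of mean aggregation over each node's neighborhood."""
    new_features = {}
    for node, neighbors in adj.items():
        # Include the node itself (a self-loop), as many GNN variants do.
        group = [node] + neighbors
        dim = len(features[node])
        new_features[node] = [
            sum(features[n][d] for n in group) / len(group)
            for d in range(dim)
        ]
    return new_features

# A 3-node path graph 0 - 1 - 2: neighborhoods have different sizes,
# unlike the fixed receptive fields of a grid.
adj = {0: [1], 1: [0, 2], 2: [1]}
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
out = message_passing_step(adj, feats)
print(out)
```

    Because the neighborhood size varies per node, the aggregation cannot be expressed as a fixed-size convolution kernel, which is why graph NAS needs its own search space.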

    Machine Learning on Graph and Automated Machine Learning

    Both machine learning on graphs and automated machine learning (AutoML) have been well studied in the past decade, and we will briefly introduce the necessary basics to make our tutorial self-contained in this part. More specifically, the graph machine learning portion will cover definitions and problem formulations for graph problems such as node classification, link prediction, and graph classification in the context of machine learning. The AutoML portion will cover hyperparameter optimization (HPO) and neural architecture search (NAS), where HPO includes model-free methods, Bayesian optimization, and bandit-based methods, and NAS includes investigations of the search space, search strategy, and performance estimation strategy.

    Hyperparameter Optimization for Graph Machine Learning

    In this part, we plan to discuss HPO for machine learning on graphs. The main challenge here is scalability, i.e., a real graph can have billions of nodes and edges, and each trial on the graph is computationally expensive. Next, we elaborate on how different methods tackle the efficiency challenge. We will omit some straightforward HPO methods such as random search and grid search since they are applied to graphs without any modification.

    We will cover:
  • AutoNE, which is the first HPO method specially designed to tackle the efficiency problem of graphs.
  • JITuNE, which proposes to replace the sampling process of AutoNE with graph coarsening to generate a hierarchical graph synopsis.
  • e-AutoGR, which explores explainable automated graph representation learning through explaining hyperparameter importance.
  • HESGA, which proposes another strategy to improve efficiency using a hierarchical evaluation strategy together with evolutionary algorithms.
  • AutoGM, which focuses on proposing a unified framework for various graph machine learning algorithms.
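    As a point of reference for the specialized methods above, the straightforward baseline we omit (random search, which applies to graphs without modification) can be sketched in a few lines. The search space and the evaluation function below are illustrative stand-ins; in practice the evaluation would be a full, expensive training run on the graph, which is exactly the cost the methods above try to reduce:

```python
import random

# Hedged sketch: random-search HPO over GNN hyperparameters.
# The search space and evaluate() are illustrative stand-ins.

search_space = {
    "lr": [1e-3, 1e-2, 1e-1],
    "hidden_dim": [16, 64, 256],
    "num_layers": [1, 2, 3],
}

def evaluate(config):
    # Stand-in for validation accuracy after an expensive training run;
    # this toy score peaks near hidden_dim=64 with a small learning rate.
    return 1.0 / (1 + abs(config["hidden_dim"] - 64) + config["lr"])

def random_search(n_trials, seed=0):
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in search_space.items()}
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best_cfg, best_score = random_search(20)
print(best_cfg, best_score)
```

    Each of the 20 trials here would be one full training run on the graph; methods like AutoNE and JITuNE aim to answer the same question with far cheaper trials on a compressed or sampled graph.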
    Neural Architecture Search for Graph Machine Learning

    NAS methods can be compared along three aspects: search space, search strategy, and performance estimation strategy. In this part, we plan to discuss NAS methods for graph machine learning from these three aspects, along with some designs unique to graphs. We mainly review NAS for GNNs, which is the focus of the literature.

  • Search Space will cover micro search space, macro search space, pooling methods and hyperparameters.
  • Search Strategy will cover controller + RL, differentiable approaches, evolutionary algorithms and hybrid methods.
  • Performance Estimation Strategy will cover reducing fidelity, weight sharing, etc.
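    The three aspects can be illustrated with a self-contained toy: a tiny micro search space over per-layer aggregators and activations, random sampling as the search strategy, and a cheap stand-in for performance estimation. All names and scores below are illustrative, not any specific method covered in the tutorial:

```python
import random

# Micro search space: per-layer choices of aggregator and activation,
# mirroring how graph NAS searches over GNN message-passing components.
AGGREGATORS = ["mean", "max", "sum"]
ACTIVATIONS = ["relu", "tanh"]

def sample_architecture(rng, num_layers=2):
    """Search strategy: uniform random sampling of a 2-layer GNN cell."""
    return tuple(
        (rng.choice(AGGREGATORS), rng.choice(ACTIVATIONS))
        for _ in range(num_layers)
    )

def estimate_performance(arch):
    # Stand-in for a low-fidelity estimate (e.g. a few epochs on a
    # subsampled graph). Here: a fixed toy score table per component.
    score = 0.0
    for agg, act in arch:
        score += {"mean": 0.3, "max": 0.2, "sum": 0.1}[agg]
        score += {"relu": 0.2, "tanh": 0.1}[act]
    return score

rng = random.Random(42)
candidates = [sample_architecture(rng) for _ in range(10)]
best = max(candidates, key=estimate_performance)
print(best, estimate_performance(best))
```

    Swapping the sampler for an RL controller, a differentiable relaxation, or an evolutionary loop changes only the search strategy; the other two aspects stay in place, which is why the three-way decomposition is a useful lens.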
    Automated Graph Machine Learning Libraries

    Publicly available libraries are important to facilitate and advance the research and applications of AutoML on graphs. On the one hand, popular libraries for graph machine learning include PyTorch Geometric, Deep Graph Library, Graph Nets, AliGraph, Euler, PBG, StellarGraph, Spektral, CogDL, OpenNE, GEM, Karate Club, and the classical NetworkX. However, these libraries do not support AutoML. On the other hand, AutoML libraries such as NNI, AutoKeras, auto-sklearn, Hyperopt, TPOT, AutoGluon, MLBox, and MLJAR are widely adopted. Unfortunately, because of the uniqueness and complexity of graph tasks, they cannot be directly applied to automate graph machine learning. Recently, AutoGL (homepage: ), the first dedicated library for automated graph learning, has been developed.

    The main characteristics of AutoGL are threefold:

  • Open-source: all the source code of AutoGL is publicly available under the MIT license.
  • Easy to use: AutoGL is designed to be easy to use. For example, fewer than 10 lines of code are needed to conduct a quick experiment with AutoGL.
  • Flexible to extend: AutoGL adopts a modular design with high-level base class APIs and extensive documentation, which allows flexible and customized extensions.
    In this part, we will walk through these featured characteristics of AutoGL in detail.