Automated Machine Learning on Graph

KDD 2021, Singapore

Speakers

Xin Wang Tsinghua University, China

Xin Wang is currently an Assistant Professor at the Department of Computer Science and Technology, Tsinghua University. He received both his Ph.D. and B.E. degrees in Computer Science and Technology from Zhejiang University, China. He also holds a second Ph.D. degree in Computing Science from Simon Fraser University, Canada. His research interests include cross-modal multimedia intelligence and recommendation in social media. He has published several high-quality research papers in top conferences including ICML, KDD, WWW, SIGIR, AAAI, IJCAI, and CIKM.

Ziwei Zhang Tsinghua University, China

Ziwei Zhang is currently a Ph.D. candidate in the Department of Computer Science and Technology, Tsinghua University. He received his B.S. from the Department of Physics, Tsinghua University, in 2016. His research interests focus on network embedding (a.k.a. network representation learning) and machine learning on graph data, especially developing scalable algorithms for large-scale networks. He has published several papers in prestigious conferences and journals, including KDD, AAAI, IJCAI, and TKDE.

Wenwu Zhu Tsinghua University, China

Wenwu Zhu is currently a Professor and the Vice Chair of the Department of Computer Science and Technology at Tsinghua University, the Vice Dean of the National Research Center for Information Science and Technology, and the Vice Director of the Tsinghua Center for Big Data. Prior to his current post, he was a Senior Researcher and Research Manager at Microsoft Research Asia. He was the Chief Scientist and Director at Intel Research China from 2004 to 2008. He worked at Bell Labs New Jersey as a Member of Technical Staff from 1996 to 1999. He received his Ph.D. degree in Electrical and Computer Engineering from New York University in 1996.

Wenwu Zhu is an AAAS Fellow, IEEE Fellow, SPIE Fellow, and a member of The Academy of Europe (Academia Europaea). He has published over 300 refereed papers in the areas of multimedia computing, communications and networking, and big data. He is the inventor or co-inventor of over 50 patents. He has received eight Best Paper Awards, including ACM Multimedia 2012 and IEEE Transactions on Circuits and Systems for Video Technology in 2001. His current research interests are in the areas of Cyber-Physical-Human big data computing and Cross-media big data and intelligence.

Wenwu Zhu served as Editor-in-Chief of IEEE Transactions on Multimedia from January 2017 to December 2019. He has also served as a Guest Editor for the Proceedings of the IEEE, IEEE Journal on Selected Areas in Communications, ACM Transactions on Intelligent Systems and Technology, etc., and as an Associate Editor for IEEE Transactions on Mobile Computing, ACM Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Big Data, etc. He served on the steering committees of IEEE Transactions on Multimedia (2015-2016) and IEEE Transactions on Mobile Computing (2007-2010). He served as TPC Co-chair for ACM Multimedia 2014 and IEEE ISCAS 2013, and as General Co-Chair for ACM Multimedia 2018 and ACM CIKM 2019.

Tutorial Description

Machine learning on graphs has been extensively studied in both academia and industry. However, as the literature on graph learning booms with a vast number of emerging methods and techniques, it becomes increasingly difficult to manually design the optimal machine learning algorithm for different graph-related tasks. To address this challenge, automated machine learning (AutoML) on graphs, which combines the strengths of graph machine learning and AutoML, is gaining attention from the research community. In this tutorial, we discuss AutoML on graphs, primarily focusing on hyper-parameter optimization (HPO) and neural architecture search (NAS) for graph machine learning. We further give an overview of libraries related to automated graph machine learning and discuss in depth AutoGL, the first dedicated open-source library for AutoML on graphs. Finally, we share our insights on future research directions for automated graph machine learning. To the best of our knowledge, this tutorial is the first to systematically and comprehensively review automated machine learning on graphs, and it has great potential to attract broad interest in the community.


Tutorial Outline

The tutorial is planned as a half-day session and is organized into six sections.

  • The research and industrial motivation
  • Machine learning on graph and automated machine learning
  • Hyperparameter optimization for graph machine learning
  • Neural architecture search for graph machine learning
  • Automated graph machine learning libraries
  • Discussions and future directions

    Target Audience and Prerequisites

    This tutorial will be highly accessible to the whole data mining community, including researchers, students and practitioners who are interested in automated machine learning and its applications in graph-related tasks. The tutorial will be self-contained and designed for introductory and intermediate audiences. No special prerequisite knowledge is required to attend this tutorial.


    Motivation, Relevance and Rationale

    This tutorial aims to disseminate and promote recent research achievements in automated machine learning on graphs, an exciting and fast-growing research direction in the general field of machine learning and data mining. We will advocate novel, high-quality research findings and innovative solutions to the challenging problems in automated machine learning on graphs. This topic is at the core of the scope of KDD and is attractive to the KDD audience from both academia and industry.


    Tutorial Overview

    Automated machine learning on graphs, which non-trivially combines the strengths of AutoML and graph machine learning, faces the following challenges.

  • The uniqueness of graph machine learning: Unlike audio, images, or text, which have a grid structure, graph data lies in a non-Euclidean space. Thus, graph machine learning usually has unique architectures and designs. For example, typical NAS methods focus on the search space for convolution and recurrent operations, which is distinct from the building blocks of GNNs.
  • Complexity and diversity of graph tasks: Graph tasks per se are complex and diverse, ranging from node-level to graph-level problems, with different settings, objectives, and constraints. Imposing a proper inductive bias and integrating domain knowledge into a graph AutoML method is therefore indispensable.
  • Scalability: Many real graphs such as social networks or the Web are incredibly large-scale, with billions of nodes and edges. Besides, the nodes in the graph are interconnected and cannot be treated as independent samples. Designing scalable AutoML algorithms for graphs poses significant challenges since both graph machine learning and AutoML are already notorious for being compute-intensive.

    The HPO and NAS approaches for graph machine learning discussed in this tutorial each target at least one of these three challenges.

    Machine Learning on Graph and Automated Machine Learning

    Both machine learning on graphs and automated machine learning (AutoML) have been well studied in the past decade. In this part, we will briefly introduce the necessary basics to make our tutorial self-contained. More specifically, graph machine learning will cover definitions and problem formulations for graph problems such as node classification, link prediction, and graph classification in the context of machine learning. Automated machine learning (AutoML) will cover hyperparameter optimization (HPO) and neural architecture search (NAS), where HPO contains model-free methods, Bayesian optimization and bandit-based methods, and NAS contains investigations on search space, search strategy and performance estimation strategy.
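
    To make the bandit-based family concrete, the following is a minimal sketch of successive halving, one common bandit-based HPO scheme. The function names, the candidate learning rates, and the toy scoring function are assumptions made for illustration only; in practice `evaluate` would train a graph model for `budget` epochs and return a validation score.

```python
import math


def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Repeatedly keep the top 1/eta configurations while growing the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        # Score each surviving configuration at the current budget (higher is better).
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]


def toy_evaluate(config, budget):
    # Hypothetical stand-in for "train for `budget` epochs, return validation score":
    # the score peaks at lr = 0.01 and the penalty shrinks as the budget grows.
    distance = abs(math.log10(config["lr"]) + 2.0)
    return -distance * (1.0 + 1.0 / budget)


candidates = [{"lr": lr} for lr in (1.0, 0.3, 0.1, 0.03, 0.01, 0.003, 0.001, 0.0001)]
best = successive_halving(candidates, toy_evaluate)
print(best)
```

    Poor configurations are discarded after cheap, low-budget trials, so most of the compute is spent on the few that remain promising.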

    Hyperparameter Optimization for Graph Machine Learning

    In this part, we plan to discuss HPO for machine learning on graphs. The main challenge here is scalability, i.e., a real graph can have billions of nodes and edges, and each trial on the graph is computationally expensive. Next, we elaborate on how different methods tackle the efficiency challenge. We will omit some straightforward HPO methods such as random search and grid search since they are applied to graphs without any modification.

    We will cover:
  • AutoNE, which is the first HPO method specially designed to tackle the efficiency problem of graphs.
  • JITuNE, which proposes to replace the sampling process of AutoNE with graph coarsening to generate a hierarchical graph synopsis.
  • e-AutoGR, which explores explainable automated graph representation learning through explaining hyperparameter importance.
  • HESGA, which proposes another strategy to improve efficiency using a hierarchical evaluation strategy together with evolutionary algorithms.
  • AutoGM, which focuses on proposing a unified framework for various graph machine learning algorithms.
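
    The sampling idea mentioned above, running trials on small sampled subgraphs instead of the full graph, can be sketched as follows. This is only an illustration under stated assumptions and not the algorithm of AutoNE or any other method listed: the random walk with restarts, the function names, and the toy objective are all hypothetical, and everything those methods do beyond sampling is omitted.

```python
import random


def random_walk_subgraph(adj, num_nodes, restart_prob=0.15, seed=0):
    """Visit `num_nodes` distinct nodes by a random walk with restarts; return the induced subgraph."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    target = min(num_nodes, len(nodes))
    current = rng.choice(nodes)
    visited = {current}
    while len(visited) < target:
        neighbors = adj[current]
        # Restart on dead ends (and occasionally at random) so the walk cannot get stuck.
        if not neighbors or rng.random() < restart_prob:
            current = rng.choice(nodes)
        else:
            current = rng.choice(neighbors)
        visited.add(current)
    # Keep only the edges whose endpoints were both visited.
    return {u: [v for v in adj[u] if v in visited] for u in visited}


def tune_on_subgraphs(adj, configs, evaluate, num_subgraphs=3, num_nodes=20):
    """Pick the configuration with the best mean score over sampled subgraphs."""
    subgraphs = [random_walk_subgraph(adj, num_nodes, seed=s) for s in range(num_subgraphs)]

    def mean_score(config):
        return sum(evaluate(config, g) for g in subgraphs) / len(subgraphs)

    return max(configs, key=mean_score)


def toy_evaluate(config, graph):
    # Hypothetical placeholder for "train an embedding on `graph` and validate it".
    return -abs(config["dim"] - len(graph))


# A ring graph with 100 nodes stands in for a large real graph.
ring = {i: [(i - 1) % 100, (i + 1) % 100] for i in range(100)}
configs = [{"dim": d} for d in (8, 16, 32, 64)]
best = tune_on_subgraphs(ring, configs, toy_evaluate)
print(best)
```

    Each trial touches only 20 nodes here, so its cost no longer depends on the size of the original graph; whether the ranking found on subgraphs carries over to the full graph is exactly the question the methods above address.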

    Neural Architecture Search for Graph Machine Learning

    NAS methods can be compared in three aspects: search space, search strategy, and performance estimation strategy. In this part, we plan to discuss NAS methods for graph machine learning from these three aspects, along with some designs unique to graphs. We mainly review NAS for GNNs, which is the focus of the literature.

  • Search Space will cover micro search space, macro search space, pooling methods and hyperparameters.
  • Search Strategy will cover controller + RL, differentiable approaches, evolutionary algorithms and hybrid methods.
  • Performance Estimation Strategy will cover reducing fidelity, weight sharing, etc.
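
    As a rough illustration of how the three aspects fit together, the sketch below defines a small per-layer (micro-style) search space, uses plain random search as the search strategy, and plugs in a placeholder performance estimator. The option names and values, the function names, and the toy estimator are assumptions chosen for this example; they are not taken from any specific NAS method.

```python
import math
import random

# Hypothetical per-layer choices for a GNN layer.
SEARCH_SPACE = {
    "aggregator": ["sum", "mean", "max"],
    "activation": ["relu", "elu", "tanh"],
    "hidden_dim": [16, 64, 256],
    "attention_heads": [1, 2, 4],
}


def search_space_size(num_layers):
    """Number of distinct architectures when every layer chooses independently."""
    per_layer = math.prod(len(options) for options in SEARCH_SPACE.values())
    return per_layer ** num_layers


def sample_architecture(rng, num_layers=2):
    """Draw one architecture: one choice per option for every layer."""
    return [
        {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}
        for _ in range(num_layers)
    ]


def random_search(estimate, num_samples=20, seed=0):
    """Search strategy: sample architectures at random and keep the best estimate."""
    rng = random.Random(seed)
    candidates = [sample_architecture(rng) for _ in range(num_samples)]
    return max(candidates, key=estimate)


def toy_estimate(arch):
    # Hypothetical estimator; a real one would train the GNN (possibly at
    # reduced fidelity or with shared weights) and return validation accuracy.
    width = sum(layer["hidden_dim"] for layer in arch) / 512
    return width + sum(layer["aggregator"] == "mean" for layer in arch)


best = random_search(toy_estimate)
print(search_space_size(2), best)
```

    Even this tiny space contains thousands of two-layer architectures, which is why the choice of search strategy and a cheap performance estimator matters.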

    Automated Graph Machine Learning Libraries

    Publicly available libraries are important to facilitate and advance the research and applications of AutoML on graphs. On the one hand, popular libraries for graph machine learning include PyTorch Geometric, Deep Graph Library, GraphNets, AliGraph, Euler, PBG, StellarGraph, Spektral, CogDL, OpenNE, GEM, Karateclub and the classical NetworkX. However, these libraries do not support AutoML. On the other hand, AutoML libraries such as NNI, AutoKeras, AutoSklearn, Hyperopt, TPOT, AutoGluon, MLBox and MLJAR are widely adopted. Unfortunately, because of the uniqueness and complexity of graph tasks, they cannot be directly applied to automate graph machine learning. Recently, AutoGL (homepage: https://mn.cs.tsinghua.edu.cn/AutoGL), the first dedicated library for automated graph learning, has been developed.

    The main characteristics of AutoGL are threefold:

  • Open-source: all the source code of AutoGL is publicly available at https://github.com/THUMNLab/AutoGL under the MIT license.
  • Easy to use: AutoGL is designed to be easy to use. For example, fewer than 10 lines of code are needed to conduct some quick experiments with AutoGL.
  • Flexible to be extended: AutoGL adopts a modular design with high-level base-class APIs and extensive documentation, which allows flexible and customized extensions.

    In this part, we will introduce the featured characteristics of AutoGL, the first library for automated graph machine learning.