Quick Start

This tutorial will help you quickly go through the concepts and usages of important classes in AutoGL. In this tutorial, you will conduct a quick auto graph learning on dataset Cora.

AutoGL Learning

Based on the concept of autoML, auto graph learning aims at automatically solve tasks with data represented by graphs. Unlike conventional learning frameworks, auto graph learning, like autoML, does not need humans inside the experiment loop. You only need to provide the datasets and tasks to the AutoGL solver. This framework will automatically find suitable methods and hyperparameters for you.

The diagram below describes the workflow of AutoGL framework.

To reach the aim of autoML, our proposed auto graph learning framework is organized as follows. We have dataset to maintain the graph datasets given by users. A solver object needs to be built for specifying the target tasks. Inside solver, there are five submodules to help complete the auto graph tasks, namely auto feature engineer, auto model, neural architecture search, hyperparameter optimization and auto ensemble, which will automatically preprocess/enhance your data, choose and optimize deep models and ensemble them in the best way for you.

Let’s say you want to conduct an auto graph learning on dataset Cora. First, you can easily get the Cora dataset using the dataset module:

from autogl.datasets import build_dataset_from_name
cora_dataset = build_dataset_from_name('cora')

The dataset will be automatically downloaded for you. Please refer to AutoGL Dataset or autogl.datasets for more details of dataset constructions, available datasets, add local datasets, etc.

After deriving the dataset, you can build a node classification solver to handle auto training process:

import torch
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
from autogl.solver import AutoNodeClassifier
solver = AutoNodeClassifier(
    feature_module='deepgl',
    graph_models=['gcn', 'gat'],
    hpo_module='anneal',
    ensemble_module='voting',
    device=device
)

In this way, we build a node classification solver, which will use deepgl as its feature engineer, and use anneal hyperparameter optimizer to optimize the given three models ['gcn','gat']. The derived models will then be ensembled using voting ensembler. Please refer to the corresponding tutorials or documentation to see the definition and usages of available submodules.

Then, you can fit the solver and then check the leaderboard:

solver.fit(cora_dataset, time_limit=3600)
solver.get_leaderboard().show()

The time_limit is set to 3600 so that the whole auto graph process will not exceed 1 hour. solver.show() will present the models maintained by solver, with their performances on the validation dataset.

Then, you can make the predictions and evaluate the results using the evaluation functions provided:

from autogl.module.train import Acc
predicted = solver.predict_proba()
print('Test accuracy: ', Acc.evaluate(predicted,
    cora_dataset.data.y[cora_dataset.data.test_mask].cpu().numpy()))

Note

You don’t need to pass the cora_dataset again when predicting, since the dataset is remembered by the solver and will be reused when no dataset is passed at predicting. However, you can also pass a new dataset when predicting, and the new dataset will be used instead of the remembered one. Please refer to AutoGL Solver or autogl.solver for more details.