Graph Classification Model

Building Graph Classification Modules

In AutoGL, we support two graph classification models, gin and topk.

AutoGIN

The graph isomorphism operator from the “How Powerful are Graph Neural Networks?” paper

Graph Isomorphism Network (GIN) is one graph classification model from “How Powerful are Graph Neural Networks” paper.

The layer is

\[\mathbf{x}^{\prime}_i = h_{\mathbf{\Theta}} \left( (1 + \epsilon) \cdot \mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \mathbf{x}_j \right)\]

or

\[\mathbf{X}^{\prime} = h_{\mathbf{\Theta}} \left( \left( \mathbf{A} + (1 + \epsilon) \cdot \mathbf{I} \right) \cdot \mathbf{X} \right),\]

here \(h_{\mathbf{\Theta}}\) denotes a neural network, .i.e. an MLP.

PARAMETERS: - num_features: int - The dimension of features.

num_classes: int - The number of classes.
device: torch.device or str - The device where model will be running on.
init: bool - If True(False), the model will (not) be initialized.

class AutoGIN(BaseModel):
    r"""
    AutoGIN. The model used in this automodel is GIN, i.e., the graph isomorphism network from the `"How Powerful are
    Graph Neural Networks?" <https://arxiv.org/abs/1810.00826>`_ paper. The layer is

    .. math::
        \mathbf{x}^{\prime}_i = h_{\mathbf{\Theta}} \left( (1 + \epsilon) \cdot
        \mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \mathbf{x}_j \right)

    or

    .. math::
        \mathbf{X}^{\prime} = h_{\mathbf{\Theta}} \left( \left( \mathbf{A} +
        (1 + \epsilon) \cdot \mathbf{I} \right) \cdot \mathbf{X} \right),

    here :math:`h_{\mathbf{\Theta}}` denotes a neural network, *.i.e.* an MLP.

    Parameters
    ----------
    num_features: `int`.
        The dimension of features.

    num_classes: `int`.
        The number of classes.

    device: `torch.device` or `str`
        The device where model will be running on.

    init: `bool`.
        If True(False), the model will (not) be initialized.
    """

    def __init__(
        self,
        num_features=None,
        num_classes=None,
        device=None,
        init=False,
        num_graph_features=None,
        **args
    ):

        super(AutoGIN, self).__init__()
        self.num_features = num_features if num_features is not None else 0
        self.num_classes = int(num_classes) if num_classes is not None else 0
        self.num_graph_features = (
            int(num_graph_features) if num_graph_features is not None else 0
        )
        self.device = device if device is not None else "cpu"

        self.params = {
            "features_num": self.num_features,
            "num_class": self.num_classes,
            "num_graph_features": self.num_graph_features,
        }
        self.space = [
            {
                "parameterName": "num_layers",
                "type": "DISCRETE",
                "feasiblePoints": "4,5,6",
            },
            {
                "parameterName": "hidden",
                "type": "NUMERICAL_LIST",
                "numericalType": "INTEGER",
                "length": 5,
                "minValue": [8, 8, 8, 8, 8],
                "maxValue": [64, 64, 64, 64, 64],
                "scalingType": "LOG",
                "cutPara": ("num_layers",),
                "cutFunc": lambda x: x[0] - 1,
            },
            {
                "parameterName": "dropout",
                "type": "DOUBLE",
                "maxValue": 0.9,
                "minValue": 0.1,
                "scalingType": "LINEAR",
            },
            {
                "parameterName": "act",
                "type": "CATEGORICAL",
                "feasiblePoints": ["leaky_relu", "relu", "elu", "tanh"],
            },
            {
                "parameterName": "eps",
                "type": "CATEGORICAL",
                "feasiblePoints": ["True", "False"],
            },
            {
                "parameterName": "mlp_layers",
                "type": "DISCRETE",
                "feasiblePoints": "2,3,4",
            },
            {
                "parameterName": "neighbor_pooling_type",
                "type": "CATEGORICAL",
                "feasiblePoints": ["sum", "mean", "max"],
            },
            {
                "parameterName": "graph_pooling_type",
                "type": "CATEGORICAL",
                "feasiblePoints": ["sum", "mean", "max"],
            },
        ]

        self.hyperparams = {
            "num_layers": 5,
            "hidden": [64,64,64,64],
            "dropout": 0.5,
            "act": "relu",
            "eps": "False",
            "mlp_layers": 2,
            "neighbor_pooling_type": "sum",
            "graph_pooling_type": "sum"
        }

        self.initialized = False
        if init is True:
            self.initialize()

Hyperparameters in GIN:

num_layers: int - number of GIN layers.
hidden: List[int] - hidden size for each hidden layer.
dropout: float - dropout probability.
act: str - type of activation function.
eps: str - whether to train parameter \(epsilon\) in the GIN layer.
mlp_layers: int - number of MLP layers in the GIN layer.
neighbor_pooling_type: str - pooling type in the GIN layer.
graph_pooling_type: str - graph pooling type following the last GIN layer.

You could get define your own gin model by using from_hyper_parameter function and specify the hyperpameryers.

# pyg version
from autogl.module.model.pyg import AutoGIN
# from autogl.module.model.dgl import AutoGIN  # dgl version
model = AutoGIN(
                num_features=dataset.num_node_features,
                num_classes=dataset.num_classes,
                num_graph_features=0,
                init=False
            ).from_hyper_parameter({
                # hp from model
                "num_layers": 5,
                "hidden": [64,64,64,64],
                "dropout": 0.5,
                "act": "relu",
                "eps": "False",
                "mlp_layers": 2,
                "neighbor_pooling_type": "sum",
                "graph_pooling_type": "sum"
            }).model

Then you can train the model for 100 epochs.

import torch.nn.functional as F

# Define the loss optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Training
for epoch in range(100):
    model.train()
    for data in train_loader:
        data = data.to(args.device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, data.y)
        loss.backward()
        optimizer.step()

Finally, evaluate the trained model.

def test(model, loader, args):
    model.eval()

    correct = 0
    for data in loader:
        data = data.to(args.device)
        output = model(data)
        pred = output.max(dim=1)[1]
        correct += pred.eq(data.y).sum().item()
    return correct / len(loader.dataset)

acc = test(model, test_loader, args)

Automatic Search for Graph Classification Tasks

In AutoGL, we also provide a high-level API Solver to control the overall pipeline. We encapsulated the training process in the Building GNN Modules part for graph classification tasks in the solver AutoGraphClassifier that supports automatic hyperparametric optimization as well as feature engineering and ensemble. In this part, we will show you how to use AutoGraphClassifier.

solver = AutoGraphClassifier(
            feature_module=None,
            graph_models=[args.model],
            hpo_module='random',
            ensemble_module=None,
            device=args.device, max_evals=1,
            trainer_hp_space = fixed(
                **{
                    # hp from trainer
                    "max_epoch": args.epoch,
                    "batch_size": args.batch_size,
                    "early_stopping_round": args.epoch + 1,
                    "lr": args.lr,
                    "weight_decay": 0,
                }
            ),
            model_hp_spaces=[
                fixed(**{
                    # hp from model
                    "num_layers": 5,
                    "hidden": [64,64,64,64],
                    "dropout": 0.5,
                    "act": "relu",
                    "eps": "False",
                    "mlp_layers": 2,
                    "neighbor_pooling_type": "sum",
                    "graph_pooling_type": "sum"
                }) if args.model == 'gin' else fixed(**{
                    "ratio": 0.8,
                    "dropout": 0.5,
                    "act": "relu"
                }),
            ]
        )

# fit auto model
solver.fit(dataset, evaluation_method=['acc'])
# prediction
out = solver.predict(dataset, mask='test')