Graph Classification Model
Building Graph Classification Modules
In AutoGL, we support two graph classification models, gin and topk.
AutoGIN
Graph Isomorphism Network (GIN) is a graph classification model from the "How Powerful are Graph Neural Networks?" paper.
The layer is

\[\mathbf{x}^{\prime}_i = h_{\mathbf{\Theta}} \left( (1 + \epsilon) \cdot \mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \mathbf{x}_j \right)\]

or

\[\mathbf{X}^{\prime} = h_{\mathbf{\Theta}} \left( \left( \mathbf{A} + (1 + \epsilon) \cdot \mathbf{I} \right) \cdot \mathbf{X} \right),\]

here \(h_{\mathbf{\Theta}}\) denotes a neural network, i.e. an MLP.
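The update rule above can be sketched in plain PyTorch with a dense adjacency matrix. This is an illustrative re-implementation for intuition only, not AutoGL's internal code; the class and variable names here are hypothetical.

```python
import torch
import torch.nn as nn

# Illustrative sketch of one GIN layer on a dense adjacency matrix:
#   X' = MLP((A + (1 + eps) * I) @ X)
# Not AutoGL's implementation; names are hypothetical.
class GINLayerSketch(nn.Module):
    def __init__(self, in_dim, out_dim, eps=0.0):
        super().__init__()
        self.eps = nn.Parameter(torch.tensor(eps))  # learnable epsilon
        self.mlp = nn.Sequential(                   # h_Theta: a small MLP
            nn.Linear(in_dim, out_dim), nn.ReLU(), nn.Linear(out_dim, out_dim)
        )

    def forward(self, adj, x):
        # (1 + eps) * x_i plus the sum of neighbor features x_j
        agg = (1 + self.eps) * x + adj @ x
        return self.mlp(agg)

# Toy graph: a 3-node path 0-1-2 with 4-dimensional node features
adj = torch.tensor([[0., 1., 0.],
                    [1., 0., 1.],
                    [0., 1., 0.]])
x = torch.randn(3, 4)
layer = GINLayerSketch(in_dim=4, out_dim=8)
print(layer(adj, x).shape)  # torch.Size([3, 8])
```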
PARAMETERS:
- num_features: int - The dimension of input features.
- num_classes: int - The number of classes.
- device: torch.device or str - The device on which the model will run.
- init: bool - Whether to initialize the model upon creation.
class AutoGIN(BaseModel):
r"""
AutoGIN. The model used in this automodel is GIN, i.e., the graph isomorphism network from the `"How Powerful are
Graph Neural Networks?" <https://arxiv.org/abs/1810.00826>`_ paper. The layer is
.. math::
\mathbf{x}^{\prime}_i = h_{\mathbf{\Theta}} \left( (1 + \epsilon) \cdot
\mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \mathbf{x}_j \right)
or
.. math::
\mathbf{X}^{\prime} = h_{\mathbf{\Theta}} \left( \left( \mathbf{A} +
(1 + \epsilon) \cdot \mathbf{I} \right) \cdot \mathbf{X} \right),
here :math:`h_{\mathbf{\Theta}}` denotes a neural network, *i.e.* an MLP.
Parameters
----------
num_features: `int`.
The dimension of features.
num_classes: `int`.
The number of classes.
device: `torch.device` or `str`.
The device on which the model will run.
init: `bool`.
Whether to initialize the model upon creation.
"""
def __init__(
self,
num_features=None,
num_classes=None,
device=None,
init=False,
num_graph_features=None,
**args
):
super(AutoGIN, self).__init__()
self.num_features = num_features if num_features is not None else 0
self.num_classes = int(num_classes) if num_classes is not None else 0
self.num_graph_features = (
int(num_graph_features) if num_graph_features is not None else 0
)
self.device = device if device is not None else "cpu"
self.params = {
"features_num": self.num_features,
"num_class": self.num_classes,
"num_graph_features": self.num_graph_features,
}
self.space = [
{
"parameterName": "num_layers",
"type": "DISCRETE",
"feasiblePoints": "4,5,6",
},
{
"parameterName": "hidden",
"type": "NUMERICAL_LIST",
"numericalType": "INTEGER",
"length": 5,
"minValue": [8, 8, 8, 8, 8],
"maxValue": [64, 64, 64, 64, 64],
"scalingType": "LOG",
"cutPara": ("num_layers",),
"cutFunc": lambda x: x[0] - 1,
},
{
"parameterName": "dropout",
"type": "DOUBLE",
"maxValue": 0.9,
"minValue": 0.1,
"scalingType": "LINEAR",
},
{
"parameterName": "act",
"type": "CATEGORICAL",
"feasiblePoints": ["leaky_relu", "relu", "elu", "tanh"],
},
{
"parameterName": "eps",
"type": "CATEGORICAL",
"feasiblePoints": ["True", "False"],
},
{
"parameterName": "mlp_layers",
"type": "DISCRETE",
"feasiblePoints": "2,3,4",
},
{
"parameterName": "neighbor_pooling_type",
"type": "CATEGORICAL",
"feasiblePoints": ["sum", "mean", "max"],
},
{
"parameterName": "graph_pooling_type",
"type": "CATEGORICAL",
"feasiblePoints": ["sum", "mean", "max"],
},
]
self.hyperparams = {
"num_layers": 5,
"hidden": [64,64,64,64],
"dropout": 0.5,
"act": "relu",
"eps": "False",
"mlp_layers": 2,
"neighbor_pooling_type": "sum",
"graph_pooling_type": "sum"
}
self.initialized = False
if init is True:
self.initialize()
Hyperparameters in GIN:
- num_layers: int - number of GIN layers.
- hidden: List[int] - hidden size for each hidden layer.
- dropout: float - dropout probability.
- act: str - type of activation function.
- eps: str - whether to train the parameter \(\epsilon\) in the GIN layer.
- mlp_layers: int - number of MLP layers in the GIN layer.
- neighbor_pooling_type: str - pooling type in the GIN layer.
- graph_pooling_type: str - graph pooling type following the last GIN layer.
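Note that in the search space above, the hidden list is truncated to num_layers - 1 entries via cutPara and cutFunc, i.e. one hidden size per GIN layer except the final output layer. A minimal sketch of that relationship (an illustration, not AutoGL's internal code):

```python
# The search space cuts the `hidden` list down to `num_layers - 1` entries
# (cutPara=("num_layers",), cutFunc=lambda x: x[0] - 1).
# Hypothetical helper illustrating the truncation:
def cut_hidden(hidden, num_layers):
    # keep one hidden size per layer except the last (output) layer
    return hidden[: num_layers - 1]

print(cut_hidden([64, 64, 64, 64, 64], 5))  # [64, 64, 64, 64]
```

This is why the default hyperparameters pair num_layers = 5 with a four-element hidden list.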
You can define your own gin model by using the from_hyper_parameter function and specifying the hyperparameters.
# pyg version
from autogl.module.model.pyg import AutoGIN
# from autogl.module.model.dgl import AutoGIN # dgl version
model = AutoGIN(
num_features=dataset.num_node_features,
num_classes=dataset.num_classes,
num_graph_features=0,
init=False
).from_hyper_parameter({
# hp from model
"num_layers": 5,
"hidden": [64,64,64,64],
"dropout": 0.5,
"act": "relu",
"eps": "False",
"mlp_layers": 2,
"neighbor_pooling_type": "sum",
"graph_pooling_type": "sum"
}).model
Then you can train the model for 100 epochs.
import torch
import torch.nn.functional as F
# Define the loss optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# Training
for epoch in range(100):
model.train()
for data in train_loader:
data = data.to(args.device)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, data.y)
loss.backward()
optimizer.step()
Finally, evaluate the trained model.
def test(model, loader, args):
model.eval()
correct = 0
for data in loader:
data = data.to(args.device)
output = model(data)
pred = output.max(dim=1)[1]
correct += pred.eq(data.y).sum().item()
return correct / len(loader.dataset)
acc = test(model, test_loader, args)
Automatic Search for Graph Classification Tasks
In AutoGL, we also provide a high-level API Solver to control the overall pipeline.
We encapsulated the training process from the Building GNN Modules part for graph classification tasks
in the solver AutoGraphClassifier, which supports automatic hyperparameter optimization
as well as feature engineering and ensemble. In this part, we will show you how to use
AutoGraphClassifier.
solver = AutoGraphClassifier(
feature_module=None,
graph_models=[args.model],
hpo_module='random',
ensemble_module=None,
device=args.device, max_evals=1,
trainer_hp_space = fixed(
**{
# hp from trainer
"max_epoch": args.epoch,
"batch_size": args.batch_size,
"early_stopping_round": args.epoch + 1,
"lr": args.lr,
"weight_decay": 0,
}
),
model_hp_spaces=[
fixed(**{
# hp from model
"num_layers": 5,
"hidden": [64,64,64,64],
"dropout": 0.5,
"act": "relu",
"eps": "False",
"mlp_layers": 2,
"neighbor_pooling_type": "sum",
"graph_pooling_type": "sum"
}) if args.model == 'gin' else fixed(**{
"ratio": 0.8,
"dropout": 0.5,
"act": "relu"
}),
]
)
# fit auto model
solver.fit(dataset, evaluation_method=['acc'])
# prediction
out = solver.predict(dataset, mask='test')
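Once you have the solver's test-set predictions, you can compare them against the ground-truth labels yourself. A minimal sketch, assuming pred is a sequence of predicted class indices and labels is the matching ground truth (adapt the label extraction to your dataset's API):

```python
import numpy as np

# Hypothetical helper: fraction of predictions matching the labels.
def accuracy(pred, labels):
    pred, labels = np.asarray(pred), np.asarray(labels)
    return float((pred == labels).mean())

print(accuracy([0, 1, 1, 2], [0, 1, 0, 2]))  # 0.75
```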