Graph Classification Models
Building the Graph Classification Module
In AutoGL, we support two graph classification models: gin and topk.
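Either model can be selected by name when building the solver, as the AutoGraphClassifier example later in this section shows in full. A minimal sketch, assuming the autogl.solver import path and that the remaining constructor arguments have defaults:

from autogl.solver import AutoGraphClassifier

# Select the GIN model by name; pass ["topk"] for the topk model instead.
solver = AutoGraphClassifier(graph_models=["gin"])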
AutoGIN
The Graph Isomorphism Network (GIN) is a graph neural network built on the graph isomorphism operator, both introduced in the paper "How Powerful are Graph Neural Networks?". Its layer-wise update rule is:
\[\mathbf{x}^{\prime}_i = h_{\mathbf{\Theta}} \left( (1 + \epsilon) \cdot
\mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \mathbf{x}_j \right)\]
or:
\[\mathbf{X}^{\prime} = h_{\mathbf{\Theta}} \left( \left( \mathbf{A} +
(1 + \epsilon) \cdot \mathbf{I} \right) \cdot \mathbf{X} \right),\]
where \(h_{\mathbf{\Theta}}\) denotes a neural network, e.g., an MLP.
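To make the update rule concrete, here is a minimal sketch of the dense form in plain PyTorch; the toy graph, feature sizes, and the stand-in MLP for \(h_{\mathbf{\Theta}}\) are all illustrative:

import torch

# Toy 3-node path graph; `mlp` stands in for h_Theta.
A = torch.tensor([[0., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 0.]])   # adjacency matrix
X = torch.randn(3, 8)              # node features
eps = 0.0                          # fixed epsilon (not trained here)
mlp = torch.nn.Sequential(
    torch.nn.Linear(8, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 8),
)
# X' = h_Theta((A + (1 + eps) * I) X)
X_new = mlp((A + (1 + eps) * torch.eye(3)) @ X)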
The parameters include:
- num_features: int - The dimension of features.
- num_classes: int - The number of classes.
- device: torch.device or str - The device on which the model runs.
- init: bool - If True (False), the model will (not) be initialized.
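A minimal construction call with these parameters might look like the following sketch (the values are illustrative; per the constructor in the class definition below, init=True triggers initialize() immediately):

from autogl.module.model.pyg import AutoGIN

# Illustrative values only; see the full class definition below.
model = AutoGIN(num_features=16, num_classes=2, device="cpu", init=True)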
class AutoGIN(BaseModel):
r"""
AutoGIN. The model used in this automodel is GIN, i.e., the graph isomorphism network from the `"How Powerful are
Graph Neural Networks?" <https://arxiv.org/abs/1810.00826>`_ paper. The layer is
.. math::
\mathbf{x}^{\prime}_i = h_{\mathbf{\Theta}} \left( (1 + \epsilon) \cdot
\mathbf{x}_i + \sum_{j \in \mathcal{N}(i)} \mathbf{x}_j \right)
or
.. math::
\mathbf{X}^{\prime} = h_{\mathbf{\Theta}} \left( \left( \mathbf{A} +
(1 + \epsilon) \cdot \mathbf{I} \right) \cdot \mathbf{X} \right),
here :math:`h_{\mathbf{\Theta}}` denotes a neural network, *i.e.* an MLP.
Parameters
----------
num_features: `int`.
The dimension of features.
num_classes: `int`.
The number of classes.
device: `torch.device` or `str`
The device where model will be running on.
init: `bool`.
If True(False), the model will (not) be initialized.
"""
def __init__(
self,
num_features=None,
num_classes=None,
device=None,
init=False,
num_graph_features=None,
**args
):
super(AutoGIN, self).__init__()
self.num_features = num_features if num_features is not None else 0
self.num_classes = int(num_classes) if num_classes is not None else 0
self.num_graph_features = (
int(num_graph_features) if num_graph_features is not None else 0
)
self.device = device if device is not None else "cpu"
self.params = {
"features_num": self.num_features,
"num_class": self.num_classes,
"num_graph_features": self.num_graph_features,
}
self.space = [
{
"parameterName": "num_layers",
"type": "DISCRETE",
"feasiblePoints": "4,5,6",
},
{
"parameterName": "hidden",
"type": "NUMERICAL_LIST",
"numericalType": "INTEGER",
"length": 5,
"minValue": [8, 8, 8, 8, 8],
"maxValue": [64, 64, 64, 64, 64],
"scalingType": "LOG",
"cutPara": ("num_layers",),
"cutFunc": lambda x: x[0] - 1,
},
{
"parameterName": "dropout",
"type": "DOUBLE",
"maxValue": 0.9,
"minValue": 0.1,
"scalingType": "LINEAR",
},
{
"parameterName": "act",
"type": "CATEGORICAL",
"feasiblePoints": ["leaky_relu", "relu", "elu", "tanh"],
},
{
"parameterName": "eps",
"type": "CATEGORICAL",
"feasiblePoints": ["True", "False"],
},
{
"parameterName": "mlp_layers",
"type": "DISCRETE",
"feasiblePoints": "2,3,4",
},
{
"parameterName": "neighbor_pooling_type",
"type": "CATEGORICAL",
"feasiblePoints": ["sum", "mean", "max"],
},
{
"parameterName": "graph_pooling_type",
"type": "CATEGORICAL",
"feasiblePoints": ["sum", "mean", "max"],
},
]
self.hyperparams = {
"num_layers": 5,
"hidden": [64,64,64,64],
"dropout": 0.5,
"act": "relu",
"eps": "False",
"mlp_layers": 2,
"neighbor_pooling_type": "sum",
"graph_pooling_type": "sum"
}
self.initialized = False
if init is True:
self.initialize()
Hyperparameters in GIN:
- num_layers: int - The number of GIN layers.
- hidden: List[int] - The size of each hidden layer.
- dropout: float - The dropout probability.
- act: str - The type of activation function.
- eps: str - Whether the parameter \(\epsilon\) in the GIN layers is trainable.
- mlp_layers: int - The number of MLP layers in GIN.
- neighbor_pooling_type: str - The type of neighbor pooling in GIN.
- graph_pooling_type: str - The type of graph pooling applied after the last GIN layer.
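These defaults are stored on the auto-model instance. A quick way to inspect them, assuming the hyperparams and space attributes from the class definition above:

from autogl.module.model.pyg import AutoGIN

model = AutoGIN(num_features=16, num_classes=2)
print(model.hyperparams)                           # default values listed above
print([p["parameterName"] for p in model.space])   # tunable hyperparameters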
You can define your own gin model by using the from_hyper_parameter function and specifying the hyperparameters:
# pyg version
from autogl.module.model.pyg import AutoGIN
# from autogl.module.model.dgl import AutoGIN  # dgl version

model = AutoGIN(
    num_features=dataset.num_node_features,
    num_classes=dataset.num_classes,
    num_graph_features=0,
    init=False
).from_hyper_parameter({
    # hp from model
    "num_layers": 5,
    "hidden": [64, 64, 64, 64],
    "dropout": 0.5,
    "act": "relu",
    "eps": "False",
    "mlp_layers": 2,
    "neighbor_pooling_type": "sum",
    "graph_pooling_type": "sum"
}).model
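The trailing .model attribute extracts the underlying PyTorch module so it can be trained directly. If you train on a GPU, first move the module onto the same device as the data (a one-line sketch, assuming the extracted object is a standard torch.nn.Module):

model = model.to(args.device)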
Then you can train the model for 100 epochs:
import torch
import torch.nn.functional as F

# Define the optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Training loop
for epoch in range(100):
    model.train()
    for data in train_loader:
        data = data.to(args.device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, data.y)
        loss.backward()
        optimizer.step()
Finally, you can evaluate the model:
def test(model, loader, args):
    model.eval()
    correct = 0
    for data in loader:
        data = data.to(args.device)
        output = model(data)
        pred = output.max(dim=1)[1]
        correct += pred.eq(data.y).sum().item()
    return correct / len(loader.dataset)

acc = test(model, test_loader, args)
Automatic Search for Graph Classification Tasks
In AutoGL, we also provide a high-level API, the solver, to control the whole pipeline. For graph classification tasks, the training process from the model-building section above is wrapped in the solver AutoGraphClassifier, which supports automatic hyperparameter optimization, feature engineering, and ensembling.
In this part, we provide an example showing how to use AutoGraphClassifier:
solver = AutoGraphClassifier(
    feature_module=None,
    graph_models=[args.model],
    hpo_module='random',
    ensemble_module=None,
    device=args.device,
    max_evals=1,
    trainer_hp_space=fixed(
        **{
            # hp from trainer
            "max_epoch": args.epoch,
            "batch_size": args.batch_size,
            "early_stopping_round": args.epoch + 1,
            "lr": args.lr,
            "weight_decay": 0,
        }
    ),
    model_hp_spaces=[
        fixed(**{
            # hp from model
            "num_layers": 5,
            "hidden": [64, 64, 64, 64],
            "dropout": 0.5,
            "act": "relu",
            "eps": "False",
            "mlp_layers": 2,
            "neighbor_pooling_type": "sum",
            "graph_pooling_type": "sum"
        }) if args.model == 'gin' else fixed(**{
            "ratio": 0.8,
            "dropout": 0.5,
            "act": "relu"
        }),
    ]
)
# fit auto model
solver.fit(dataset, evaluation_method=['acc'])
# prediction
out = solver.predict(dataset, mask='test')
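As a quick sanity check, the predictions can be compared against the ground-truth test labels. In the sketch below, y_test is a hypothetical array of test labels; how you obtain it depends on the dataset backend:

import numpy as np

# `y_test` is hypothetical: the ground-truth labels of the test split.
acc = (np.asarray(out) == np.asarray(y_test)).mean()
print(f"test accuracy: {acc:.4f}")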