AutoGL Trainer
The AutoGL project uses trainer to handle the auto-training of tasks. Currently, we support the following tasks:
- NodeClassificationTrainer for semi-supervised node classification
- GraphClassificationTrainer for supervised graph classification
- LinkPredictionTrainer for link prediction
Lazy Initialization
For the same reason as in model, we also use lazy initialization for all trainers. Only (part of) the hyper-parameters are set when __init__() is called. The trainer gets its core model only after initialize() is explicitly called, which is done automatically in solver and duplicate_from_hyper_parameter(), after all the hyper-parameters are set properly.
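The lazy-initialization pattern described above can be sketched with a minimal, self-contained class. The class name and internals below are illustrative only, not AutoGL's actual implementation; they just show the split between storing hyper-parameters at __init__() time and building the model in initialize():

```python
class LazyTrainer:
    """Illustrative trainer: hyper-parameters at __init__, model at initialize()."""

    def __init__(self, **hyper_parameters):
        # Only hyper-parameters are stored here; no model is built yet.
        self.hyper_parameters = hyper_parameters
        self.model = None
        self.initialized = False

    def initialize(self):
        # Build the core model only when explicitly asked to.
        if self.initialized:
            return
        self.model = {"layers": self.hyper_parameters.get("num_layers", 2)}
        self.initialized = True

    def duplicate_from_hyper_parameter(self, hp):
        # Clone the trainer with updated hyper-parameters and initialize the copy.
        new = LazyTrainer(**{**self.hyper_parameters, **hp})
        new.initialize()
        return new

trainer = LazyTrainer(num_layers=2, lr=0.01)
assert trainer.model is None          # nothing is built at __init__ time
clone = trainer.duplicate_from_hyper_parameter({"num_layers": 4})
assert clone.model == {"layers": 4}   # the duplicate is fully initialized
```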
Train and Predict
After initializing a trainer, you can train it on the given datasets.
We currently provide training and testing functions for the tasks of node classification, graph classification, and link prediction. You can also create your own tasks following similar patterns to ours. For training, you need to define train_only() and use it in train(). For testing, you need to define predict_proba() and use it in predict().
The evaluation function is defined in evaluate(); you can plug in your own evaluation metrics and methods.
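The train()/train_only() and predict()/predict_proba() split described above can be sketched as a minimal, self-contained class. The class name and method bodies here are made up for illustration, not AutoGL's actual code; only the method names mirror the conventions from the text:

```python
class MyTaskTrainer:
    """Follows the train()/train_only() and predict()/predict_proba() split."""

    def train_only(self, data):
        # Task-specific training loop goes here; we just fit a mean as a stand-in.
        self.weights = sum(data) / len(data)

    def train(self, data):
        # Public entry point: delegates to the user-defined train_only().
        self.train_only(data)

    def predict_proba(self, data):
        # Task-specific probability computation (here: a dummy normalized score).
        return [min(1.0, x / (self.weights * 2)) for x in data]

    def predict(self, data):
        # Public entry point: thresholds the scores from predict_proba().
        return [int(p >= 0.5) for p in self.predict_proba(data)]

    def evaluate(self, data, labels):
        # Plug in your own metric here; accuracy is used for illustration.
        preds = self.predict(data)
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)

trainer = MyTaskTrainer()
trainer.train([1.0, 2.0, 3.0])          # mean weight = 2.0
assert trainer.predict([1.0, 4.0]) == [0, 1]
```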
Node Classification with Sampling
Various recent studies have demonstrated that training with spatial sampling is an efficient technique for representation learning on large-scale graphs. We provide implementations of several representative sampling mechanisms, including Neighbor Sampling, Layer-Dependent Importance Sampling (LADIES), and GraphSAINT. By leveraging these efficient sampling mechanisms, users can apply this library to large-scale graph datasets, e.g. Reddit.
Specifically, since sampling techniques generally require the model to support some layer-wise processing in the forward pass, currently only the provided GCN and GraphSAGE models are ready for Node-wise Sampling (Neighbor Sampling) and Layer-wise Sampling (LADIES). More models and more tasks are scheduled to support sampling in future versions.
- Node-wise Sampling (GraphSAGE): both the GCN and GraphSAGE models are supported.
- Layer-wise Sampling (Layer-Dependent Importance Sampling): only the GCN model is supported in the current version.
- Subgraph-wise Sampling (GraphSAINT): since the GraphSAINT sampling technique places no specific requirements on the adopted model, most of the available models can adopt it. However, the prediction process is a potential bottleneck, or even an obstacle, when GraphSAINT is applied to a large-scale graph, so the adopted model should preferably support layer-wise prediction; the provided GCN model already meets that enhanced requirement. According to empirical experiments, the current GraphSAINT implementation supports full graphs up to the size of the Flickr dataset.
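To make the node-wise sampling idea concrete, the following self-contained sketch samples a fixed number of neighbors per frontier node at each layer, which is the core of the Neighbor Sampling mechanism. The function name, toy graph, and sizes are made up for illustration and are not part of AutoGL's API:

```python
import random

def sample_neighbors(adj, seed_nodes, sizes, rng=None):
    """For each layer, keep at most sizes[i] random neighbors per frontier node.

    adj: dict mapping node -> list of neighbor nodes.
    seed_nodes: the mini-batch of target nodes.
    sizes: number of neighbors to sample per node, one entry per layer.
    Returns one sampled frontier (a set of nodes) per layer.
    """
    rng = rng or random.Random(0)
    frontiers = []
    frontier = set(seed_nodes)
    for size in sizes:
        sampled = set()
        for node in frontier:
            neighbors = adj.get(node, [])
            k = min(size, len(neighbors))
            sampled.update(rng.sample(neighbors, k))
        frontiers.append(sampled)
        frontier = frontier | sampled  # the next layer expands from all nodes seen
    return frontiers

# A tiny toy graph: node -> neighbors.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
layers = sample_neighbors(adj, seed_nodes=[0], sizes=[2, 2])
assert len(layers) == 2 and all(len(f) > 0 for f in layers)
```

One entry in sizes per model layer is exactly why the sampling_sizes hyper-parameter discussed below must match the layer number of the model.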
The sampling techniques can be used by adopting the corresponding trainer: NodeClassificationGraphSAINTTrainer, NodeClassificationLayerDependentImportanceSamplingTrainer, and NodeClassificationNeighborSamplingTrainer.
You can either specify the corresponding trainer name in the YAML configuration file or instantiate the solver AutoNodeClassifier with an instance of the specific trainer. However, please make sure to manage some key hyper-parameters properly inside the hyper-parameter space. Specifically:
For NodeClassificationLayerDependentImportanceSamplingTrainer, you need to set the hyper-parameter sampled_node_sizes properly. The space of sampled_node_sizes should be a list of the same size as your Sequential Model. For example, if your model has 4 layers, you need to pass the hyper-parameter space properly:
solver = AutoNodeClassifier(
graph_models=(A_MODEL_WITH_4_LAYERS,),
default_trainer='NodeClassificationLayerDependentImportanceSamplingTrainer',
trainer_hp_space=[
# (required) you need to set the trainer_hp_space properly.
{
'parameterName': 'sampled_node_sizes',
'type': 'NUMERICAL_LIST',
"numericalType": "INTEGER",
"length": 4, # same with the layer number of your model
"minValue": [200,200,200,200],
"maxValue": [1000,1000,1000,1000],
"scalingType": "LOG"
},
...
]
)
If the layer number of your model is a searchable hyper-parameter, you can also set cutPara and cutFunc properly to connect the list with the layer-number hyper-parameter of your model.
'''
Suppose the layer number of your model is of the following forms:
{
'parameterName': 'layer_number',
'type': 'INTEGER',
'minValue': 2,
'maxValue': 4,
'scalingType': 'LOG'
}
'''
solver = AutoNodeClassifier(
graph_models=(A_MODEL_WITH_DYNAMIC_LAYERS,),
default_trainer='NodeClassificationLayerDependentImportanceSamplingTrainer',
trainer_hp_space=[
# (required) you need to set the trainer_hp_space properly.
{
'parameterName': 'sampled_node_sizes',
'type': 'NUMERICAL_LIST',
"numericalType": "INTEGER",
"length": 4, # max length
"cutPara": ("layer_number", ), # link with layer_number
"cutFunc": lambda x:x[0], # link with layer_number
"minValue": [200,200,200,200],
"maxValue": [1000,1000,1000,1000],
"scalingType": "LOG"
},
...
]
)
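Conceptually, cutPara names the already-resolved hyper-parameters to look up, and cutFunc maps their values to the length the NUMERICAL_LIST is truncated to. The helper below is a hypothetical re-implementation of that truncation for illustration, not AutoGL's internal code:

```python
def cut_numerical_list(sampled_values, hp_spec, resolved_hps):
    """Truncate a NUMERICAL_LIST according to cutPara/cutFunc.

    sampled_values: the full-length list drawn from the search space.
    hp_spec: the hyper-parameter specification dict (may hold cutPara/cutFunc).
    resolved_hps: already-resolved hyper-parameters, e.g. {'layer_number': 3}.
    """
    cut_para = hp_spec.get("cutPara")
    cut_func = hp_spec.get("cutFunc")
    if cut_para is None or cut_func is None:
        return sampled_values  # no linkage: keep the full-length list
    args = [resolved_hps[name] for name in cut_para]
    return sampled_values[: cut_func(args)]

spec = {
    "parameterName": "sampled_node_sizes",
    "length": 4,
    "cutPara": ("layer_number",),
    "cutFunc": lambda x: x[0],
}
# With layer_number = 3, only the first 3 entries of the sampled list are used.
assert cut_numerical_list([200, 400, 600, 800], spec, {"layer_number": 3}) == [200, 400, 600]
```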
Similarly, if you want to use NodeClassificationNeighborSamplingTrainer, you need to make sure the hyper-parameter sampling_sizes has the same length as the layer number of your model. For example:
'''
Suppose the layer number of your model is of the following forms:
{
'parameterName': 'layer_number',
'type': 'INTEGER',
'minValue': 2,
'maxValue': 4,
'scalingType': 'LOG'
}
'''
solver = AutoNodeClassifier(
graph_models=(A_MODEL_WITH_DYNAMIC_LAYERS,),
default_trainer='NodeClassificationNeighborSamplingTrainer',
trainer_hp_space=[
# (required) you need to set the trainer_hp_space properly.
{
'parameterName': 'sampling_sizes',
'type': 'NUMERICAL_LIST',
"numericalType": "INTEGER",
"length": 4, # max length
"cutPara": ("layer_number", ), # link with layer_number
"cutFunc": lambda x:x[0], # link with layer_number
"minValue": [20,20,20,20],
"maxValue": [100,100,100,100],
"scalingType": "LOG"
},
...
]
)
You can also pass a trainer directly inside the model list. A brief example is demonstrated as follows:
ladies_sampling_trainer = NodeClassificationLayerDependentImportanceSamplingTrainer(
model='gcn', num_features=dataset.num_features, num_classes=dataset.num_classes, ...
)
ladies_sampling_trainer.hyper_parameter_space = [
# (required) you need to set the trainer_hp_space properly.
{
'parameterName': 'sampled_node_sizes',
'type': 'NUMERICAL_LIST',
"numericalType": "INTEGER",
"length": 4, # max length
"cutPara": ("num_layers", ), # link with layer_number
"cutFunc": lambda x:x[0], # link with layer_number
"minValue": [200,200,200,200],
"maxValue": [1000,1000,1000,1000],
"scalingType": "LOG"
},
...
]
AutoNodeClassifier(graph_models=(ladies_sampling_trainer,), ...)