autogl.solver
Auto solver for various graph tasks
class autogl.solver.AutoNodeClassifier(feature_module=None, graph_models=('gat', 'gcn'), nas_algorithms=None, nas_spaces=None, nas_estimators=None, hpo_module='anneal', ensemble_module='voting', max_evals=50, default_trainer='NodeClassificationFull', trainer_hp_space=None, model_hp_spaces=None, size=4, device='auto')
Auto Multi-class Graph Node Classifier.
Used to automatically solve node classification problems.
Parameters:
- feature_module (autogl.module.feature.BaseFeatureEngineer or str or None) – The (name of the) auto feature engineer used to process the given dataset. Set it to None to disable feature engineering. Default None.
- graph_models (sequence of models) – The models to optimize. Each item can be a str, an autogl.module.model.BaseAutoModel, an autogl.module.model.encoders.BaseEncoderMaintainer, or an (encoder, decoder) tuple when both need to be specified. The encoder can be a str or an autogl.module.model.encoders.BaseEncoderMaintainer, and the decoder can be a str or an autogl.module.model.decoders.BaseDecoderMaintainer. Default ('gat', 'gcn').
- nas_algorithms ((list of) autogl.module.nas.algorithm.BaseNAS or str (Optional)) – The (names of the) NAS algorithms used. Default None.
- nas_spaces ((list of) autogl.module.nas.space.BaseSpace or str (Optional)) – The (names of the) NAS spaces used. Default None.
- nas_estimators ((list of) autogl.module.nas.estimator.BaseEstimator or str (Optional)) – The (names of the) NAS estimators used. Default None.
- hpo_module (autogl.module.hpo.BaseHPOptimizer or str or None) – The (name of the) HPO module used to search for the best hyperparameters. Set it to None to disable HPO. Default anneal.
- ensemble_module (autogl.module.ensemble.BaseEnsembler or str or None) – The (name of the) ensemble module used to combine the models found. Set it to None to disable ensembling. Default voting.
- max_evals (int (Optional)) – If given, sets the number of evaluations the HPO module will run. Only effective when hpo_module is a str. Default 50.
- default_trainer (str (Optional)) – The name of the trainer used in this solver. Default NodeClassificationFull.
- trainer_hp_space (list of dict (Optional)) – A trainer hyperparameter space, or a list of such spaces. If a single space is given, it is used as the trainer space for every model. If a list of spaces is given, each model uses the corresponding space. Default None.
- model_hp_spaces (list of list of dict (Optional)) – Model hyperparameter space configuration. If given, it specifies the space of every passed model. If an encoder(-decoder) is passed, the space should be a dict with the keys "encoder" and "decoder" that give the detailed encoder and decoder spaces. Default None.
- size (int (Optional)) – The maximum number of models the ensemble module will use. Default 4.
- device (torch.device or str) – The device the model runs on. If set to auto, a GPU is used when available. You can also name the device directly, e.g. gpu or cuda:0. Default auto.
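The rule for trainer_hp_space (one space shared by every model, or one space per model) can be pictured with a small helper. This is only a sketch of the documented rule, not AutoGL's implementation: expand_trainer_hp_space is a hypothetical name, and the keys inside the hyperparameter dict are placeholders rather than a confirmed schema.

```python
from copy import deepcopy


def expand_trainer_hp_space(trainer_hp_space, num_models):
    """Return one trainer hp space per model (hypothetical helper)."""
    if trainer_hp_space is None:
        return [None] * num_models
    # A single space is a list of dicts; a list of spaces is a list of lists.
    if all(isinstance(item, dict) for item in trainer_hp_space):
        return [deepcopy(trainer_hp_space) for _ in range(num_models)]
    if len(trainer_hp_space) != num_models:
        raise ValueError("need exactly one trainer hp space per model")
    return [deepcopy(space) for space in trainer_hp_space]


# Placeholder hyperparameter description; the key names are illustrative only.
single_space = [{"parameterName": "max_epoch", "type": "INTEGER"}]
per_model = expand_trainer_hp_space(single_space, num_models=2)
```

Copying the shared space keeps the models independent, so a later change to one model's space does not leak into another's.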
evaluate(dataset=None, inplaced=False, inplace=False, use_ensemble=True, use_best=True, name=None, mask='test', label=None, metric='acc')
Evaluate the given dataset.
Parameters:
- dataset (torch_geometric.data.dataset.Dataset or None) – The dataset to predict on. If None, the processed dataset passed to fit() is used instead. Default None.
- inplaced (bool) – Whether the given dataset has already been processed. Only effective when dataset is not None. If you passed the dataset to fit() with inplace=True and pass the same dataset to this method, set this argument to True; otherwise set it to False. Default False.
- inplace (bool) – Whether to process the given dataset in place. Set it to True to save memory by modifying the given dataset directly. Default False.
- use_ensemble (bool) – Whether to use the ensemble for prediction. Default True.
- use_best (bool) – Whether to use the best single model for prediction. Only effective when use_ensemble is False. Default True.
- name (str or None) – The name of the model used for prediction. Only effective when use_ensemble and use_best are both False. Default None.
- mask (str) – The data split to predict on. Default test.
- label (torch.Tensor (Optional)) – The ground-truth labels of the given dataset split. If not passed, labels are extracted from the input dataset.
- metric (str) – The metric used to evaluate the model. Default acc.
Returns: score(s) – The evaluation results according to the evaluator passed.
Return type: (list of) evaluation scores
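The three flags use_ensemble, use_best and name form a precedence chain. The sketch below only restates that documented precedence; choose_predictor is a hypothetical helper, and raising an error when name is missing is an assumption rather than confirmed library behaviour.

```python
def choose_predictor(use_ensemble=True, use_best=True, name=None):
    """Describe which model the flags select (hypothetical helper)."""
    if use_ensemble:
        return "ensemble"
    if use_best:
        return "best single model"
    if name is None:
        # Assumption: a model name is required once both flags are off.
        raise ValueError("name is required when use_ensemble and use_best are both False")
    return name


default_choice = choose_predictor()
single_choice = choose_predictor(use_ensemble=False)
named_choice = choose_predictor(use_ensemble=False, use_best=False, name="gcn")
```

Note that name is ignored unless both flags are False, which matches the parameter descriptions above.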
fit(dataset, time_limit=-1, inplace=False, train_split=None, val_split=None, balanced=True, evaluation_method='infer', seed=None) → autogl.solver.classifier.node_classifier.AutoNodeClassifier
Fit the current solver on the given dataset.
Parameters:
- dataset (autogl.data.Dataset) – The dataset to fit on. It must contain only one graph.
- time_limit (int) – The time limit of the whole fit process, in seconds. A negative value disables the time limit. Default -1.
- inplace (bool) – Whether to process the given dataset in place. Set it to True to save memory by modifying the given dataset directly. Default False.
- train_split (float or int (Optional)) – The training ratio (if float) or the number of training samples (if int). Set it to None to use the dataset's default train/val/test split. Default None.
- val_split (float or int (Optional)) – The validation ratio (if float) or the number of validation samples (if int). Set it to None to use the dataset's default train/val/test split. Default None.
- balanced (bool) – Whether to create the train/valid/test split in a balanced way. If True, the train and valid splits contain the same number of samples from each class. Default True.
- evaluation_method ((list of) str or autogl.module.train.evaluation) – The evaluation method(s) for the current solver. If infer, they are determined automatically. Default infer.
- seed (int (Optional)) – The random seed. If None, everything runs at random. Default None.
Returns: self – A reference to the current solver.
Return type: autogl.solver.classifier.node_classifier.AutoNodeClassifier
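Since train_split and val_split accept either a ratio (float) or a count (int), it can help to see how both forms map to concrete sizes. The following is a sketch under stated assumptions: resolve_split_sizes is a hypothetical helper, ratios are rounded to the nearest integer, and whatever remains is treated as the test split.

```python
def resolve_split_sizes(num_samples, train_split, val_split):
    """Turn float ratios or int counts into (train, val, test) sizes."""

    def to_count(split):
        if isinstance(split, float):
            return round(num_samples * split)  # ratio of the dataset
        return int(split)  # absolute number of samples

    train = to_count(train_split)
    val = to_count(val_split)
    if train < 0 or val < 0 or train + val > num_samples:
        raise ValueError("splits do not fit into the dataset")
    # Assumption: everything not used for train/val becomes the test split.
    return train, val, num_samples - train - val


ratio_sizes = resolve_split_sizes(1000, 0.5, 0.25)
count_sizes = resolve_split_sizes(1000, 140, 500)
```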
fit_predict(dataset, time_limit=-1, inplace=False, train_split=None, val_split=None, balanced=True, evaluation_method='infer', use_ensemble=True, use_best=True, name=None) → numpy.ndarray
Fit the current solver on the given dataset and return the predicted values.
Parameters:
- dataset (torch_geometric.data.dataset.Dataset) – The dataset to fit on. It must contain only one graph.
- time_limit (int) – The time limit of the whole fit process, in seconds. A negative value disables the time limit. Default -1.
- inplace (bool) – Whether to process the given dataset in place. Set it to True to save memory by modifying the given dataset directly. Default False.
- train_split (float or int (Optional)) – The training ratio (if float) or the number of training samples (if int). Set it to None to use the dataset's default train/val/test split. Default None.
- val_split (float or int (Optional)) – The validation ratio (if float) or the number of validation samples (if int). Set it to None to use the dataset's default train/val/test split. Default None.
- balanced (bool) – Whether to create the train/valid/test split in a balanced way. If True, the train and valid splits contain the same number of samples from each class. Default True.
- evaluation_method ((list of) str or autogl.module.train.evaluation) – The evaluation method(s) for the current solver. If infer, they are determined automatically. Default infer.
- use_ensemble (bool) – Whether to use the ensemble for prediction. Default True.
- use_best (bool) – Whether to use the best single model for prediction. Only effective when use_ensemble is False. Default True.
- name (str or None) – The name of the model used for prediction. Only effective when use_ensemble and use_best are both False. Default None.
Returns: result – The prediction on the given dataset: an array of shape (N,), where N is the number of test nodes.
Return type: np.ndarray
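One way to read the balanced option is "draw the same number of samples from every class". The sketch below illustrates that reading only; balanced_indices, per_class and the seeded sampling are assumptions made for the example, not the library's splitting code.

```python
import random
from collections import defaultdict


def balanced_indices(labels, per_class, seed=None):
    """Pick the same number of sample indices from every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for label in sorted(by_class):
        members = by_class[label]
        if len(members) < per_class:
            raise ValueError(f"class {label!r} has fewer than {per_class} samples")
        chosen.extend(rng.sample(members, per_class))
    return sorted(chosen)


labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
train_idx = balanced_indices(labels, per_class=2, seed=0)
```

With a fixed seed the same indices are drawn on every run, which is the usual reason to pass a seed when comparing solvers.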
classmethod from_config(path_or_dict, filetype='auto') → autogl.solver.classifier.node_classifier.AutoNodeClassifier
Load a solver from a config file.
Use this method to load a solver directly from a predefined config dict or a config file path. If you pass a path, only json and yaml files are currently supported.
Parameters:
- path_or_dict (str or dict) – The path to the config file, or the config dictionary object.
- filetype (str) – The type of the given file when a path is specified. Currently only json and yaml are supported. Set it to auto to detect the file type automatically from the file name. Default auto.
Returns: solver – The solver created from the given file or dictionary.
Return type: autogl.solver.classifier.node_classifier.AutoNodeClassifier
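The path-or-dict handling and the auto file type detection described above can be sketched as follows. This is not the library's loader: load_config is a hypothetical helper, accepting the .yml extension is an assumption, and the yaml branch relies on the third-party PyYAML package being installed.

```python
import json
import os


def load_config(path_or_dict, filetype="auto"):
    """Return a config dict from a dict or a json/yaml path (hypothetical helper)."""
    if isinstance(path_or_dict, dict):
        return path_or_dict
    if filetype == "auto":
        # Detect the type from the file name, as the filetype parameter describes.
        extension = os.path.splitext(path_or_dict)[1].lower()
        if extension == ".json":
            filetype = "json"
        elif extension in (".yaml", ".yml"):
            filetype = "yaml"
        else:
            raise ValueError(f"cannot detect file type from {path_or_dict!r}")
    if filetype == "json":
        with open(path_or_dict, "r", encoding="utf-8") as handle:
            return json.load(handle)
    if filetype == "yaml":
        import yaml  # PyYAML, imported lazily so json-only use needs no extra package

        with open(path_or_dict, "r", encoding="utf-8") as handle:
            return yaml.safe_load(handle)
    raise ValueError("filetype must be 'auto', 'json' or 'yaml'")
```

Passing filetype explicitly is useful when a config file has an unusual extension that auto detection cannot map.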
predict(dataset=None, inplaced=False, inplace=False, use_ensemble=True, use_best=True, name=None, mask='test') → numpy.ndarray
Predict the node class number.
Parameters:
- dataset (torch_geometric.data.dataset.Dataset or None) – The dataset to predict on. If None, the processed dataset passed to fit() is used instead. Default None.
- inplaced (bool) – Whether the given dataset has already been processed. Only effective when dataset is not None. If you passed the dataset to fit() with inplace=True and pass the same dataset to this method, set this argument to True; otherwise set it to False. Default False.
- inplace (bool) – Whether to process the given dataset in place. Set it to True to save memory by modifying the given dataset directly. Default False.
- use_ensemble (bool) – Whether to use the ensemble for prediction. Default True.
- use_best (bool) – Whether to use the best single model for prediction. Only effective when use_ensemble is False. Default True.
- name (str or None) – The name of the model used for prediction. Only effective when use_ensemble and use_best are both False. Default None.
- mask (str) – The data split to predict on. Default test.
Returns: result – The prediction on the given dataset: an array of shape (N,), where N is the number of test nodes.
Return type: np.ndarray
predict_proba(dataset=None, inplaced=False, inplace=False, use_ensemble=True, use_best=True, name=None, mask='test') → numpy.ndarray
Predict the node class probabilities.
Parameters:
- dataset (torch_geometric.data.dataset.Dataset or None) – The dataset to predict on. If None, the processed dataset passed to fit() is used instead. Default None.
- inplaced (bool) – Whether the given dataset has already been processed. Only effective when dataset is not None. If you passed the dataset to fit() with inplace=True and pass the same dataset to this method, set this argument to True; otherwise set it to False. Default False.
- inplace (bool) – Whether to process the given dataset in place. Set it to True to save memory by modifying the given dataset directly. Default False.
- use_ensemble (bool) – Whether to use the ensemble for prediction. Default True.
- use_best (bool) – Whether to use the best single model for prediction. Only effective when use_ensemble is False. Default True.
- name (str or None) – The name of the model used for prediction. Only effective when use_ensemble and use_best are both False. Default None.
- mask (str) – The data split to predict on. Default test.
Returns: result – The prediction on the given dataset: an array of shape (N, C), where N is the number of test nodes and C is the number of classes.
Return type: np.ndarray
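The (N, C) output of predict_proba and the (N,) output of predict are conventionally related by taking the most probable class in each row. The sketch below shows that convention on plain Python lists; labels_from_probabilities is a hypothetical helper, and whether the library derives predict exactly this way is an assumption.

```python
def labels_from_probabilities(probabilities):
    """Map an (N, C) table of class probabilities to N class indices."""
    # For each row, return the index of the largest probability.
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


proba = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
predicted = labels_from_probabilities(proba)
```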
class autogl.solver.AutoGraphClassifier(feature_module=None, graph_models=('gin', 'topkpool'), hpo_module='anneal', ensemble_module='voting', max_evals=50, default_trainer='GraphClassificationFull', trainer_hp_space=None, model_hp_spaces=None, size=4, device='auto')
Auto Multi-class Graph Classifier.
Used to automatically solve graph classification problems.
Parameters:
- feature_module (autogl.module.feature.BaseFeatureEngineer or str or None) – The (name of the) auto feature engineer used to process the given dataset. Set it to None to disable feature engineering. Default None.
- graph_models (list of autogl.module.model.BaseModel or list of str) – The (names of the) models to be optimized as backbones. Default ('gin', 'topkpool').
- hpo_module (autogl.module.hpo.BaseHPOptimizer or str or None) – The (name of the) HPO module used to search for the best hyperparameters. Set it to None to disable HPO. Default anneal.
- ensemble_module (autogl.module.ensemble.BaseEnsembler or str or None) – The (name of the) ensemble module used to combine the models found. Set it to None to disable ensembling. Default voting.
- max_evals (int (Optional)) – If given, sets the number of evaluations the HPO module will run. Only effective when hpo_module is a str. Default 50.
- default_trainer (str (Optional)) – The name of the trainer used in this solver. Default GraphClassificationFull.
- trainer_hp_space (Iterable[dict] (Optional)) – A trainer hyperparameter space, or a list of such spaces. If a single space is given, it is used as the trainer space for every model. If a list of spaces is given, each model uses the corresponding space. Default None.
- model_hp_spaces (list of list of dict (Optional)) – Model hyperparameter space configuration. If given, it specifies the space of every passed model. If an encoder(-decoder) is passed, the space should be a dict with the keys "encoder" and "decoder" that give the detailed encoder and decoder spaces. Default None.
- size (int (Optional)) – The maximum number of models the ensemble module will use. Default 4.
- device (torch.device or str) – The device the model runs on. If set to auto, a GPU is used when available. You can also name the device directly, e.g. gpu or cuda:0. Default auto.
evaluate(dataset=None, inplaced=False, inplace=False, use_ensemble=True, use_best=True, name=None, mask='test', label=None, metric='acc')
Evaluate the given dataset.
Parameters:
- dataset (torch_geometric.data.dataset.Dataset or None) – The dataset to predict on. If None, the processed dataset passed to fit() is used instead. Default None.
- inplaced (bool) – Whether the given dataset has already been processed. Only effective when dataset is not None. If you passed the dataset to fit() with inplace=True and pass the same dataset to this method, set this argument to True; otherwise set it to False. Default False.
- inplace (bool) – Whether to process the given dataset in place. Set it to True to save memory by modifying the given dataset directly. Default False.
- use_ensemble (bool) – Whether to use the ensemble for prediction. Default True.
- use_best (bool) – Whether to use the best single model for prediction. Only effective when use_ensemble is False. Default True.
- name (str or None) – The name of the model used for prediction. Only effective when use_ensemble and use_best are both False. Default None.
- mask (str) – The data split to predict on. Default test.
- label (torch.Tensor (Optional)) – The ground-truth labels of the given dataset split. If not passed, labels are extracted from the input dataset.
- metric (str) – The metric used to evaluate the model. Default acc.
Returns: score(s) – The evaluation results according to the evaluator passed.
Return type: (list of) evaluation scores
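The default metric acc conventionally denotes classification accuracy: the fraction of predictions that equal the ground-truth label. A minimal sketch of that definition follows; accuracy is a hypothetical standalone function and does not reproduce the library's evaluator classes.

```python
def accuracy(predicted, labels):
    """Fraction of predictions that match the ground-truth labels."""
    if len(predicted) != len(labels):
        raise ValueError("predicted and labels must have the same length")
    if not labels:
        raise ValueError("cannot score an empty split")
    correct = sum(1 for p, y in zip(predicted, labels) if p == y)
    return correct / len(labels)


# Three of the four predictions match their labels.
score = accuracy([1, 0, 2, 2], [1, 0, 1, 2])
```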
fit(dataset, time_limit=-1, inplace=False, train_split=None, val_split=None, evaluation_method='infer', seed=None) → autogl.solver.classifier.graph_classifier.AutoGraphClassifier
Fit the current solver on the given dataset.
Parameters:
- dataset (autogl.data.dataset) – The multi-graph dataset to fit on.
- time_limit (int) – The time limit of the whole fit process, in seconds. A negative value disables the time limit. Default -1.
- inplace (bool) – Whether to process the given dataset in place. Set it to True to save memory by modifying the given dataset directly. Default False.
- train_split (float or int (Optional)) – The training ratio (if float) or the number of training samples (if int). Set it to None to use the dataset's default train/val/test split. Default None.
- val_split (float or int (Optional)) – The validation ratio (if float) or the number of validation samples (if int). Set it to None to use the dataset's default train/val/test split. Default None.
- evaluation_method ((list of) str or autogl.module.train.evaluation) – The evaluation method(s) for the current solver. If infer, they are determined automatically. Default infer.
- seed (int (Optional)) – The random seed. If None, everything runs at random. Default None.
Returns: self – A reference to the current solver.
Return type: autogl.solver.classifier.graph_classifier.AutoGraphClassifier
fit_predict(dataset, time_limit=-1, inplace=False, train_split=None, val_split=None, evaluation_method='infer', seed=None, use_ensemble=True, use_best=True, name=None) → numpy.ndarray
Fit the current solver on the given dataset and return the predicted values.
Parameters:
- dataset (torch_geometric.data.dataset.Dataset) – The multi-graph dataset to fit on.
- time_limit (int) – The time limit of the whole fit process, in seconds. A negative value disables the time limit. Default -1.
- inplace (bool) – Whether to process the given dataset in place. Set it to True to save memory by modifying the given dataset directly. Default False.
- train_split (float or int (Optional)) – The training ratio (if float) or the number of training samples (if int). Set it to None to use the dataset's default train/val/test split. Default None.
- val_split (float or int (Optional)) – The validation ratio (if float) or the number of validation samples (if int). Set it to None to use the dataset's default train/val/test split. Default None.
- evaluation_method ((list of) str or autogl.module.train.evaluation) – The evaluation method(s) for the current solver. If infer, they are determined automatically. Default infer.
- seed (int (Optional)) – The random seed. If None, everything runs at random. Default None.
- use_ensemble (bool) – Whether to use the ensemble for prediction. Default True.
- use_best (bool) – Whether to use the best single model for prediction. Only effective when use_ensemble is False. Default True.
- name (str or None) – The name of the model used for prediction. Only effective when use_ensemble and use_best are both False. Default None.
Returns: result – The prediction on the given dataset: an array of shape (N,), where N is the number of test graphs.
Return type: np.ndarray
classmethod from_config(path_or_dict, filetype='auto') → autogl.solver.classifier.graph_classifier.AutoGraphClassifier
Load a solver from a config file.
Use this method to load a solver directly from a predefined config dict or a config file path. If you pass a path, only json and yaml files are currently supported.
Parameters:
- path_or_dict (str or dict) – The path to the config file, or the config dictionary object.
- filetype (str) – The type of the given file when a path is specified. Currently only json and yaml are supported. Set it to auto to detect the file type automatically from the file name. Default auto.
Returns: solver – The solver created from the given file or dictionary.
Return type: autogl.solver.classifier.graph_classifier.AutoGraphClassifier
predict(dataset=None, inplaced=False, inplace=False, use_ensemble=True, use_best=True, name=None, mask='test') → numpy.ndarray
Predict the graph class number.
Parameters:
- dataset (autogl.data.Dataset or None) – The dataset to predict on. If None, the processed dataset passed to fit() is used instead. Default None.
- inplaced (bool) – Whether the given dataset has already been processed. Only effective when dataset is not None. If you passed the dataset to fit() with inplace=True and pass the same dataset to this method, set this argument to True; otherwise set it to False. Default False.
- inplace (bool) – Whether to process the given dataset in place. Set it to True to save memory by modifying the given dataset directly. Default False.
- use_ensemble (bool) – Whether to use the ensemble for prediction. Default True.
- use_best (bool) – Whether to use the best single model for prediction. Only effective when use_ensemble is False. Default True.
- name (str or None) – The name of the model used for prediction. Only effective when use_ensemble and use_best are both False. Default None.
- mask (str) – The data split to predict on. Default test.
Returns: result – The prediction on the given dataset: an array of shape (N,), where N is the number of test graphs.
Return type: np.ndarray
predict_proba(dataset=None, inplaced=False, inplace=False, use_ensemble=True, use_best=True, name=None, mask='test') → numpy.ndarray
Predict the graph class probabilities.
Parameters:
- dataset (autogl.data.Dataset or None) – The dataset to predict on. If None, the processed dataset passed to fit() is used instead. Default None.
- inplaced (bool) – Whether the given dataset has already been processed. Only effective when dataset is not None. If you passed the dataset to fit() with inplace=True and pass the same dataset to this method, set this argument to True; otherwise set it to False. Default False.
- inplace (bool) – Whether to process the given dataset in place. Set it to True to save memory by modifying the given dataset directly. Default False.
- use_ensemble (bool) – Whether to use the ensemble for prediction. Default True.
- use_best (bool) – Whether to use the best single model for prediction. Only effective when use_ensemble is False. Default True.
- name (str or None) – The name of the model used for prediction. Only effective when use_ensemble and use_best are both False. Default None.
- mask (str) – The data split to predict on. Default test.
Returns: result – The prediction on the given dataset: an array of shape (N, C), where N is the number of test graphs and C is the number of classes.
Return type: np.ndarray
class autogl.solver.AutoLinkPredictor(feature_module=None, graph_models=('gat', 'gcn'), hpo_module='anneal', ensemble_module='voting', max_evals=50, default_trainer='LinkPredictionFull', trainer_hp_space=None, model_hp_spaces=None, size=4, device='auto')
Auto Link Predictor.
Used to automatically solve link prediction problems.
Parameters:
- feature_module (autogl.module.feature.BaseFeatureEngineer or str or None) – The (name of the) auto feature engineer used to process the given dataset. Set it to None to disable feature engineering. Default None.
- graph_models (list of autogl.module.model.BaseModel or list of str) – The (names of the) models to be optimized as backbones. Default ('gat', 'gcn').
- hpo_module (autogl.module.hpo.BaseHPOptimizer or str or None) – The (name of the) HPO module used to search for the best hyperparameters. Set it to None to disable HPO. Default anneal.
- ensemble_module (autogl.module.ensemble.BaseEnsembler or str or None) – The (name of the) ensemble module used to combine the models found. Set it to None to disable ensembling. Default voting.
- max_evals (int (Optional)) – If given, sets the number of evaluations the HPO module will run. Only effective when hpo_module is a str. Default 50.
- default_trainer (str (Optional)) – The name of the trainer used in this solver. Default LinkPredictionFull.
- trainer_hp_space (list of dict (Optional)) – A trainer hyperparameter space, or a list of such spaces. If a single space is given, it is used as the trainer space for every model. If a list of spaces is given, each model uses the corresponding space. Default None.
- model_hp_spaces (list of list of dict (Optional)) – Model hyperparameter space configuration. If given, it specifies the space of every passed model. If an encoder(-decoder) is passed, the space should be a dict with the keys "encoder" and "decoder" that give the detailed encoder and decoder spaces. Default None.
- size (int (Optional)) – The maximum number of models the ensemble module will use. Default 4.
- device (torch.device or str) – The device the model runs on. If set to auto, a GPU is used when available. You can also name the device directly, e.g. gpu or cuda:0. Default auto.
evaluate(dataset=None, inplaced=False, inplace=False, use_ensemble=True, use_best=True, name=None, mask='test', label=None, metric='auc')
Evaluate the given dataset.
Parameters:
- dataset (torch_geometric.data.dataset.Dataset or None) – The dataset to predict on. If None, the processed dataset passed to fit() is used instead. Default None.
- inplaced (bool) – Whether the given dataset has already been processed. Only effective when dataset is not None. If you passed the dataset to fit() with inplace=True and pass the same dataset to this method, set this argument to True; otherwise set it to False. Default False.
- inplace (bool) – Whether to process the given dataset in place. Set it to True to save memory by modifying the given dataset directly. Default False.
- use_ensemble (bool) – Whether to use the ensemble for prediction. Default True.
- use_best (bool) – Whether to use the best single model for prediction. Only effective when use_ensemble is False. Default True.
- name (str or None) – The name of the model used for prediction. Only effective when use_ensemble and use_best are both False. Default None.
- mask (str) – The data split to predict on. Default test.
- label (torch.Tensor (Optional)) – The ground-truth labels of the given dataset split. If not passed, labels are extracted from the input dataset.
- metric (str) – The metric used to evaluate the model. Default auc.
Returns: score(s) – The evaluation results according to the evaluator passed.
Return type: (list of) evaluation scores
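The default metric here is auc, which conventionally means the area under the ROC curve: the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The sketch below computes that quantity by direct pairwise comparison; roc_auc is a hypothetical standalone function, not the library's evaluator, and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """Probability that a random positive outranks a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Positives score 0.9 and 0.4, negatives 0.8 and 0.3: three of four pairs are ordered correctly.
auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```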
fit(dataset, time_limit=-1, inplace=False, train_split=None, val_split=None, evaluation_method='infer', seed=None) → autogl.solver.classifier.link_predictor.AutoLinkPredictor
Fit the current solver on the given dataset.
Parameters:
- dataset (torch_geometric.data.dataset.Dataset) – The dataset to fit on. It must contain only one graph.
- time_limit (int) – The time limit of the whole fit process, in seconds. A negative value disables the time limit. Default -1.
- inplace (bool) – Whether to process the given dataset in place. Set it to True to save memory by modifying the given dataset directly. Default False.
- train_split (float or int (Optional)) – The training ratio (if float) or the number of training samples (if int). Set it to None to use the dataset's default train/val/test split. Default None.
- val_split (float or int (Optional)) – The validation ratio (if float) or the number of validation samples (if int). Set it to None to use the dataset's default train/val/test split. Default None.
- evaluation_method ((list of) str or autogl.module.train.evaluation) – The evaluation method(s) for the current solver. If infer, they are determined automatically. Default infer.
- seed (int (Optional)) – The random seed. If None, everything runs at random. Default None.
Returns: self – A reference to the current solver.
Return type: autogl.solver.classifier.link_predictor.AutoLinkPredictor
fit_predict(dataset, time_limit=-1, inplace=False, train_split=None, val_split=None, evaluation_method='infer', use_ensemble=True, use_best=True, name=None) → numpy.ndarray
Fit the current solver on the given dataset and return the predicted values.
Parameters:
- dataset (torch_geometric.data.dataset.Dataset) – The dataset to fit on. It must contain only one graph.
- time_limit (int) – The time limit of the whole fit process, in seconds. A negative value disables the time limit. Default -1.
- inplace (bool) – Whether to process the given dataset in place. Set it to True to save memory by modifying the given dataset directly. Default False.
- train_split (float or int (Optional)) – The training ratio (if float) or the number of training samples (if int). Set it to None to use the dataset's default train/val/test split. Default None.
- val_split (float or int (Optional)) – The validation ratio (if float) or the number of validation samples (if int). Set it to None to use the dataset's default train/val/test split. Default None.
- evaluation_method ((list of) str or autogl.module.train.evaluation) – The evaluation method(s) for the current solver. If infer, they are determined automatically. Default infer.
- use_ensemble (bool) – Whether to use the ensemble for prediction. Default True.
- use_best (bool) – Whether to use the best single model for prediction. Only effective when use_ensemble is False. Default True.
- name (str or None) – The name of the model used for prediction. Only effective when use_ensemble and use_best are both False. Default None.
Returns: result – The prediction on the given dataset: an array of shape (N,), where N is the number of test nodes.
Return type: np.ndarray
classmethod from_config(path_or_dict, filetype='auto') → autogl.solver.classifier.link_predictor.AutoLinkPredictor
Load a solver from a config file.
Use this method to load a solver directly from a predefined config dict or a config file path. If you pass a path, only json and yaml files are currently supported.
Parameters:
- path_or_dict (str or dict) – The path to the config file, or the config dictionary object.
- filetype (str) – The type of the given file when a path is specified. Currently only json and yaml are supported. Set it to auto to detect the file type automatically from the file name. Default auto.
Returns: solver – The solver created from the given file or dictionary.
Return type: autogl.solver.classifier.link_predictor.AutoLinkPredictor
predict(dataset=None, inplaced=False, inplace=False, use_ensemble=True, use_best=True, name=None, mask='test', threshold=0.5) → numpy.ndarray
Predict the node class number.
Parameters:
- dataset (torch_geometric.data.dataset.Dataset or None) – The dataset to predict on. If None, the processed dataset passed to fit() is used instead. Default None.
- inplaced (bool) – Whether the given dataset has already been processed. Only effective when dataset is not None. If you passed the dataset to fit() with inplace=True and pass the same dataset to this method, set this argument to True; otherwise set it to False. Default False.
- inplace (bool) – Whether to process the given dataset in place. Set it to True to save memory by modifying the given dataset directly. Default False.
- use_ensemble (bool) – Whether to use the ensemble for prediction. Default True.
- use_best (bool) – Whether to use the best single model for prediction. Only effective when use_ensemble is False. Default True.
- name (str or None) – The name of the model used for prediction. Only effective when use_ensemble and use_best are both False. Default None.
- mask (str) – The data split to predict on. Default test.
- threshold (float) – The threshold used to judge whether an edge is positive. Default 0.5.
Returns: result – The prediction on the given dataset: an array of shape (N,), where N is the number of test nodes.
Return type: np.ndarray
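The threshold parameter turns per-edge scores into positive/negative decisions. The sketch below shows one plausible form of that step; edges_from_scores is a hypothetical helper, and treating a score strictly greater than the threshold as positive is an assumption, since the description above does not say how ties are handled.

```python
def edges_from_scores(scores, threshold=0.5):
    """Label an edge 1 (positive) when its score exceeds the threshold."""
    # Assumption: a score exactly equal to the threshold counts as negative.
    return [1 if score > threshold else 0 for score in scores]


edge_scores = [0.91, 0.5, 0.08, 0.73]
default_labels = edges_from_scores(edge_scores)
strict_labels = edges_from_scores(edge_scores, threshold=0.8)
```

Raising the threshold trades recall for precision: fewer edges are reported as positive, but those that are carry higher scores.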
predict_proba(dataset=None, inplaced=False, inplace=False, use_ensemble=True, use_best=True, name=None, mask='test') → numpy.ndarray
Predict the node probability.
Parameters:
- dataset (torch_geometric.data.dataset.Dataset or None) – The dataset to predict on. If None, the processed dataset passed to fit() is used instead. Default None.
- inplaced (bool) – Whether the given dataset has already been processed. Only effective when dataset is not None. If you passed the dataset to fit() with inplace=True and pass the same dataset to this method, set this argument to True; otherwise set it to False. Default False.
- inplace (bool) – Whether to process the given dataset in place. Set it to True to save memory by modifying the given dataset directly. Default False.
- use_ensemble (bool) – Whether to use the ensemble for prediction. Default True.
- use_best (bool) – Whether to use the best single model for prediction. Only effective when use_ensemble is False. Default True.
- name (str or None) – The name of the model used for prediction. Only effective when use_ensemble and use_best are both False. Default None.
- mask (str) – The data split to predict on. Default test.
Returns: result – The prediction on the given dataset: an array of shape (N, C), where N is the number of test nodes and C is the number of classes.
Return type: np.ndarray
class autogl.solver.LeaderBoard(fields, is_higher_better)
The leaderboard used to store and sort model performance automatically.
Parameters:
- fields (list of str) – A list of field names that describe model performance. The first field is the major field used to sort the model performances.
- is_higher_better (dict of field -> bool) – A mapping that indicates, for each field, whether a higher value is better.
get_best_model(index=0) → str
Get the best model according to its performance on the major field.
Parameters:
- index (int) – The index of the model, ordered from best to worst. Default 0.
Returns: name – The name/identifier of the requested model.
Return type: str
insert_model_performance(name, performance) → None
Add or override a record of model performance. If the given name is already in the leaderboard, its slot is overridden.
Parameters:
- name (str) – The model name/identifier that identifies the model.
- performance (dict) – The performance dict. Its keys should be the fields given at initialization, and its values the corresponding scores.
Return type: None
remove_model_performance(name) → None
Remove the record of the given model.
Parameters:
- name (str) – The name/identifier of the model to remove.
Return type: None
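The behaviour described for LeaderBoard (sort by the first field, respect is_higher_better, override on repeated names) can be modelled in a few lines. MiniLeaderBoard below is a hypothetical stand-in written for illustration, not the library class; silently ignoring the removal of an unknown name is an assumption.

```python
class MiniLeaderBoard:
    """A small stand-in for the documented LeaderBoard behaviour."""

    def __init__(self, fields, is_higher_better):
        self.fields = list(fields)
        self.is_higher_better = dict(is_higher_better)
        self.records = {}  # model name -> performance dict

    def insert_model_performance(self, name, performance):
        # Inserting an existing name overrides its previous record.
        self.records[name] = {field: performance[field] for field in self.fields}

    def remove_model_performance(self, name):
        # Assumption: removing an unknown name is a no-op.
        self.records.pop(name, None)

    def get_best_model(self, index=0):
        major = self.fields[0]  # the first field drives the ranking
        ranked = sorted(
            self.records,
            key=lambda name: self.records[name][major],
            reverse=self.is_higher_better[major],
        )
        return ranked[index]


board = MiniLeaderBoard(["acc", "loss"], {"acc": True, "loss": False})
board.insert_model_performance("gcn", {"acc": 0.81, "loss": 0.62})
board.insert_model_performance("gat", {"acc": 0.83, "loss": 0.58})
best = board.get_best_model()
runner_up = board.get_best_model(index=1)
```

Here acc is the major field and higher is better, so gat ranks first and gcn second.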