autogl.module.feature

We support feature engineering for both the PyTorch Geometric and Deep Graph Library backends.

class autogl.module.feature.EigenFeatureGenerator(size: int = 32)[source]

Concatenates Eigen features to the existing features.

Notes

An implementation of Eigen-GNN [1].

References

[1] Ziwei Zhang, Peng Cui, Jian Pei, Xin Wang, Wenwu Zhu: Eigen-GNN: A Graph Structure Preserving Plug-in for GNNs. TKDE (2021) https://arxiv.org/abs/2006.04330

Parameters: size (int) – EigenGNN hidden size
class autogl.module.feature.GraphletGenerator(override_features: bool = False)[source]

Generates local graphlet counts as features. The implementation follows [2].

References

[2] Ahmed, N. K., Willke, T. L., & Rossi, R. A. (2016). Estimation of local subgraph counts. Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016, 586–595. https://doi.org/10.1109/BigData.2016.7840651
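To make "local graphlet counts" concrete, the following is a minimal sketch that counts only the two smallest graphlets touching each node (wedges and triangles) by exact enumeration. It is not the library's implementation, which follows the estimation approach of [2] and may cover other graphlet types; the function name `local_graphlet_counts` and the adjacency-dict input are assumptions made for illustration.

```python
from itertools import combinations


def local_graphlet_counts(adj):
    """Return {node: (degree, wedges, triangles)} for an undirected graph.

    `adj` maps each node to the set of its neighbours (hypothetical input format).
    """
    counts = {}
    for node, nbrs in adj.items():
        degree = len(nbrs)
        # Triangles through `node`: neighbour pairs that are themselves adjacent.
        triangles = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in adj[u])
        # Wedges centred on `node`: neighbour pairs that are NOT adjacent.
        wedges = degree * (degree - 1) // 2 - triangles
        counts[node] = (degree, wedges, triangles)
    return counts


# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
graphlet_features = local_graphlet_counts(adj)
```

Each tuple can then be used as a per-node feature vector, either appended to or replacing the existing features.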
class autogl.module.feature.LocalDegreeProfileGenerator(override_features: bool = False)[source]

Appends the Local Degree Profile (LDP) from the “A Simple yet Effective Baseline for Non-attribute Graph Classification” paper

\[\mathbf{x}_i = \mathbf{x}_i \, \Vert \, (\deg(i), \min(DN(i)), \max(DN(i)), \textrm{mean}(DN(i)), \textrm{std}(DN(i)))\]

to the node features, where \(DN(i) = \{ \deg(j) \mid j \in \mathcal{N}(i) \}\).
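The formula above can be sketched in plain Python as follows. This is an illustration of the five LDP statistics only, not the library's code; the helper name `local_degree_profile`, the adjacency-dict input, the use of the population standard deviation, and the all-zero profile for isolated nodes are assumptions.

```python
from statistics import mean, pstdev


def local_degree_profile(adj):
    """Return {node: (deg, min DN, max DN, mean DN, std DN)}.

    `adj` maps each node to the set of its neighbours (hypothetical input format).
    """
    deg = {n: len(nbrs) for n, nbrs in adj.items()}
    profile = {}
    for n, nbrs in adj.items():
        dn = [deg[j] for j in nbrs]  # DN(i): degrees of the neighbours of i
        if dn:
            # Assumption: population standard deviation.
            profile[n] = (deg[n], min(dn), max(dn), mean(dn), pstdev(dn))
        else:
            # Assumption: isolated nodes get an all-zero profile.
            profile[n] = (0, 0, 0, 0.0, 0.0)
    return profile


# Toy graph: triangle 0-1-2 plus a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
ldp = local_degree_profile(adj)
```

Concatenating `ldp[i]` to the existing feature vector of node `i` gives the result described by the formula.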

class autogl.module.feature.OneHotDegreeGenerator(max_degree: int = 1000, in_degree: bool = False, cat: bool = True)[source]

Adds the node degree as a one-hot encoding to the node features.

Parameters:
  • max_degree (int) – Maximum degree.
  • in_degree (bool, optional) – If set to True, will compute the in-degree of nodes instead of the out-degree. (default: False)
  • cat (bool, optional) – Concat node degrees to node features instead of replacing them. (default: True)
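The effect of `max_degree` and `cat` can be illustrated with a small pure-Python sketch. It is not the library's code: the helper `one_hot_degree` is hypothetical, degrees are passed in precomputed, and the encoding width of `max_degree + 1` (one slot per degree from 0 to `max_degree`) is an assumption.

```python
def one_hot_degree(features, degrees, max_degree, cat=True):
    """One-hot encode node degrees and concatenate to or replace the features."""
    rows = []
    for x, d in zip(features, degrees):
        if d > max_degree:
            raise ValueError(f"degree {d} exceeds max_degree {max_degree}")
        one_hot = [0.0] * (max_degree + 1)  # assumed width: degrees 0..max_degree
        one_hot[d] = 1.0
        # cat=True appends the encoding; cat=False replaces the features with it.
        rows.append(list(x) + one_hot if cat else one_hot)
    return rows


x = [[0.5], [0.1]]   # two nodes with one existing feature each
node_degrees = [2, 0]
concatenated = one_hot_degree(x, node_degrees, max_degree=3, cat=True)
replaced = one_hot_degree(x, node_degrees, max_degree=3, cat=False)
```

A large default such as `max_degree = 1000` makes the encoding wide, so a value close to the real maximum degree of the dataset keeps the feature matrix compact.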
class autogl.module.feature.NetLSD(*args, **kwargs)[source]

Notes

A graph-level feature generation method. This is a simple wrapper around NetLSD [3].

References

[3] A. Tsitsulin, D. Mottin, P. Karras, A. Bronstein, and E. Müller, “NetLSD: Hearing the shape of a graph,” Proc. ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., pp. 2347–2356, 2018.
class autogl.module.feature.FilterConstant[source]

Drops constant features, i.e. feature columns that take the same value for every node.
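A minimal sketch of what dropping constant features means, assuming features are stored as a list of per-node rows. The helper name `drop_constant_columns` is hypothetical and the code is not taken from the library.

```python
def drop_constant_columns(rows):
    """Remove columns whose value is identical across all rows.

    Returns the filtered rows and the indices of the columns that were kept.
    """
    if not rows:
        return [], []
    n_cols = len(rows[0])
    # A column is kept if at least one row differs from the first row.
    keep = [c for c in range(n_cols) if any(r[c] != rows[0][c] for r in rows)]
    return [[r[c] for c in keep] for r in rows], keep


X = [[1, 0, 5],
     [1, 2, 5],
     [1, 3, 5]]
X_filtered, kept_columns = drop_constant_columns(X)
```

Here columns 0 and 2 carry no information for distinguishing nodes, so only column 1 survives.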

class autogl.module.feature.GBDTFeatureSelector(fixlen: int = 10, *args, **kwargs)[source]

A simple wrapper around LightGBM that uses feature importance ranking to select the top-K features.

Parameters: fixlen (int) – K, the number of most important features to keep.
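The selection step itself can be sketched independently of LightGBM. In the snippet below the importance scores are made-up numbers standing in for what a trained GBDT model would report, and `select_top_k` is a hypothetical helper, not part of the library.

```python
def select_top_k(importances, k):
    """Return the indices of the k most important features, in column order."""
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])


# Illustrative importance scores for five features (not real model output).
importances = [0.05, 0.40, 0.10, 0.30, 0.15]
selected = select_top_k(importances, k=2)
```

With `fixlen = 10`, the same idea keeps the ten highest-ranked feature columns.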
class autogl.module.feature.AutoFeatureEngineer(fix_length: int = 200, max_epoch: int = 5, time_budget: Optional[int] = None, feature_selector=<class 'autogl.module.feature._selectors._gbdt.GBDTFeatureSelector'>, verbosity: int = 0, *args, **kwargs)[source]

Notes

An implementation of the automatic feature engineering method DeepGL [4], which iteratively generates features by aggregating neighbour features and then selects a fixed number of them, so that important graph-aware features are added automatically.

References

[4] Rossi, R. A., Zhou, R., & Ahmed, N. K. (2020). Deep Inductive Graph Representation Learning. IEEE Transactions on Knowledge and Data Engineering, 32(3), 438–452. https://doi.org/10.1109/TKDE.2018.2878247

Parameters:
  • fix_length (int) – fixed number of features kept in every epoch. The final number of features added is fix_length times max_epoch, i.e. 200 × 5 = 1000 by default.
  • max_epoch (int) – total number of epochs in the process.
  • time_budget (int) – time budget in seconds for the feature engineering process; None means no time budget. Note that this is a soft budget: remaining time is judged from a rough estimate based on previous iterations, so the process may end up exceeding the specified budget.
  • feature_selector (Callable) – feature selector used for selection at each iteration; LightGBM-based by default. Note that the original paper uses connected components of the feature graph instead, which you may implement yourself if you want.
  • verbosity (int) – hides all information except error and fatal messages if verbosity < 1.
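The generate-then-select loop described in the Notes can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: the function names are hypothetical, the aggregators are limited to sum, max and mean over neighbours, and features are ranked by variance as a stand-in for the LightGBM-based selector.

```python
from statistics import pvariance


def aggregate(adj, column, reducer):
    """Apply `reducer` to each node's neighbour values of one feature column."""
    return [reducer([column[j] for j in adj[i]]) if adj[i] else 0.0
            for i in range(len(column))]


def auto_feature_sketch(adj, base_columns, fix_length, max_epoch):
    """Iteratively aggregate neighbour features, keeping fix_length per epoch."""
    reducers = [sum, max, lambda v: sum(v) / len(v)]  # assumed aggregators
    frontier = list(base_columns)  # columns produced by the previous epoch
    added = []
    for _ in range(max_epoch):
        candidates = [aggregate(adj, col, r) for col in frontier for r in reducers]
        # Stand-in selector (assumption): keep the highest-variance candidates.
        candidates.sort(key=pvariance, reverse=True)
        frontier = candidates[:fix_length]
        added.extend(frontier)
    return added


# Toy graph as neighbour lists: triangle 0-1-2 plus pendant node 3 on node 2.
adj = [[1, 2], [0, 2], [0, 1, 3], [2]]
base = [[1.0, 2.0, 3.0, 4.0]]  # one base feature column, one value per node
new_columns = auto_feature_sketch(adj, base, fix_length=2, max_epoch=2)
```

In this toy run the number of added columns is `fix_length * max_epoch = 4`, mirroring the 200 × 5 arithmetic of the defaults.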