Neural Architecture Search
We support different neural architecture search algorithm in variant search space. Neural architecture search is usually constructed by three modules: search space, search strategy and estimation strategy.
The search space describes all possible architectures to be searched. There are mainly two parts of the space formulated, the operations(e.g. GCNconv, GATconv) and the input-ouput relations. A large space may have better optimal architecture but demands more effect to explore. Human knowledge can help to design a reasonable search space to reduce the efforts of search strategy.
The search strategy controls how to explore the search space. It encompasses the classical exploration-exploitation trade-off since. On the one hand, it is desirable to find well-performing architectures quickly, while on the other hand, premature convergence to a region of suboptimal architectures should be avoided.
The estimation strategy gives the performance of certain architectures when it is explored. The simplest option is to perform a standard training and validation of the architecture on data. Since there are lots of architectures need estimating in the whole searching process, estimation strategy is desired to be very efficient to save computational resources.
To be more flexible, we modulize NAS process with three part: algorithm, space and estimator, corresponding to the three module search space, search strategy and estimation strategy. Different models in different parts can be composed in some certain constrains. If you want to design your own NAS process, you can change any of those parts according to your demand.
Usage
You can directly enable architecture search for node classification tasks by passing the algorithms, spaces and estimators to solver. Following shows an example:
# Use graphnas to solve cora
from autogl.datasets import build_dataset_from_name
from autogl.solver import AutoNodeClassifier
solver = AutoNodeClassifier(
feature = 'PYGNormalizeFeatures',
graph_models = (),
hpo = 'tpe',
ensemble = None,
nas_algorithms=['rl'],
nas_spaces='graphnasmacro',
nas_estimators=['scratch']
)
cora = build_dataset_from_name('cora')
solver.fit(cora)
The code above will first find the best architecture in space graphnasmacro
using rl
search algorithm.
Then the searched architecture will be further optimized through hyperparameter-optimization tpe
.
Note
The graph_models
argument is not conflict with nas module. You can set graph_models
to
other hand-crafted models beside the ones found by nas. Once the architectures are derived from nas module,
they act in the same way as hand-crafted models directly passed through graph_models.
Search Space
The space definition is base on mutable fashion used in NNI, which is defined as a model inheriting BaseSpace There are mainly two ways to define your search space, one can be performed with one-shot fashion while the other cannot. Currently, we support following search space:
Space | Description |
---|---|
singlepath [4] |
Architectures with several sequential layers with each layer choosing only one path |
graphnas [1] |
The graph nas micro search space designed for fully supervised node classification models |
graphnasmacro [1] |
The graph nas macro search space designed for semi-superwised node classification models |
You can also define your own nas search space.
If you need one-shot fashion, you should use the function setLayerChoice
and setInputChoice
to construct the super network.
Here is an example.
# For example, create an NAS search space by yourself
from autogl.module.nas.space.base import BaseSpace
from autogl.module.nas.space.operation import gnn_map
class YourOneShotSpace(BaseSpace):
# Get essential parameters at initialization
def __init__(self, input_dim = None, output_dim = None):
super().__init__()
# must contain input_dim and output_dim in space, or you can initialize these two parameters in function `instantiate`
self.input_dim = input_dim
self.output_dim = output_dim
# Instantiate the super network
def instantiate(self, input_dim = None, output_dim = None):
# must call super in this function
super().instantiate()
self.input_dim = input_dim or self.input_dim
self.output_dim = output_dim or self.output_dim
# define two layers with order 0 and 1
setattr(self, 'layer0', self.setLayerChoice(0, [gnn_map(op,self.input_dim,self.output_dim)for op in ['gcn', 'gat']], key = 'layer0')
setattr(self, 'layer1', self.setLayerChoice(1, [gnn_map(op,self.input_dim,self.output_dim)for op in ['gcn', 'gat']], key = 'layer1')
# define an input choice to choose from the result of the two layer
setattr(self, 'input_layer', self.setInputChoice(2, choose_from = ['layer0', 'layer1'], n_chosen = 1, returen_mask = False, key = 'input_layer'))
self._initialized = True
# Define the forward process
def forward(self, data):
x, edges = data.x, data.edge_index
x_0 = self.layer0(x, edges)
x_1 = self.layer1(x, edges)
y = self.input_layer([x_0, x_1])
y = F.log_fostmax(y, dim = 1)
return y
# For one-shot fashion, you can directly use following scheme in ``parse_model``
def parse_model(self, selection, device) -> BaseModel:
return self.wrap().fix(selection)
Also, you can use the way which does not support one shot fashion. In this way, you can directly copy you model with few changes. But you can only use sample-based search strategy.
# For example, create an NAS search space by yourself
from autogl.module.nas.space.base import BaseSpace, map_nn
from autogl.module.nas.space.operation import gnn_map
# here we search from three types of graph convolution with `head` as a parameter
# we should search `heads` at the same time with the convolution
from torch_geometric.nn import GATConv, FeaStConv, TransformerConv
class YourNonOneShotSpace(BaseSpace):
# Get essential parameters at initialization
def __init__(self, input_dim = None, output_dim = None):
super().__init__()
# must contain input_dim and output_dim in space, or you can initialize these two parameters in function `instantiate`
self.input_dim = input_dim
self.output_dim = output_dim
# Instantiate the super network
def instantiate(self, input_dim, output_dim):
# must call super in this function
super().instantiate()
self.input_dim = input_dim or self.input_dim
self.output_dim = output_dim or self.output_dim
# set your choices as LayerChoices
self.choice0 = self.setLayerChoice(0, map_nn(["gat", "feast", "transformer"]), key="conv")
self.choice1 = self.setLayerChoice(1, map_nn([1, 2, 4, 8]), key="head")
# You do not need to define forward process here
# For non-one-shot fashion, you can directly return your model based on the choices
# ``YourModel`` must inherit BaseSpace.
def parse_model(self, selection, device) -> BaseModel:
model = YourModel(selection, self.input_dim, self.output_dim).wrap()
return model
# YourModel can be defined as follows
class YourModel(BaseSpace):
def __init__(self, selection, input_dim, output_dim):
self.input_dim = input_dim
self.output_dim = output_dim
if selection["conv"] == "gat":
conv = GATConv
elif selection["conv"] == "feast":
conv = FeaStConv
elif selection["conv"] == "transformer":
conv = TransformerConv
self.layer = conv(input_dim, output_dim, selection["head"])
def forward(self, data):
x, edges = data.x, data.edge_index
y = self.layer(x, edges)
return y
Performance Estimator
The performance estimator estimates the performance of an architecture. Currently we support following estimators:
Estimator | Description |
---|---|
oneshot |
Directly evaluating the given models without training |
scratch |
Train the models from scratch and then evaluate them |
You can also write your own estimator. Here is an example of estimating an architecture without training (used in one-shot space).
# For example, create an NAS estimator by yourself
from autogl.module.nas.estimator.base import BaseEstimator
class YourOneShotEstimator(BaseEstimator):
# The only thing you should do is defining ``infer`` function
def infer(self, model: BaseSpace, dataset, mask="train"):
device = next(model.parameters()).device
dset = dataset[0].to(device)
# Forward the architecture
pred = model(dset)[getattr(dset, f"{mask}_mask")]
y = dset.y[getattr(dset, f'{mask}_mask')]
# Use default loss function and metrics to evaluate the architecture
loss = getattr(F, self.loss_f)(pred, y)
probs = F.softmax(pred, dim = 1)
metrics = [eva.evaluate(probs, y) for eva in self.evaluation]
return metrics, loss
Search Strategy
The space strategy defines how to find an architecture. We currently support following search strategies:
Strategy | Description |
---|---|
random |
Random search by uniform sampling |
rl [1] |
Use rl as architecture generator agent |
enas [2] |
efficient neural architecture search |
darts [3] |
differentiable neural architecture search |
Sample-based strategy without weight sharing is simpler than strategies with weight sharing. We show how to define your strategy here with DFS as an example. If you want to define more complex strategy, you can refer to Darts, Enas or other strategies in NNI.
from autogl.module.nas.algorithm.base import BaseNAS
class RandomSearch(BaseNAS):
# Get the number of samples at initialization
def __init__(self, n_sample):
super().__init__()
self.n_sample = n_sample
# The key process in NAS algorithm, search for an architecture given space, dataset and estimator
def search(self, space: BaseSpace, dset, estimator):
self.estimator=estimator
self.dataset=dset
self.space=space
self.nas_modules = []
k2o = get_module_order(self.space)
# collect all mutables in the space
replace_layer_choice(self.space, PathSamplingLayerChoice, self.nas_modules)
replace_input_choice(self.space, PathSamplingInputChoice, self.nas_modules)
# sort all mutables with given orders
self.nas_modules = sort_replaced_module(k2o, self.nas_modules)
# get a dict cantaining all chioces
selection_range={}
for k,v in self.nas_modules:
selection_range[k]=len(v)
self.selection_dict=selection_range
arch_perfs=[]
# define DFS process
self.selection = {}
last_k = list(self.selection_dict.keys())[-1]
def dfs():
for k,v in self.selection_dict.items():
if not k in self.selection:
for i in range(v):
self.selection[k] = i
if k == last_k:
# evaluate an architecture
self.arch=space.parse_model(self.selection,self.device)
metric,loss=self._infer(mask='val')
arch_perfs.append([metric, self.selection.copy()])
else:
dfs()
del self.selection[k]
break
dfs()
# get the architecture with the best performance
selection=arch_perfs[np.argmax([x[0] for x in arch_perfs])][1]
arch=space.parse_model(selection,self.device)
return arch
Different search strategies should be combined with different search spaces and estimators in usage.
Space | single path | GraphNAS[1] | GraphNAS-macro[1] |
---|---|---|---|
Random | ✓ | ✓ | ✓ |
RL | ✓ | ✓ | ✓ |
GraphNAS [1] | ✓ | ✓ | ✓ |
ENAS [2] | ✓ | ||
DARTS [3] | ✓ |
Estimator | one-shot | Train |
---|---|---|
Random | ✓ | |
RL | ✓ | |
GraphNAS [1] | ✓ | |
ENAS [2] | ✓ | |
DARTS [3] | ✓ |
[1] | (1, 2, 3, 4, 5) Gao, Yang, et al. “Graph neural architecture search.” IJCAI. Vol. 20. 2020. |
[2] | (1, 2, 3) Pham, Hieu, et al. “Efficient neural architecture search via parameters sharing.” International Conference on Machine Learning. PMLR, 2018. |
[3] | (1, 2, 3) Liu, Hanxiao, Karen Simonyan, and Yiming Yang. “DARTS: Differentiable Architecture Search.” International Conference on Learning Representations. 2018. |
[4] | Guo, Zichao, et al. “Single Path One-Shot Neural Architecture Search with Uniform Sampling.” European Conference on Computer Vision, 2019, pp. 544–560. |