Neural Architecture Search

We support different neural architecture search algorithms over various search spaces. Neural architecture search (NAS) usually consists of three modules: search space, search strategy, and estimation strategy.

The search space describes all possible architectures to be searched. The space is mainly formulated by two parts: the operations (e.g., GCNConv, GATConv) and the input-output relations between them. A larger space may contain a better optimal architecture but demands more effort to explore. Human knowledge can help design a reasonable search space and reduce the effort required by the search strategy.

The search strategy controls how to explore the search space. It encompasses the classical exploration-exploitation trade-off: on the one hand, it is desirable to find well-performing architectures quickly, while on the other hand, premature convergence to a region of suboptimal architectures should be avoided.

The estimation strategy gives the performance of an architecture when it is explored. The simplest option is to perform standard training and validation of the architecture on the data. Since many architectures need to be estimated during the whole search process, the estimation strategy should be very efficient to save computational resources.

To be more flexible, we modularize the NAS process into three parts: space, algorithm, and estimator, corresponding to the search space, search strategy, and estimation strategy above. Different implementations of each part can be composed under certain constraints. If you want to design your own NAS process, you can change any of these parts according to your demand.
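
For example, swapping one part while keeping the others fixed usually only requires constructing a different object. The sketch below illustrates this idea; it reuses the dataset, input_dim, and num_classes prepared in the Usage example that follows, and assumes RandomSearch is importable from autogllight.nas.algorithm with the n_sample argument shown in the Search Strategy section below.

from autogllight.nas.space import GraphNasNodeClassificationSpace
from autogllight.nas.algorithm import GraphNasRL, RandomSearch
from autogllight.nas.estimator import OneShotEstimator

# The space and the estimator stay fixed ...
space = GraphNasNodeClassificationSpace(input_dim=input_dim, output_dim=num_classes)
space.instantiate()
estimator = OneShotEstimator()

# ... while the search algorithm can be swapped independently.
algo = GraphNasRL(num_epochs=2, ctrl_steps_aggregate=2, weight_share=False)
# algo = RandomSearch(n_sample=10)  # an alternative sample-based strategy (assumption, see below)
algo.search(space, dataset, estimator)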

Usage

You can directly import a specific space, algorithm, and estimator to search GNNs for a specific dataset. The following shows an example:

from autogllight.nas.space import GraphNasNodeClassificationSpace
from autogllight.nas.algorithm import GraphNasRL
from autogllight.nas.estimator import OneShotEstimator
from torch_geometric.datasets import Planetoid
from os import path as osp
import numpy as np
import torch_geometric.transforms as T

# Use graphnas to search gnns for cora
dataname = "cora"
dataset = Planetoid(
    osp.expanduser("~/.cache-autogl"), dataname, transform=T.NormalizeFeatures()
)
data = dataset[0]
label = data.y
input_dim = data.x.shape[-1]
num_classes = len(np.unique(label.numpy()))

space = GraphNasNodeClassificationSpace(input_dim=input_dim, output_dim=num_classes)
space.instantiate()
algo = GraphNasRL(num_epochs=2, ctrl_steps_aggregate=2, weight_share=False)
estimator = OneShotEstimator()
algo.search(space, dataset, estimator)

The code above searches for the best architecture in the GraphNasNodeClassificationSpace search space using the GraphNasRL search algorithm, evaluating candidate architectures with OneShotEstimator.

Search Space

The space definition is based on the mutable fashion used in NNI, where a space is defined as a model inheriting BaseSpace. There are mainly two ways to define your search space: one can be performed in a one-shot fashion, while the other cannot. Currently, we support the following search spaces: SinglePathNodeClassificationSpace, GassoSpace, GraphNasNodeClassificationSpace, GraphNasMacroNodeClassificationSpace, and AutoAttendNodeClassificationSpace.

You can also define your own NAS search space. To do so, override the function build_graph to construct the supernet. Here is an example.

import typing as _typ

import torch.nn.functional as F

from autogllight.nas.space.base import BaseSpace

# gnn_map and BK below are helper utilities from autogllight: gnn_map maps an
# operation name to a GNN layer, and BK provides backend access to node
# features and graph convolution.

# For example, create a NAS search space by yourself
class SinglePathNodeClassificationSpace(BaseSpace):
    def __init__(
        self,
        hidden_dim: _typ.Optional[int] = 64,
        layer_number: _typ.Optional[int] = 2,
        dropout: _typ.Optional[float] = 0.2,
        input_dim: _typ.Optional[int] = None,
        output_dim: _typ.Optional[int] = None,
        ops: _typ.Sequence = ("gcn", "gat_8"),
    ):
        super().__init__()
        self.layer_number = layer_number
        self.hidden_dim = hidden_dim
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.ops = ops
        self.dropout = dropout

    def build_graph(self):
        # Build the supernet: each layer is a choice among the candidate operations
        for layer in range(self.layer_number):
            key = f"op_{layer}"
            in_dim = self.input_dim if layer == 0 else self.hidden_dim
            out_dim = (
                self.output_dim if layer == self.layer_number - 1 else self.hidden_dim
            )
            op_candidates = [
                op(in_dim, out_dim)
                if isinstance(op, type)
                else gnn_map(op, in_dim, out_dim)
                for op in self.ops
            ]
            self.setLayerChoice(layer, op_candidates, key=key)

    def forward(self, data):
        x = BK.feat(data)
        for layer in range(self.layer_number):
            op = getattr(self, f"op_{layer}")
            x = BK.gconv(op, data, x)
            if layer != self.layer_number - 1:
                x = F.leaky_relu(x)
                x = F.dropout(x, p=self.dropout, training=self.training)
        return F.log_softmax(x, dim=1)
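
Once defined, the custom space can replace GraphNasNodeClassificationSpace in the Usage example above. The following is a minimal sketch, reusing input_dim, num_classes, dataset, and the algorithm and estimator objects from that example:

# Plug the custom space into the same search pipeline (sketch)
space = SinglePathNodeClassificationSpace(input_dim=input_dim, output_dim=num_classes)
space.instantiate()  # instantiate the space before searching, as in the Usage example
algo.search(space, dataset, estimator)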

Performance Estimator

The performance estimator estimates the performance of an architecture. Currently, we support the following estimators:

Estimator   Description
oneshot     Directly evaluate the given models without training
scratch     Train the models from scratch and then evaluate them

You can also write your own estimator. Here is an example of estimating an architecture without training (used in one-shot spaces).

# For example, create a NAS estimator by yourself
import torch.nn.functional as F

from autogllight.nas.estimator.base import BaseEstimator
from autogllight.nas.space.base import BaseSpace

class YourOneShotEstimator(BaseEstimator):
    # The only thing you need to do is define the ``infer`` function
    def infer(self, model: BaseSpace, dataset, mask="train"):
        device = next(model.parameters()).device
        dset = dataset[0].to(device)
        mask_tensor = getattr(dset, f"{mask}_mask")
        # Forward the architecture and select the masked nodes
        pred = model(dset)[mask_tensor]
        y = dset.y[mask_tensor]
        # Use the default loss function and metrics to evaluate the architecture
        loss = getattr(F, self.loss_f)(pred, y)
        probs = F.softmax(pred, dim=1)
        metrics = [eva.evaluate(probs, y) for eva in self.evaluation]
        return metrics, loss
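
The custom estimator can then be passed to a search algorithm in place of the built-in one. A minimal sketch, reusing the space, dataset, and algorithm from the Usage example above:

# Use the custom estimator in the same pipeline (sketch)
estimator = YourOneShotEstimator()
algo = GraphNasRL(num_epochs=2, ctrl_steps_aggregate=2, weight_share=False)
algo.search(space, dataset, estimator)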

Search Strategy

The search strategy defines how to find an architecture. We currently support the following search strategies: RandomSearch, Darts, RL, GraphNasRL, Enas, Spos, GRNA, and Gasso.

Sample-based strategies without weight sharing are simpler than strategies with weight sharing. We show how to define your own strategy here, using a DFS-based enumeration as an example.

import numpy as np

from autogllight.nas.algorithm.base import BaseNAS
from autogllight.nas.space.base import BaseSpace

# get_module_order, sort_replaced_module, replace_layer_choice,
# replace_input_choice, PathSamplingLayerChoice and PathSamplingInputChoice
# used below are NAS helper utilities following NNI's mutable interface;
# import them from autogllight's NAS utilities.

class RandomSearch(BaseNAS):
    # Get the number of samples at initialization
    def __init__(self, n_sample):
        super().__init__()
        self.n_sample = n_sample

    # The key process of a NAS algorithm: search for an architecture given
    # the space, the dataset and the estimator
    def search(self, space: BaseSpace, dset, estimator):
        self.estimator = estimator
        self.dataset = dset
        self.space = space

        self.nas_modules = []
        k2o = get_module_order(self.space)
        # Collect all mutables in the space
        replace_layer_choice(self.space, PathSamplingLayerChoice, self.nas_modules)
        replace_input_choice(self.space, PathSamplingInputChoice, self.nas_modules)
        # Sort all mutables with the given order
        self.nas_modules = sort_replaced_module(k2o, self.nas_modules)
        # Get a dict containing the number of choices for each mutable
        selection_range = {}
        for k, v in self.nas_modules:
            selection_range[k] = len(v)
        self.selection_dict = selection_range

        arch_perfs = []
        # Define the DFS process that enumerates all possible selections
        self.selection = {}
        last_k = list(self.selection_dict.keys())[-1]

        def dfs():
            for k, v in self.selection_dict.items():
                if k not in self.selection:
                    for i in range(v):
                        self.selection[k] = i
                        if k == last_k:
                            # Evaluate an architecture
                            self.arch = space.parse_model(self.selection, self.device)
                            metric, loss = self._infer(mask="val")
                            arch_perfs.append([metric, self.selection.copy()])
                        else:
                            dfs()
                    del self.selection[k]
                    break

        dfs()

        # Return the architecture with the best performance
        selection = arch_perfs[np.argmax([x[0] for x in arch_perfs])][1]
        arch = space.parse_model(selection, self.device)
        return arch

In usage, different search strategies should be combined with appropriate search spaces and estimators. Most search spaces, search strategies, and estimators are compatible with each other.
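
As a closing sketch, the custom space, estimator, and strategy defined on this page can be combined into one search run. This assumes the dataset, input_dim, and num_classes prepared in the Usage example; it is one possible combination, not the only valid one.

# Combine the custom space, estimator, and strategy (sketch)
space = SinglePathNodeClassificationSpace(input_dim=input_dim, output_dim=num_classes)
space.instantiate()
estimator = YourOneShotEstimator()
algo = RandomSearch(n_sample=10)
best_arch = algo.search(space, dataset, estimator)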
