.. _nas:

Neural Architecture Search
============================

We support different neural architecture search algorithms over various search spaces.
Neural architecture search (NAS) is usually built from three modules: the search space, the search strategy, and the estimation strategy.

The search space describes all possible architectures to be searched. The space is mainly formulated by two parts: the operations (e.g., GCNConv, GATConv) and the input-output relations between them. A larger space may contain a better architecture, but it demands more effort to explore. Human knowledge can help design a reasonable search space and reduce the effort required by the search strategy.

The search strategy controls how to explore the search space. It encompasses the classical exploration-exploitation trade-off: on the one hand, it is desirable to find well-performing architectures quickly, while on the other hand, premature convergence to a region of suboptimal architectures should be avoided.

The estimation strategy reports the performance of an architecture when it is explored. The simplest option is to perform standard training and validation of the architecture on the data. Since many architectures need to be estimated during the whole search process, the estimation strategy should be efficient in order to save computational resources.

To be more flexible, we modularize the NAS process into three parts: algorithm, space, and estimator, corresponding to the search strategy, search space, and estimation strategy, respectively. Different models in different parts can be composed under certain constraints. If you want to design your own NAS process, you can change any of these parts according to your demand.

Usage
-----

You can directly import specific spaces, algorithms, and estimators to search GNNs for specific datasets. The following shows an example:

.. code-block:: python

    from autogllight.nas.space import GraphNasNodeClassificationSpace
    from autogllight.nas.algorithm import GraphNasRL
    from autogllight.nas.estimator import OneShotEstimator

    from torch_geometric.datasets import Planetoid
    from os import path as osp
    import torch_geometric.transforms as T
    import numpy as np

    # Use graphnas to search gnns for cora
    dataname = "cora"
    dataset = Planetoid(
        osp.expanduser("~/.cache-autogl"), dataname, transform=T.NormalizeFeatures()
    )
    data = dataset[0]
    label = data.y
    input_dim = data.x.shape[-1]
    num_classes = len(np.unique(label.numpy()))

    space = GraphNasNodeClassificationSpace(input_dim=input_dim, output_dim=num_classes)
    space.instantiate()
    algo = GraphNasRL(num_epochs=2, ctrl_steps_aggregate=2, weight_share=False)
    estimator = OneShotEstimator()
    algo.search(space, dataset, estimator)

The code above searches for the best architecture in the space ``GraphNasNodeClassificationSpace`` using the ``GraphNasRL`` search algorithm.
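
Whether ``search`` returns the discovered architecture depends on the concrete algorithm; the custom strategy shown later in this document does return it as a regular PyTorch module. Under that assumption, the following is a minimal, illustrative sketch (not documented library behavior) of evaluating the searched model on Cora's test split; ``best_arch`` and the accuracy computation are illustrative only:

.. code-block:: python

    import torch

    # Assumption: ``search`` returns the searched architecture as a torch.nn.Module
    # that takes a PyG data object and outputs class log-probabilities, as the
    # custom space and strategy examples later in this document do.
    best_arch = algo.search(space, dataset, estimator)

    best_arch.eval()
    with torch.no_grad():
        pred = best_arch(data).argmax(dim=-1)
        correct = pred[data.test_mask] == data.y[data.test_mask]
        print(f"test accuracy: {correct.float().mean().item():.4f}")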

Search Space
------------

The space definition follows the mutable fashion used in NNI: a search space is defined as a model inheriting ``BaseSpace``. There are mainly two ways to define a search space; one can be used in a one-shot fashion while the other cannot. Currently, we support the following search spaces: ``SinglePathNodeClassificationSpace``, ``GassoSpace``, ``GraphNasNodeClassificationSpace``, ``GraphNasMacroNodeClassificationSpace``, and ``AutoAttendNodeClassificationSpace``.

You can also define your own NAS search space. You should overwrite the function ``build_graph`` to construct the super network. Here is an example.

.. code-block:: python

    import typing as _typ

    import torch.nn.functional as F

    from autogllight.nas.space.base import BaseSpace
    # ``gnn_map`` (which builds a GNN operation from its name) and ``BK`` (backend
    # helpers for feature access and graph convolution) are utilities provided by
    # autogllight.

    # For example, create a NAS search space by yourself
    class SinglePathNodeClassificationSpace(BaseSpace):
        def __init__(
            self,
            hidden_dim: _typ.Optional[int] = 64,
            layer_number: _typ.Optional[int] = 2,
            dropout: _typ.Optional[float] = 0.2,
            input_dim: _typ.Optional[int] = None,
            output_dim: _typ.Optional[int] = None,
            ops: _typ.Tuple = ["gcn", "gat_8"],
        ):
            super().__init__()
            self.layer_number = layer_number
            self.hidden_dim = hidden_dim
            self.input_dim = input_dim
            self.output_dim = output_dim
            self.ops = ops
            self.dropout = dropout

        def build_graph(self):
            # Build the super network: each layer is a choice among the candidate operations.
            for layer in range(self.layer_number):
                key = f"op_{layer}"
                in_dim = self.input_dim if layer == 0 else self.hidden_dim
                out_dim = (
                    self.output_dim if layer == self.layer_number - 1 else self.hidden_dim
                )
                op_candidates = [
                    op(in_dim, out_dim)
                    if isinstance(op, type)
                    else gnn_map(op, in_dim, out_dim)
                    for op in self.ops
                ]
                # Register the candidates of this layer as a mutable layer choice.
                self.setLayerChoice(layer, op_candidates, key=key)

        def forward(self, data):
            x = BK.feat(data)
            for layer in range(self.layer_number):
                op = getattr(self, f"op_{layer}")
                x = BK.gconv(op, data, x)
                if layer != self.layer_number - 1:
                    x = F.leaky_relu(x)
                    x = F.dropout(x, p=self.dropout, training=self.training)
            return F.log_softmax(x, dim=1)

Performance Estimator
---------------------

The performance estimator estimates the performance of an architecture. Currently, we support the following estimators:

+-------------------------+-------------------------------------------------------+
| Estimator               | Description                                           |
+=========================+=======================================================+
| ``oneshot``             | Directly evaluate the given models without training   |
+-------------------------+-------------------------------------------------------+
| ``scratch``             | Train the models from scratch and then evaluate them  |
+-------------------------+-------------------------------------------------------+

You can also write your own estimator. Here is an example of estimating an architecture without training (used in one-shot spaces).

.. code-block:: python

    import torch.nn.functional as F

    from autogllight.nas.estimator.base import BaseEstimator

    # For example, create a NAS estimator by yourself
    class YourOneShotEstimator(BaseEstimator):
        # The only thing you need to do is define the ``infer`` function
        def infer(self, model: BaseSpace, dataset, mask="train"):
            device = next(model.parameters()).device
            dset = dataset[0].to(device)
            # Forward the architecture
            pred = model(dset)[getattr(dset, f"{mask}_mask")]
            y = dset.y[getattr(dset, f"{mask}_mask")]
            # Use the default loss function and metrics to evaluate the architecture
            loss = getattr(F, self.loss_f)(pred, y)
            probs = F.softmax(pred, dim=1)
            metrics = [eva.evaluate(probs, y) for eva in self.evaluation]
            return metrics, loss
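
For comparison, a train-then-evaluate estimator in the spirit of ``scratch`` can reuse the same ``infer`` interface. The following is only a rough sketch that assumes ``BaseEstimator`` provides ``self.loss_f`` and ``self.evaluation`` as in the example above; the class name ``YourTrainScratchEstimator``, the optimizer settings, and the number of training steps are illustrative, not part of the library.

.. code-block:: python

    import torch
    import torch.nn.functional as F

    from autogllight.nas.estimator.base import BaseEstimator

    # A hedged sketch of a train-then-evaluate estimator (in the spirit of ``scratch``).
    class YourTrainScratchEstimator(BaseEstimator):
        def infer(self, model, dataset, mask="train"):
            device = next(model.parameters()).device
            dset = dataset[0].to(device)

            # Briefly train the sampled architecture before measuring its performance.
            # The step count and learning rate are illustrative assumptions.
            optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
            model.train()
            for _ in range(50):
                optimizer.zero_grad()
                out = model(dset)[dset.train_mask]
                loss = getattr(F, self.loss_f)(out, dset.y[dset.train_mask])
                loss.backward()
                optimizer.step()

            # Evaluate on the requested split, mirroring YourOneShotEstimator above.
            model.eval()
            with torch.no_grad():
                pred = model(dset)[getattr(dset, f"{mask}_mask")]
                y = dset.y[getattr(dset, f"{mask}_mask")]
                loss = getattr(F, self.loss_f)(pred, y)
                probs = F.softmax(pred, dim=1)
                metrics = [eva.evaluate(probs, y) for eva in self.evaluation]
            return metrics, loss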

Search Strategy
---------------

The search strategy defines how to find an architecture. We currently support the following search strategies: ``RandomSearch``, ``Darts``, ``RL``, ``GraphNasRL``, ``Enas``, ``Spos``, ``GRNA``, and ``Gasso``.

Sample-based strategies without weight sharing are simpler than strategies with weight sharing. We show how to define your own strategy here, taking a depth-first enumeration (DFS) of the search space as an example.

.. code-block:: python

    import numpy as np

    from autogllight.nas.algorithm.base import BaseNAS
    # get_module_order, sort_replaced_module, replace_layer_choice,
    # replace_input_choice, PathSamplingLayerChoice and PathSamplingInputChoice
    # are utilities provided by autogllight for handling mutables.

    class RandomSearch(BaseNAS):
        # Get the number of samples at initialization
        def __init__(self, n_sample):
            super().__init__()
            self.n_sample = n_sample

        # The key step of a NAS algorithm: search for an architecture
        # given the space, the dataset and the estimator.
        def search(self, space: BaseSpace, dset, estimator):
            self.estimator = estimator
            self.dataset = dset
            self.space = space

            self.nas_modules = []
            k2o = get_module_order(self.space)
            # collect all mutables in the space
            replace_layer_choice(self.space, PathSamplingLayerChoice, self.nas_modules)
            replace_input_choice(self.space, PathSamplingInputChoice, self.nas_modules)
            # sort all mutables with the given order
            self.nas_modules = sort_replaced_module(k2o, self.nas_modules)
            # get a dict containing the number of choices of each mutable
            selection_range = {}
            for k, v in self.nas_modules:
                selection_range[k] = len(v)
            self.selection_dict = selection_range

            arch_perfs = []
            # define the DFS process
            self.selection = {}
            last_k = list(self.selection_dict.keys())[-1]

            def dfs():
                for k, v in self.selection_dict.items():
                    if k not in self.selection:
                        for i in range(v):
                            self.selection[k] = i
                            if k == last_k:
                                # evaluate an architecture
                                self.arch = space.parse_model(self.selection, self.device)
                                metric, loss = self._infer(mask="val")
                                arch_perfs.append([metric, self.selection.copy()])
                            else:
                                dfs()
                            del self.selection[k]
                        break

            dfs()

            # return the architecture with the best performance
            selection = arch_perfs[np.argmax([x[0] for x in arch_perfs])][1]
            arch = space.parse_model(selection, self.device)
            return arch

In usage, different search strategies should be combined with suitable search spaces and estimators. Most search spaces, search strategies, and estimators are compatible with each other.
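
As an illustration of such a composition, the following sketch wires together the custom space, estimator, and strategy defined above. It assumes the Cora variables (``dataset``, ``input_dim``, ``num_classes``) from the Usage section and that ``BaseEstimator`` supplies default loss and evaluation settings; treat it as a template rather than documented usage.

.. code-block:: python

    # A hedged sketch: compose the custom space, estimator and strategy defined above.
    # ``dataset``, ``input_dim`` and ``num_classes`` come from the Usage example.
    space = SinglePathNodeClassificationSpace(input_dim=input_dim, output_dim=num_classes)
    space.instantiate()

    estimator = YourOneShotEstimator()  # assumes BaseEstimator's default loss and metrics
    algo = RandomSearch(n_sample=None)  # n_sample is unused by the DFS sketch above

    best_arch = algo.search(space, dataset, estimator)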