Hyper Parameter Optimization

We support black-box hyperparameter optimization over various search spaces.

Search Space

Four types of search space are supported: numerical, numerical list, categorical, and fixed. Use a Python dict to define each one. For a numerical list search space, you can either assign a fixed length to the list, in which case you do not need to provide cutPara and cutFunc, or let HPO cut the list to a length that depends on other parameters. In the latter case, provide the names of those parameters in cutPara and the function that calculates the cut length in cutFunc.

# numerical search space:
{
    "parameterName": "xxx",
    "type": "DOUBLE" / "INTEGER",
    "minValue": xx,
    "maxValue": xx,
    "scalingType": "LINEAR" / "LOG"
}

# numerical list search space:
{
    "parameterName": "xxx",
    "type": "NUMERICAL_LIST",
    "numericalType": "DOUBLE" / "INTEGER",
    "length": 3,
    "cutPara": ("para_a", "para_b"),
    "cutFunc": lambda x: x[0] - 1,
    "minValue": [xx,xx,xx],
    "maxValue": [xx,xx,xx],
    "scalingType": "LINEAR" / "LOG"
}

# categorical search space:
{
    "parameterName": "xxx",
    "type": "CATEGORICAL",
    "feasiblePoints": [a,b,c]
}

# fixed parameter as search space:
{
    "parameterName": "xxx",
    "type": "FIXED",
    "value": xxx
}
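
As a concrete illustration of the templates above, the following sketch defines a small search space as a list of dicts and shows how a sampled numerical list might be cut. The parameter names (num_layers, hidden, act, dropout), the bounds, and the cut_list helper are assumptions made for this example and are not part of AutoGL. In particular, the sketch assumes that cutFunc receives the values of the cutPara parameters in the order they are listed.

```python
# Hypothetical search space: names and bounds are illustrative only.
space = [
    {
        "parameterName": "num_layers",
        "type": "INTEGER",
        "minValue": 2,
        "maxValue": 4,
        "scalingType": "LINEAR",
    },
    {
        "parameterName": "hidden",
        "type": "NUMERICAL_LIST",
        "numericalType": "INTEGER",
        "length": 3,
        "cutPara": ("num_layers",),
        "cutFunc": lambda x: x[0] - 1,
        "minValue": [8, 8, 8],
        "maxValue": [64, 64, 64],
        "scalingType": "LOG",
    },
    {"parameterName": "act", "type": "CATEGORICAL", "feasiblePoints": ["relu", "tanh", "elu"]},
    {"parameterName": "dropout", "type": "FIXED", "value": 0.5},
]


def cut_list(entry, sampled):
    """Truncate a sampled list using cutPara/cutFunc (assumed semantics, not AutoGL code)."""
    values = sampled[entry["parameterName"]]
    if "cutFunc" not in entry:
        return values  # fixed-length list: nothing to cut
    # Collect the values of the parameters that the cut length depends on.
    args = [sampled[name] for name in entry["cutPara"]]
    return values[: entry["cutFunc"](args)]


sampled = {"num_layers": 3, "hidden": [32, 16, 8], "act": "relu", "dropout": 0.5}
print(cut_list(space[1], sampled))  # [32, 16]
```

With num_layers set to 3, cutFunc returns 2, so only the first two entries of the sampled list are kept.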

The following table lists which search space types each HPO algorithm supports:

Algorithm	numerical	numerical list	categorical	fixed
Grid			✓	✓
Random	✓	✓	✓	✓
Anneal	✓	✓	✓	✓
Bayes	✓	✓	✓	✓
TPE [1]	✓	✓	✓	✓
CMAES [2]	✓	✓	✓	✓
MOCMAES [3]	✓	✓	✓	✓
Quasi random [4]	✓	✓	✓	✓
AutoNE [5]	✓	✓	✓	✓
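
The table indicates that Grid handles only categorical and fixed parameters. A plausible reading is that a grid needs a finite set of candidate values per parameter, which these two types provide directly. The sketch below enumerates such a grid with itertools.product. The grid_points helper and the parameter names are hypothetical and written for illustration; this is not the AutoGL implementation.

```python
import itertools


def grid_points(space):
    """Yield every combination of categorical and fixed parameters (illustrative helper)."""
    names, choices = [], []
    for entry in space:
        if entry["type"] == "CATEGORICAL":
            options = list(entry["feasiblePoints"])
        elif entry["type"] == "FIXED":
            options = [entry["value"]]  # a fixed parameter contributes a single point
        else:
            raise ValueError(f"type {entry['type']} has no finite grid in this sketch")
        names.append(entry["parameterName"])
        choices.append(options)
    for combo in itertools.product(*choices):
        yield dict(zip(names, combo))


grid_space = [
    {"parameterName": "act", "type": "CATEGORICAL", "feasiblePoints": ["relu", "tanh"]},
    {"parameterName": "pool", "type": "CATEGORICAL", "feasiblePoints": ["mean", "max"]},
    {"parameterName": "dropout", "type": "FIXED", "value": 0.5},
]
points = list(grid_points(grid_space))
print(len(points))  # 4
print(points[0])    # {'act': 'relu', 'pool': 'mean', 'dropout': 0.5}
```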

Add Your HPOptimizer

To add your own HPOptimizer, you only need to implement the optimize function in your HPOptimizer:

# For example, create your own random HPO
import random
from autogl.module.hpo.base import BaseHPOptimizer
class RandomOptimizer(BaseHPOptimizer):
    # Get essential parameters at initialization
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.max_evals = kwargs.get("max_evals", 2)

    # The most important step is to implement the optimize function
    def optimize(self, trainer, dataset, time_limit=None, memory_limit=None):
        # 1. Get the search space from trainer.
        space = trainer.hyper_parameter_space + trainer.model.hyper_parameter_space
        # optional: use self._encode_para (in BaseHPOptimizer) to preprocess the space
        # If you use _encode_para, each NUMERICAL_LIST is expanded into DOUBLE or INTEGER parameters, the LOG scaling type is changed to LINEAR, and the feasible points of a CATEGORICAL parameter are changed to discrete numbers.
        # You should then also use _decode_para to transform the parameters back to their original types.
        current_space = self._encode_para(space)

        # 2. Define your function to get the performance.
        def fn(dset, para):
            current_trainer = trainer.duplicate_from_hyper_parameter(para)
            current_trainer.train(dset)
            loss, self.is_higher_better = current_trainer.get_valid_score(dset)
            # For convenience, we negate any score for which higher is better, so that we only ever need to minimize.
            if self.is_higher_better:
                loss = -loss
            return current_trainer, loss

        # 3. Define how to get HP suggestions. The function should return a parameter dict. You can use the history trials to produce new suggestions.
        def get_random(history_trials):
            hps = {}
            for para in current_space:
                # Because we called _encode_para earlier, we only need to handle DOUBLE, INTEGER and DISCRETE
                if para["type"] == "DOUBLE" or para["type"] == "INTEGER":
                    hp = random.random() * (para["maxValue"] - para["minValue"]) + para["minValue"]
                    if para["type"] == "INTEGER":
                        hp = round(hp)
                    hps[para["parameterName"]] = hp
                elif para["type"] == "DISCRETE":
                    feasible_points = para["feasiblePoints"].split(",")
                    hps[para["parameterName"]] = random.choice(feasible_points)
            return hps

        # 4. Run your algorithm. In each round, get a set of parameters based on the history information and evaluate it.
        best_trainer, best_para, best_perf = None, None, None
        self.trials = []
        for i in range(self.max_evals):
            # In this example we don't need the history trials, so we pass None as history_trials.
            new_hp = get_random(None)
            # optional: if you use _encode_para, use _decode_para as well. para_for_trainer undoes all transformations made by _encode_para and converts double parameters to integers where needed. para_for_hpo only converts double parameters to integers.
            para_for_trainer, para_for_hpo = self._decode_para(new_hp)
            current_trainer, perf = fn(dataset, para_for_trainer)
            self.trials.append((para_for_hpo, perf))
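            # Compare against None explicitly: a best score of 0 is falsy and would otherwise be overwritten.
            if best_perf is None or perf < best_perf: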
                best_perf = perf
                best_trainer = current_trainer
                best_para = para_for_trainer

        # 5. Return the best trainer and parameter.
        return best_trainer, best_para
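
The comments in step 1 describe what _encode_para does to the search space. The sketch below mimics two of those transformations on single entries (LOG to LINEAR scaling, and CATEGORICAL to DISCRETE indices stored as a comma-separated string), together with the inverse mapping. encode_entry and decode_value are hypothetical helpers written for this illustration. The actual _encode_para and _decode_para in BaseHPOptimizer may differ in detail, for example in how they represent encoded types.

```python
import math


def encode_entry(entry):
    """Simplified stand-in for the encoding described above (not the AutoGL code)."""
    encoded = dict(entry)
    if entry["type"] in ("DOUBLE", "INTEGER") and entry["scalingType"] == "LOG":
        # Search in log space with linear scaling; treat the value as continuous there.
        encoded["type"] = "DOUBLE"
        encoded["minValue"] = math.log(entry["minValue"])
        encoded["maxValue"] = math.log(entry["maxValue"])
        encoded["scalingType"] = "LINEAR"
    elif entry["type"] == "CATEGORICAL":
        # Replace the feasible points by their indices.
        encoded["type"] = "DISCRETE"
        encoded["feasiblePoints"] = ",".join(
            str(i) for i in range(len(entry["feasiblePoints"]))
        )
    return encoded


def decode_value(entry, value):
    """Map an encoded value back using the ORIGINAL entry (assumed inverse)."""
    if entry["type"] == "CATEGORICAL":
        return entry["feasiblePoints"][int(value)]
    if entry.get("scalingType") == "LOG":
        value = math.exp(value)
    if entry["type"] == "INTEGER":
        value = int(round(value))
    return value


lr = {"parameterName": "lr", "type": "DOUBLE", "minValue": 1e-4,
      "maxValue": 1e-1, "scalingType": "LOG"}
act = {"parameterName": "act", "type": "CATEGORICAL",
       "feasiblePoints": ["relu", "tanh", "elu"]}

enc_lr, enc_act = encode_entry(lr), encode_entry(act)
print(enc_lr["scalingType"])       # LINEAR
print(enc_act["feasiblePoints"])   # 0,1,2
print(decode_value(act, "1"))      # tanh
```

Decoding takes the original entry rather than the encoded one, since the encoded entry no longer carries the original scaling type or feasible points.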

[1]	Bergstra, James S., et al. “Algorithms for hyper-parameter optimization.” Advances in neural information processing systems. 2011.

[2]	Arnold, Dirk V., and Nikolaus Hansen. “Active covariance matrix adaptation for the (1+ 1)-CMA-ES.” Proceedings of the 12th annual conference on Genetic and evolutionary computation. 2010.

[3]	Voß, Thomas, Nikolaus Hansen, and Christian Igel. “Improved step size adaptation for the MO-CMA-ES.” Proceedings of the 12th annual conference on Genetic and evolutionary computation. 2010.

[4]	Bratley, Paul, Bennett L. Fox, and Harald Niederreiter. “Programs to generate Niederreiter’s low-discrepancy sequences.” ACM Transactions on Mathematical Software (TOMS) 20.4 (1994): 494-495.

[5]	Tu, Ke, et al. “AutoNE: Hyperparameter optimization for massive network embedding.” Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019.