Ensemble

We currently support voting and stacking methods.

Voting

A voter essentially constructs a weighted sum of the predictions of base learners. Given an evaluation metric, the weights of base learners are specified in some way to maximize the validation score.

We adopt Rich Caruana’s method for weight specification. This method first finds a collection of (possibly redundant) base learners with equal weights via a greedy search, then specifies the weights in the voter by the number of occurrence in the collection.

You can customize your own weight specification method by overwriting the _specify_weights method.

# An example : use equal weights for all base learners.
class EqualWeightVoting(Voting):
    def _specify_weights(self, predictions, label, feval):
        return np.ones(self.n_models)/self.n_models
        # just allocate the same weight for each base learner

Stacking

A stacker trains a meta-model with the predictions of base learners as input to find an optimal combination of these base learners.

Currently we support generalized linear model (GLM) and gradient boosting model (GBM) as the meta-model.

Create a New Ensembler

You can create your own ensembler by inheriting the base ensembler, and overloading methods fit and ensemble.

# An example : use the currently available best model.
from autogl.module.ensemble.base import BaseEnsembler
import numpy as np
class BestModel(BaseEnsembler):
    def fit(self, predictions, label, identifiers, feval):
        if not isinstance(feval, list):
            feval = [feval]
        scores = np.array([feval[0].evaluate(pred, label) for pred in predictions]) * (1 if feval[0].is_higher_better else -1)
        self.scores = dict(zip(identifiers, scores)) # record validation score of base learners
        ensemble_pred = predictions[np.argmax(scores)]
        return [fx.evaluate(ensemble_pred, label) for fx in feval]

    def ensemble(self, predictions, identifiers):
        best_idx = np.argmax([self.scores[model_name] for model_name in identifiers]) # choose the currently best model in the identifiers
        return predictions[best_idx]