Hyperparameter Optimization


Guild performs hyperparameter optimization by running trials in a batch. A batch is a Guild operation that specializes in running trials. Each type of batch operation generates trials in a different way.

The batch operation is specified for a run using the --optimizer option with guild run. The default optimizer is implicitly specified with --optimize.

As with any Guild operation, you can view help for a batch operation using --help-op with guild run.

guild run gp --help-op

The following table lists built-in batch operations. Define your own batch operation and use it the same way.

Operation Name Description
+ Generate a trial for each unique set of specified flag values
random Generate trials with flag values selected from specified distributions
gp Sequential Bayesian optimization using Gaussian processes
forest Sequential Bayesian optimization using decision trees
gbrt Sequential Bayesian optimization using gradient boosted regression trees

Grid Search

Grid search is performed by Guild’s default batch operation. Perform a grid search to explore a predetermined set of hyperparameters.

The default batch operation is invoked implicitly when any flag value is a list and no other optimizer takes precedence. Another optimizer takes precedence it specified with --optimize or --optimizer or if any flag value is a search space function.

The following command generates sixteen trials, one for each combination of the specified values:

guild run train lr='[0.0001,0.001,0.01,0.1]' dropout='[0.1,0.2,0.3,0.4]'

You can use use sequential flag functions to generate value lists. The following is equivalent to the previous command:

guild run train lr='logspace[-4:-1:4]' dropout=range[0.1:0.4:0.1]

Grid Search and Search Functions

By default, Guild uses the random optimizer when a flag value is a search space function. For example, this command implicitly uses the random optimizer because it uses the loguniform search space function for lr:

guild run train lr=loguniform[1e-4:1e-2] dropout=[0.1,0.2]

This command generates 20 trials. It selects values for dropout at random from the two choices 0.1 and 0.2.

To run a grid search in this case, specify the optimizer as +:

guild run train lr=loguniform[1e-4:1e-2] dropout=[0.1,0.2] --optimizer +

By specifing --optimizer you override Guild’s default behavior. This command generates two runs — one for each value of dropout. It selects random values for lr from the specified distribution.

Random Search

Random search is performed by the random optimizer. The random optimizer is used implicitly if any flag uses a search space function.

By default the random optimizer generates 20 trials. Change this by specifying the --max-trials option with guild run.

The following command uses the random optimizer to run 10 trials. It selects values for lr at random from the log-uniform distribution over the range 1e-4 to 1e-1:

guild run train lr=loguniform[1e-4:1e-1] --max-trials 10

If you specify a list value, random selects values from the list at random from a uniform distribution. See Grid Search and Search Functions above if you want to use grid search with search space functions.

Explicitly use random by specifying --optimizer (or the short form -o) with guild run:

guild run train lr=logspace[-4:-1:4] dropout=range[0.1:0.4:0.1] -o random -m 10

In this case, the random optimizer runs 10 trials, selecting values at random from the choices.

Sequential Optimization

Sequential optimization is performed by a batch operation that use previous trials to suggest hyperparameter values it believes will yield a better result (e.g. lower loss, higher accuracy, etc. Sequential optimizers include gp, forest, and gbrt.

Flag values do not implicitly select sequential optimizers. You must explicitly select an optimizer by specifying --optimize or --optimizer with guild run.

Use -Fo options to set optimizer flags for a batch. The following command sets xi and random-starts for the gp optimizer:

guild run train -o gp -Fo xi=0.1 -Fo random-starts=5

Custom Optimizers

You can use any operation as an optimizer. See Hyperopt Example for a custom optimizer.