This example shows how to use Guild to track experiments and optimize a TensorFlow model. It highlights the use of TensorBoard and the HParams plugin to evaluate hyperparameters and find optimal values. It uses unmodified code from the official example in TensorFlow Overview.
| File | Description |
|------|-------------|
| guild.yml | Project Guild file |
| beginner_with_flags.py | Sample code modified to expose flags |
| requirements.txt | List of required libraries |
This example follows the process outlined in Use Guild in a Project.
Create Virtual Environment
Activate the environment:
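For example, using Python's built-in venv module (a sketch assuming a POSIX shell; the environment name `venv` is arbitrary):

```shell
# Create a virtual environment in ./venv and activate it
python3 -m venv venv
. venv/bin/activate
```

Then install the example's dependencies with `pip install -r requirements.txt`.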
Run the Sample Script with Python
Before adding Guild support, verify that you can run the beginner example without errors (e.g. with `python beginner.py`).
The command should run to completion after training a model over 5 epochs. If you see errors, resolve them before continuing. If you need help, let us know.
Run the Sample Script with Guild
Run beginner.py with Guild:
guild run beginner.py
Guild runs the script to generate a run. When the operation is finished, show the run info:
guild runs info
By default, `guild runs info` shows information for the latest run.
Note the model loss reflected in the result.
See Runs for commands you can use with runs.
Highlight: Guild lets you run and track experiments with zero code changes.
The following script parameters should be exposed as flags:
- Training epochs
- Learning rate
- Dropout
- Hidden layer activation
We modify the script to use global variables to define these values.
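The pattern can be sketched as follows. This is not the full script: the TensorFlow model and training code is omitted, and the `train` helper is illustrative; the names and defaults mirror the flags shown in the project help later in this example.

```python
# beginner_with_flags.py (sketch): hyperparameters exposed as module-level
# globals so Guild can detect them as flags and override them per run.
epochs = 2
dropout = 0.1
learning_rate = 0.002
activation = "relu"

def train():
    # The real script builds and fits a tf.keras model using these values;
    # here we only show that the globals drive the run configuration.
    return {
        "epochs": epochs,
        "dropout": dropout,
        "learning_rate": learning_rate,
        "activation": activation,
    }
```

Because the hyperparameters are plain global assignments, Guild can rewrite them for each run without any Guild-specific imports in the script.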
With this simple change, you can use Guild to run experiments with different hyperparameters. Each run is recorded with the applicable set of flag values.
guild run beginner_with_flags.py epochs=10
Use Guild to search for optimal hyperparameters. By default, Guild tries to minimize the loss scalar, which the sample script happens to log. If an operation logs something else, specify the scalar to optimize using the `--minimize` or `--maximize` option.
Start a run to find optimal values for learning_rate and dropout. Train over two epochs to save time.
guild run beginner_with_flags.py --optimize \
  epochs=2 \
  dropout=range[0.1:0.9:0.1] \
  learning_rate=loguniform[1e-4:1e-1]
For more information about this command, see Hyperparameter Optimization.
By default, Guild runs 20 trials. Specify a different value using the `--max-trials` option.
Use `guild runs` to list the runs:
By default, Guild shows the latest 20 runs. To show all runs, use the `--all` option.
Use TensorBoard to compare runs:

guild tensorboard
Click HPARAMS and select PARALLEL COORDINATES VIEW. Select Logarithmic for learning_rate and Quantile for accuracy, loss, and time. This is a useful view for evaluating hyperparameters.

Note runs with high accuracy and short run times; these are the "best" runs. To highlight them, click and drag along a vertical axis to select a region, adjusting the region as needed. TensorBoard highlights runs that fall within the selected range.
With these results, we can make some observations:
- This model learns quickly on the data set. We achieve solid performance with only two epochs.
- Optimal dropout appears to be around 10% at least over the short training period. We could experiment with higher dropout rates over longer runs.
- Optimal learning rate appears to fall between 0.001 and 0.01. This is with two epochs. We can expect the optimal value to change as we increase training.
- We can match the default performance from the Google example with just two training epochs. This reduces our time and energy cost by 60%.
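The 60% figure follows from the epoch counts alone, assuming training cost scales roughly linearly with epochs:

```python
# Savings from training 2 epochs instead of the default 5
default_epochs = 5
reduced_epochs = 2
savings = 1 - reduced_epochs / default_epochs
print(f"{savings:.0%}")  # prints "60%"
```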
You can expect breakthrough observations with other models. This is the value of experiment tracking.
With a baseline to compare against, you might explore these questions:
- Can we improve the performance of the model with more training? We can test this by increasing epochs with the current optimal values for dropout and learning rate. We can run more optimization trials to see if the optimal values hold.
- Can we improve validation accuracy with more data augmentation?
- Do we need dropout? We didn’t explore 0% but we should.
- Do higher levels of dropout show improved results with more training?
Experiments prompt questions, which prompt more experiments.
Add a Guild File
The operation runs the beginner_with_flags Python module. It provides a description and default flag values.
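A Guild file matching this description might look like the following (a sketch using Guild's operation-only file format; the description and flag defaults are taken from the project help output shown below):

```yaml
# guild.yml (sketch) - defines the train operation
train:
  description: Train a simple neural network to classify MNIST digits
  main: beginner_with_flags
  flags:
    epochs: 2
    dropout: 0.1
    learning_rate: 0.002
    activation: relu
```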
List the project operations:
train Train a simple neural network to classify MNIST digits
Show help for the project:
OVERVIEW

    You are viewing help for operations defined in the current directory.

    To run an operation use 'guild run OPERATION' where OPERATION is one
    of options listed below. If an operation is associated with a model,
    include the model name as MODEL:OPERATION.

    To list available operations, run 'guild operations'.

    Set operation flags using 'FLAG=VALUE' arguments to the run command.
    Refer to the operations below for a list of supported flags.

    For more information on running operations, try 'guild run --help'.
    For general information, try 'guild --help'.

BASE OPERATIONS

    train
        Train a simple neural network to classify MNIST digits

        Flags:
            activation     (default is relu)
            dropout        (default is 0.1)
            epochs         (default is 2)
            learning_rate  (default is 0.002)
Guild files document project capabilities, as well as enable them.
Run the operation:
Guild trains the model using the optimal hyperparameter values. Compare the results to earlier runs:

guild compare
Use the arrow keys to navigate the list. Move to the accuracy column. The accuracy of the latest run (the run at the top of the listing) should rank among the best results.
In this example you train a standard TensorFlow example. The original code remains essentially unchanged. You improve the code with variables that define otherwise hard-coded hyperparameters. You don’t import or use Guild modules. Instead you augment the project with a Guild file. This is all you need to enable a host of features.
For a more detailed step-by-step tutorial, see Get Started with Guild AI. If you're already familiar with core Guild features (you learned a lot already in this example), skip to Use Guild in a Project for help applying Guild to your work.