This example shows how Guild is used to run a Jupyter Notebook as an experiment. In this example, Guild runs the notebook externally as a command. This is different from the Interactive Python API example, which runs Guild commands from within a notebook. Both examples are provided in the same project directory.
Project files applicable to this example:
| File | Description |
|---|---|
| guild.yml | Project Guild file |
| plot_display_object_visualization.ipynb | Binary classifier notebook |
| requirements.txt | Requirements file for project |
Confirm that Guild is installed by running guild check.
See Install Guild AI for instructions if Guild is not already installed.
Clone the examples repository:
git clone https://github.com/guildai/guildai.git
Change to the notebooks example directory:
To isolate your work from other Python environments, we recommend that you use a virtual environment for this example.
Run guild init and press Enter to create a virtual environment. Activate the environment:
If you prefer another method of creating a virtual environment (virtualenv, venv, conda, etc.), feel free to use that instead of guild init. If you use an alternative, install the required Python packages defined in requirements.txt.
Show help for the project by running guild help. Press q when you're done reading the help. This is a good way to become familiar with Guild's support for a project.
Run the binary-classifier operation without any flags. This is the baseline experiment. We use a tag to help us identify the run later.
guild run binary-classifier --tag baseline
Review the flag values and press Enter to run the experiment.
The experiment is implemented by plot_display_object_visualization.ipynb. Guild does the following when it runs the notebook:
- Copy the notebook to the run directory
- Update the copy with run specific flag values
- Execute the notebook in place, from top to bottom, ensuring that results reflect the notebook source and are run in the expected order
- Save any plots or images generated by the notebook as PNG files
- Generate an HTML version of the executed notebook
- Log any scalars output by the notebook
When the run completes, view the run info:
guild runs info
The scalars are values that were printed to cell output by the notebook.
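Guild's output-scalar convention captures numeric values printed one per line as "key: value". As a rough illustration only (the exact pattern Guild uses internally may differ), a minimal parser for that convention might look like this:

```python
import re

# Sketch of the "key: value" output-scalar convention. The pattern
# Guild actually uses may differ; this is illustrative only.
SCALAR_PATTERN = re.compile(r"^(\w+): ([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)$")

def parse_scalars(output):
    """Return a dict mapping scalar names to float values found in output."""
    scalars = {}
    for line in output.splitlines():
        m = SCALAR_PATTERN.match(line.strip())
        if m:
            scalars[m.group(1)] = float(m.group(2))
    return scalars

# Simulated cell output: only the "key: value" lines are captured.
cell_output = """\
Training classifier...
accuracy: 0.82
precision: 0.60
recall: 0.13
Done.
"""
print(parse_scalars(cell_output))
```

Only lines matching the convention become scalars; ordinary log lines like "Training classifier..." are ignored.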
View the runs in TensorBoard by running guild tensorboard.
This shows the plots generated by the notebook. This view comes in handy when you run a notebook several times with different flag values or even different source code.
Return to the command prompt and press Ctrl-C to stop TensorBoard.
Next, open the HTML version of the notebook in your browser:
guild open -p plot_display_object_visualization.html
Scroll down to Create ConfusionMatrixDisplay to see the precision and recall.
The recall is an abysmal 13%! This indicates that the model generates a large number of false negatives. This happens because the data set is imbalanced. Without special handling, an imbalanced data set can skew a model's performance in favor of precision at the expense of recall: the model learns the imbalanced class frequencies and "cheats" by favoring the more common class.
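To see why imbalance produces this effect, consider a toy example (the numbers here are illustrative only, not the project's data set). A classifier that nearly always predicts the majority class scores high accuracy while missing most positives:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy imbalanced data set: 90 negatives, 10 positives.
y_true = [0] * 90 + [1] * 10

# A model that "cheats" toward the majority class: it catches only
# one of the ten positives but gets every negative right.
y_pred = [0] * 90 + [1] * 1 + [0] * 9

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)
print(accuracy, precision, recall)  # 0.91 0.1
```

Accuracy looks respectable at 91%, yet recall is only 10%: the metric that matters for the rare class collapses while the headline number stays high.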
Let's address the imbalanced data set problem. The binary-classifier operation exposes a class_weight argument to the LogisticRegression constructor. We can use the class-weight flag to set this argument to "balanced". This affects the following notebook code:
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(
        C=1.0,
        penalty='l2',
        multi_class='auto',
        random_state=0,
        class_weight=None,  # <- class-weight flag value applied here
        solver='lbfgs',
        max_iter=100,
        l1_ratio=None,
    ),
)
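For reference, scikit-learn's class_weight='balanced' mode weights each class by n_samples / (n_classes * count(class)), so rarer classes contribute more to the loss. A small sketch of that formula:

```python
from collections import Counter

def balanced_class_weights(y):
    """Compute per-class weights the way scikit-learn's
    class_weight='balanced' mode does:
    n_samples / (n_classes * count(class))."""
    counts = Counter(y)
    n_samples = len(y)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}

# With 90 negatives and 10 positives, each positive example carries
# nine times the weight of each negative example.
y = [0] * 90 + [1] * 10
print(balanced_class_weights(y))
```

The reweighting makes mistakes on the rare class roughly as costly in aggregate as mistakes on the common class, which is why it pushes the model toward higher recall.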
Generate a second experiment, this time with an adjustment for class weights. We use another tag to help identify the experiment.
guild run binary-classifier class-weight=balanced --tag balanced
Review the flag values and press Enter. Guild runs the notebook, but this time the training algorithm adjusts for the imbalance in the data set.
When the operation completes, compare the two experiments using guild compare.
Press the Right arrow key until the cursor highlights the R column. This is the model recall.
Note that the recall has improved from around 13% to 75%. This is a successful experiment! However, the model precision, indicated by the P column, has dropped from 60% to 42%. Its test accuracy has also dropped.
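One way to judge this precision/recall trade-off is the F1 score, the harmonic mean of precision and recall. Using the approximate figures reported by the two experiments above:

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Approximate figures from the baseline and balanced runs.
baseline = f1(0.60, 0.13)  # precision 60%, recall 13%
balanced = f1(0.42, 0.75)  # precision 42%, recall 75%
print(round(baseline, 3), round(balanced, 3))
```

By this measure the balanced run comes out well ahead, though the right trade-off ultimately depends on the relative cost of false positives and false negatives for your application.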
Diff Notebooks with nbdime
Guild supports a variety of comparison methods across runs. We use
guild compare to compare flags and scalars side-by-side. We use
guild tensorboard to compare images and other summaries.
We can also use locally installed programs to compare runs using
guild diff. Guild supports different programs for different file types. Guild uses nbdime, if installed, to compare Jupyter notebooks side-by-side.
Diff the notebooks from the two experiments:
guild diff -p plot_display_object_visualization.ipynb
nbdime is included in the project's requirements.txt so it's available to diff the two notebooks. If nbdime is not installed, Guild uses the default diff tool configured for your system.
Scroll down the page to see the differences between the notebooks. This is an excellent way to see exactly what changed between notebooks.
Return to the command line and exit the diff.
Next, compare the latest notebook to your working copy, i.e. the copy in the project directory.
guild diff --working -p plot_display_object_visualization.ipynb
Guild shows how the latest run differs from the project version. This is useful when you want to update the project copy with valuable changes captured in historical runs. For example, you might want to change the default behavior to use “balanced” for class weights based on the experiment results.
Guild can run Jupyter Notebooks as experiments in the same way it runs Python scripts. This is useful for ensuring consistency between notebook source code and output. Because each run contains a separate copy of the notebook, experiment source code and results do not change over time as they would with a single shared copy.
Notebook experiments can be viewed and compared in a variety of ways to better understand how changes impact results.