Binary Classifier Notebook


This example shows how Guild is used to run a Jupyter Notebook as an experiment. In this example, Guild runs the notebook externally as a command. This is different from the Interactive Python API example, which runs Guild commands from within a notebook. Both examples are provided in the same project directory.

Project files applicable to this example:

  • guild.yml: Project Guild file
  • plot_display_object_visualization.ipynb: Binary classifier notebook
  • requirements.txt: Requirements file for project


Confirm that Guild is installed by running check:

guild check

See Install Guild AI for instructions if Guild is not already installed.

Clone the examples repository:

git clone

Change to the notebooks example directory:

cd examples/notebooks

To isolate your work from other Python environments, we recommend that you use a virtual environment for this example.

guild init

Press Enter to create a virtual environment. Activate the environment:

source guild-env

If you prefer to use another method to create a virtual environment (virtualenv, venv, conda, etc.), feel free to use that instead of guild init. If you use an alternative, install the required Python packages defined in requirements.txt.

Show help for the project.

guild help

Press q when you’re done reading the help. This is a good way to become familiar with Guild support for a project.

Baseline Experiment

Run the binary-classifier operation without any flags. This is a baseline experiment. We use a tag to help us identify the run later.

guild run binary-classifier --tag baseline

Review the flag values and press Enter to run the experiment.

The experiment is implemented by plot_display_object_visualization.ipynb. Guild does the following when it runs the notebook:

  • Copy the notebook to the run directory
  • Update the copy with run-specific flag values
  • Execute the notebook in place, from top to bottom, ensuring that results reflect the notebook source and are run in the expected order
  • Save any plots or images generated by the notebook as PNG files
  • Generate an HTML version of the executed notebook
  • Log any scalars output by the notebook
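
The first two steps above (copy, then update with flag values) can be sketched in a few lines of Python. This is only an illustration of the idea, not Guild's implementation: the function names apply_flags and stage_run are hypothetical, and the regular expression assumes a flag appears in a code cell as a simple keyword argument such as class_weight=None.

```python
import json
import re
import shutil
from pathlib import Path


def apply_flags(notebook: dict, flags: dict) -> dict:
    """Rewrite `name=<value>` keyword arguments in code cells (illustrative only)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # A cell source may be a single string or a list of lines.
        source = "".join(cell["source"])
        for name, value in flags.items():
            # Match `name = <value>` up to the next comma, newline, or ")".
            pattern = rf"\b{re.escape(name)}\s*=(?!=)\s*[^,\n)]+"
            source = re.sub(pattern, lambda m: f"{name}={value!r}", source)
        cell["source"] = source.splitlines(keepends=True)
    return notebook


def stage_run(project_nb: Path, run_dir: Path, flags: dict) -> Path:
    """Copy the notebook into a run directory and update only the copy."""
    run_dir.mkdir(parents=True, exist_ok=True)
    run_nb = run_dir / project_nb.name
    shutil.copy(project_nb, run_nb)  # the project copy stays untouched
    notebook = json.loads(run_nb.read_text(encoding="utf-8"))
    updated = apply_flags(notebook, flags)
    run_nb.write_text(json.dumps(updated, indent=1), encoding="utf-8")
    return run_nb
```

Because only the copy in the run directory is modified, the project notebook keeps its default values while each run records the exact source that produced its results.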

When the run completes, view the run info:

guild runs info

The scalars are values that were printed to cell output by the notebook.
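
As a rough illustration of how printed output can become scalars, the sketch below scans text for lines of the form key: value and collects the numeric ones. The pattern, the parse_scalars name, and the sample values are assumptions made for this example; Guild's actual capture rules may differ and can be configured per operation.

```python
import re

# Assumed line format: "<key>: <number>", e.g. "recall: 0.25".
SCALAR_LINE = re.compile(
    r"^(?P<key>[A-Za-z_][\w./-]*):\s+"
    r"(?P<value>[-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)\s*$"
)


def parse_scalars(output: str) -> dict:
    """Collect numeric `key: value` lines from captured cell output."""
    scalars = {}
    for line in output.splitlines():
        match = SCALAR_LINE.match(line.strip())
        if match:
            scalars[match["key"]] = float(match["value"])
    return scalars


# Hypothetical cell output, not taken from an actual run.
sample = "precision: 0.5\nTraining done\nrecall: 0.25\n"
print(parse_scalars(sample))  # {'precision': 0.5, 'recall': 0.25}
```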

View the runs in TensorBoard:

guild tensorboard

This shows the plots generated by the notebook. This view comes in handy when you run a notebook several times with different flag values or even different source code.

Return to the command prompt and press Ctrl-c to stop TensorBoard.

Next, open the HTML version of the notebook in your browser:

guild open -p plot_display_object_visualization.html

Scroll down to Create ConfusionMatrixDisplay to see the precision and recall.

The recall is an abysmal 13%! This indicates that the model is generating a large number of false negatives. This is because the data set is imbalanced. Without special handling, imbalanced data sets can skew a model’s performance in favor of precision at the expense of recall. The model learns the imbalanced frequencies and “cheats” by favoring the more common classes.
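
To make the relationship between false negatives and recall concrete, here is a small worked example. The confusion matrix counts are hypothetical, chosen only to produce a recall near 13%; they are not the counts from the notebook.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Compute precision and recall from confusion matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 13 true positives, 9 false positives, 87 false negatives.
precision, recall = precision_recall(tp=13, fp=9, fn=87)
print(f"precision: {precision:.2f}")  # precision: 0.59
print(f"recall: {recall:.2f}")        # recall: 0.13
```

With these counts the model is right about most of the positives it predicts, yet it misses 87 of the 100 actual positives, which is what a low recall means.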

Second Experiment

Let’s address the imbalanced data set problem. The binary-classifier operation exposes the class_weight argument of the LogisticRegression constructor. We can use the class-weight flag to set this argument to “balanced”. This affects the following notebook code:

clf = make_pipeline(
        class_weight=None, # <- class-weight flag value applied here

Generate a second experiment, this time with an adjustment for class weights. We use another tag to help identify the experiment.

guild run binary-classifier class-weight=balanced --tag balanced

Review the flag values and press Enter. Guild runs the notebook but this time the training algorithm adjusts for the imbalance in the data set.
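
For intuition about what “balanced” does, scikit-learn documents the balanced weights as n_samples / (n_classes * count_of_class), so rarer classes receive larger weights. The sketch below reproduces that formula in plain Python. The balanced_class_weights helper and the 90/10 label split are assumptions for illustration, not the notebook's data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to its frequency:
    n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced labels: 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)
print(weights[1])  # 5.0
```

Here each positive example counts nine times as much as a negative one during training (5.0 versus about 0.56), which pushes the model to stop ignoring the minority class.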

When the operation completes, compare the two experiments using guild compare.

guild compare

Press the right arrow key until the cursor highlights the R column. This is the model recall.


Note that the recall has improved from around 13% to 75%. This is a successful experiment! However, model precision, indicated by the P column, has dropped from 60% to 42%. Its test accuracy has also dropped.

Diff Notebooks with nbdime

Guild supports a variety of comparison methods across runs. We use guild compare to compare flags and scalars side-by-side. We use guild tensorboard to compare images and other summaries.

We can also use locally installed programs to compare runs using guild diff. Guild supports different programs for different file types. If nbdime is installed, Guild uses it to compare Jupyter notebooks side-by-side.

Diff the notebooks from the two experiments:

guild diff -p plot_display_object_visualization.ipynb

You installed nbdime from requirements.txt so it’s available to diff the two notebooks. If nbdime is not installed, Guild uses the default diff tool configured for your system.

Scroll down the page to see the differences between the notebooks. This is an excellent method to know exactly what changed between notebooks.

Return to the command line and exit the diff by pressing Ctrl-c.

Next, compare the latest notebook to your working copy, i.e. the copy in the project directory.

guild diff --working -p plot_display_object_visualization.ipynb

Guild shows how the latest run differs from the project version. This is useful when you want to update the project copy with valuable changes captured in historical runs. For example, you might want to change the default behavior to use “balanced” for class weights based on the experiment results.


Guild can run Jupyter Notebooks as experiments in the same way it runs Python scripts. This is useful to ensure consistency between notebook source code and output. As each run contains a separate copy of a notebook, experiment source code and results do not change over time as they would with a single copy.

Notebook experiments can be viewed and compared in a variety of ways to better understand how changes impact results.

  • Compare flags and scalars using Guild Compare and Guild View
  • Compare images, scalars, and hyperparameter matrices with TensorBoard
  • View run notebooks as HTML files in standard browsers
  • Diff two notebooks side-by-side using nbdime