Jupyter Notebook Experiments


Use Guild AI to run Jupyter Notebooks as recorded experiments. Guild copies a notebook to a unique directory, modifies the copy with experiment-specific values, and executes the copy in place. The result is a unique record of a notebook whose output reflects the cell source at the time of execution.

Notebooks as Experiments

When recording a data science experiment, it’s important to control and capture a number of elements.

  • Source code order-of-execution
  • File inputs
  • Hyperparameters and other experiment configuration
  • Operation summaries (tables, plots, etc.)
  • Generated files (model checkpoints, data sets, etc.)

When using Jupyter Notebooks, it can be unclear what code ran at what time. Results recorded from notebook cell execution may not accurately reflect experiment results. There are no strict controls when running code within a notebook.

Guild addresses this problem by running notebooks externally.

guild run notebook.ipynb

In this case, Guild generates an experiment, consisting of a copy of the fully executed notebook in a unique directory. Guild ensures that the notebook all cells in the notebook are run from top-to-bottom. The results of each cell are saved with the notebook.

The original notebook is not used to run the experiment. A copy is. This ensures that the experiment run is not overwritten by subsequent runs. It provides an authoritative record of the source code and order of execution.

In addition to running a notebook copy, Guild performs addition tasks:

  • Generate HTML copies of executed notebooks for portability
  • Save notebook plots as external PNG files for cross-run comparison in TensorBoard
  • Log scalars for use in TensorBoard and by sequential hyperparameter optimizers


Each experiment consists of a copy of the original notebook. Guild modifies the copy to reflect different flag values.

Flag values are assigned in one of two ways:

  • Global variable assignment
  • Code search-and-replace

Global Variable Assignments

Guild considers global variables that are assigned numeric, text, or boolean values as flag candidates. Consider the following sample notebooks cell.

# Sample cell in notebook.ipynb
learning_rate = 0.1
epochs = 100
dropout = 0.2

m = Model(dropout)
m.train(epochs, learning_rate)
print("loss: %f" % m.loss)

Guild detects the flags learning_rate, epochs, and dropout as flags.

The following command is used to generate a copy of notebook.ipynb with different values for epochs and dropout:

guild run notebook.ipynb epochs=50 dropout=0.1

Code Search and Replace

In some cases, flags may not be defined as global variables in a notebook. In that case, Guild supports a search-and-replace facility to update source code in the copied notebooks. This feature requires that the notebook is run from a Guild file.

Consider the following notebook cell:

# Sample cell in notebook.ipynb

m = Model(dropout=0.2)
m.train(epochs=100, lr=0.1)
print("loss: %f" % m.loss)

In this case, Guild does not detect any flags because there are no global variables. In this case, you can define flags for the notebook using nb-replace attributes for flag definitions in a Guild file.

# guild.yml, located alongside notebook.ipynb

  notebook: notebook.ipynb
      nb-replace: dropout=([\d\.]+)
      nb-replace: epochs=(\d+)
      nb-replace: lr=([\d\.]+)

Guild uses these patterns to detect flag values and replace them for each run with current flag values.

HTML and Image Exports

When Guild runs a notebook, it generates an HTML copy of the executed notebook. This simplifies the process of viewing a notebook as a user can view the results in a browser rather than running Jupyter.

Guild also saves plots and other notebook generates images as external PNG files. This also simplifies viewing results and makes notebook output comparable in TensorBoard when run using guild tensorboard.


Guild captures and logs scalars output by notebook cells. Scalars can be captured using the operation output-scalars attribute. For more information, see Scalars.


You can run a notebook multiple times, each time with different flag values, using a single Guild command. Any of the search methods can be applied to notebooks this way.

For example, the following command uses Bayesian optimization to find values of learning_rate and dropout that minimize training loss.

guild run notebook.ipynb \
  --optimizer bayesian \
  learning_rate=log-uniform[1e-4:1e-1] \


See Binary Classifier Notebook Example for a step-by-step example of using a notebook to perform a binary prediction task.


Use Guild to accurately track experiments implemented in Jupyter notebooks. By running notebooks externally, you strictly control execution and are assured of accurate output.

By running copies of notebooks, rather than the original notebook, Guild ensures that you have a separate record of each run. Guild also supports different flag values for each run, which lets you experiment with different values without changing the notebooks source.


Nice! We will try this out. We are also trying out a custom operation like guild run jupyter that starts an interactive notebook that can access resources that we specify like a previously trained model or a prepared dataset

1 Like