Notebook as experiment


Jupyter Notebooks (henceforth referred to as “Notebooks”) are popular in data science. While a Notebook is not a traditional Python software artifact, it can be used to implement steps in an ML pipeline such as data prep, model training, and testing.

Guild supports Notebook use via the guild.ipy interface. The intent of guild.ipy is to support the traditional interactive Notebook developer experience. Runs are generated from Notebook-defined functions rather than from traditional Python scripts.

While guild.ipy is Notebook friendly, it does not generate true ML experiments.

  • The source code is not strictly managed. The state of the Notebook at the time the run is generated is in flux.
  • The experiment is not reproducible. The user is free to run cells at any time and in any order without formal tracking.
  • Notebook-generated content is not captured.

The feature proposed here addresses these issues.

This proposal is partially implemented in 0.7.1.rc1 and awaiting feedback.


As many developers use Notebooks for ML work, Guild seeks to support useful experiment tracking for Notebooks.


In this proposal, we view executed Notebooks as experiments. This is in contrast to using a Notebook to generate experiments.

Notebooks are composite documents that contain Python code as well as other content, including Markdown and images. A Notebook that is fully executed within a new environment represents a single experiment. As such, the Notebook is copied to the run directory and fully executed from top to bottom to generate a final artifact. This artifact is captured along with the rest of the project source used in the run.
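
The copy-then-execute flow described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not Guild's implementation: the function name run_notebook_copy is hypothetical, and the code cells are run with exec() in one shared namespace as a stand-in for a real Jupyter kernel, so only printed output is captured.

```python
import contextlib
import io
import json
import shutil
import tempfile
from pathlib import Path


def run_notebook_copy(src, run_dir):
    """Copy a notebook into run_dir and execute its code cells in order.

    Hypothetical helper: exec() stands in for a real Jupyter kernel.
    """
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    dest = run_dir / Path(src).name
    shutil.copyfile(src, dest)  # the project notebook is never modified

    nb = json.loads(dest.read_text())
    namespace = {}  # fresh state, shared by every cell in this run
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        if isinstance(source, list):  # nbformat allows a list of lines
            source = "".join(source)
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            exec(source, namespace)
        text = captured.getvalue()
        cell["outputs"] = (
            [{"output_type": "stream", "name": "stdout", "text": text}]
            if text
            else []
        )
    dest.write_text(json.dumps(nb, indent=1))  # the executed artifact
    return dest


def code_cell(source):
    """Build a minimal nbformat 4 code cell (illustration only)."""
    return {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,
        "outputs": [],
        "source": [source],
    }


sample = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        code_cell("x = 2\n"),
        code_cell("print(x * 21)"),
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    project_nb = Path(tmp) / "op.ipynb"
    project_nb.write_text(json.dumps(sample))
    executed = run_notebook_copy(project_nb, Path(tmp) / "run")
    result = json.loads(executed.read_text())
    original_unchanged = json.loads(project_nb.read_text()) == sample
```

Because every run starts from a fresh copy and a fresh namespace, the executed copy in the run directory records exactly what ran and in what order, while the project Notebook stays as the user left it.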


Flags are supported in this scheme by modifying the run-specific Notebook copy prior to execution. Flag values are configured using one of two methods:

  • If the Notebook contains Google Colab style forms, Guild uses the form parameter specs to identify and assign flag values.

  • A flag def for a Notebook may specify a regular expression pattern that identifies the text to replace with the flag value. This is used when the Notebook does not define form parameters.
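
Both methods amount to rewriting cell source text before execution. The sketch below illustrates each on a plain string. The function names apply_form_params and apply_pattern are hypothetical, and two conventions are assumptions rather than documented Guild behavior: a form parameter is taken to be an assignment followed by a Colab `#@param` annotation, and in a pattern the first capture group is taken to mark the text that the flag value replaces.

```python
import re

# An assignment followed by a Colab form annotation, for example:
#   lr = 0.1 #@param {type:"number"}
PARAM_LINE = re.compile(r"^(\s*)(\w+)\s*=\s*(.*?)\s*(#\s*@param\b.*)$")


def apply_form_params(source, flags):
    """Assign flag values to matching Colab-style form parameters."""
    lines = []
    for line in source.splitlines():
        m = PARAM_LINE.match(line)
        if m and m.group(2) in flags:
            indent, name, _old_value, annotation = m.groups()
            line = f"{indent}{name} = {flags[name]!r} {annotation}"
        lines.append(line)
    return "\n".join(lines)


def apply_pattern(source, pattern, value):
    """Replace the first capture group of the first match with value.

    Assumed convention: group 1 marks the text to substitute.
    """

    def substitute(match):
        start, end = match.span(1)
        offset = match.start()
        text = match.group(0)
        return text[: start - offset] + repr(value) + text[end - offset:]

    return re.sub(pattern, substitute, source, count=1)


form_cell = 'lr = 0.1 #@param {type:"number"}\nepochs = 5'
form_result = apply_form_params(form_cell, {"lr": 0.01})

plain_cell = "x = 1\ny = x + 1"
pattern_result = apply_pattern(plain_cell, r"x = (\d+)", 5)
```

In this sketch the form-based method needs no per-flag configuration because the annotation itself marks the assignable lines, while the pattern-based method works on any Notebook at the cost of one pattern per flag.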


Guild applies output-scalars to the generated cell output to detect scalars.

The Notebook may log scalars to TF event files as well.
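
As a rough illustration of detecting scalars in cell output, the sketch below scans the stream outputs of an executed Notebook for lines of the form `key: value`. The function name scalars_from_outputs and the exact pattern are assumptions made for this example; the patterns Guild applies are whatever the operation's output-scalars setting defines.

```python
import re

# Assumed convention: one "key: number" pair per output line.
SCALAR_LINE = re.compile(
    r"^(\w[\w./-]*): ([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*$",
    re.MULTILINE,
)


def scalars_from_outputs(nb):
    """Collect the last value printed for each scalar key."""
    scalars = {}
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") != "stream":
                continue
            text = output.get("text", "")
            if isinstance(text, list):  # nbformat allows a list of lines
                text = "".join(text)
            for key, value in SCALAR_LINE.findall(text):
                scalars[key] = float(value)
    return scalars


executed_nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["loss: 99"]},
        {
            "cell_type": "code",
            "source": ["train()"],
            "outputs": [
                {
                    "output_type": "stream",
                    "name": "stdout",
                    "text": "epoch 1 done\nloss: 0.25\nacc: 0.9\n",
                }
            ],
        },
    ]
}

detected = scalars_from_outputs(executed_nb)
```

Only code cell outputs are scanned, so text that merely looks like a scalar in a Markdown cell is ignored.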

Current Implementation

As of 0.7.1.rc, Guild supports a partial implementation of this feature.

  • You can run ipynb files directly using guild run.
  • Guild file operations support a notebook attribute.
  notebook: op.ipynb
  • Flags support nb-replace to set flag values.

The support in 0.7.1 is considered a minimum viable product (MVP): it is possible to set flags and to detect scalars logged to TF event files. Further work is required to support Google Colab-style forms, output scalars, and any other features needed for polished support of “Notebook as experiment”.

Not supported in 0.7.1

  • Google Colab forms
  • Output scalars


Preliminary tests for this feature are here.

The sample Notebooks used in tests are here.

Here’s a sample Guild file that supports a Notebook: