Anyone using Jupyter Notebooks?

I realize this is a Guild AI community and so the answer to this might very well be 100% NO :slight_smile:

Jupyter Notebooks are extremely popular, which means a lot of data science work is done in Notebooks. Unfortunately, the ad hoc workflow that makes Notebooks so popular doesn’t lend itself to strictly controlled experiments.

I think Notebooks actually are pretty good for experiments, if used correctly. Unfortunately, it’s quite hard to use them correctly.

To properly represent an experiment, a Notebook:

  • Must be copied to a new, pristine file
  • If flags are changed, must be modified to reflect the experiment-specific flag values
  • Must be run from top to bottom by a script, not a human
  • Once run, must not be re-run

The copied-and-run Notebook can then be used as a reliable experiment artifact.
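As a rough illustration of these requirements, here is a minimal Python sketch that works on the .ipynb JSON directly. It assumes flags are plain top-level assignments such as lr = 0.1 at the start of a line in a code cell. The helper names prepare_notebook_run and run_notebook are hypothetical, and this is not necessarily how Guild implements its notebook support; execution is simply delegated to jupyter nbconvert --to notebook --execute --inplace.

```python
import json
import re
import shutil
import subprocess
from pathlib import Path


def prepare_notebook_run(src: Path, run_dir: Path, flags: dict) -> Path:
    """Copy src into run_dir and rewrite `name = value` flag lines in the copy.

    Hypothetical helper: assumes flags are simple assignments at line start.
    """
    run_dir.mkdir(parents=True, exist_ok=True)
    dest = run_dir / src.name
    shutil.copyfile(src, dest)  # requirement 1: work on a copy, never the original

    nb = json.loads(dest.read_text())
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        text = "".join(source) if isinstance(source, list) else source
        for name, value in flags.items():
            # requirement 2: "lr = 0.1" becomes "lr = 0.01"; "lr_decay = ..." is untouched
            pattern = rf"^({re.escape(name)}\s*=\s*).*$"
            text = re.sub(pattern, lambda m: m.group(1) + repr(value), text, flags=re.M)
        cell["source"] = text
        # Drop stale results so the copy starts out pristine
        cell["outputs"] = []
        cell["execution_count"] = None

    dest.write_text(json.dumps(nb, indent=1))
    return dest


def run_notebook(path: Path) -> None:
    """Requirement 3: execute top to bottom via a script, not a human."""
    subprocess.run(
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", str(path)],
        check=True,
    )
    # Requirement 4 (partial guard): make the executed copy read-only
    path.chmod(0o444)
```

A typical call would be run_notebook(prepare_notebook_run(Path("my-notebook.ipynb"), Path("runs/abc"), {"lr": 0.01})). Regex substitution keeps the sketch dependency-free; a real implementation would likely want something sturdier, such as a dedicated parameters cell.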

Guild 0.7.1 has preliminary support for this: it lets you run an .ipynb file directly.

guild run my-notebook.ipynb

You can also use a new notebook operation attribute in a Guild file:

# guild.yml

op:
  notebook: my-notebook.ipynb

If you’re using Notebooks, or working with folks who are, I’d love to get your input on this feature.


We are using notebooks, but only for experimentation/exploration. We have a custom Guild operation that lets us work with Guild runs and explore models, etc.

It exists in multiple versions, but this is one of the iterations:

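import argparse
import os
import subprocess
from pathlib import Path


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int, default=8888)
    args = parser.parse_args()

    # Directory the notebook kernels should start in (the Guild run directory)
    working_dir = f"{os.environ['GUILD_HOME']}/runs/{os.environ['RUN_ID']}"

    # Prepend the project dir to PYTHONPATH so project modules are importable.
    # PYTHONPATH may be unset, so don't assume the key exists.
    pythonpath = os.environ.get("PYTHONPATH")
    os.environ["PYTHONPATH"] = (
        os.environ["PROJECT_DIR"] + os.pathsep + pythonpath
        if pythonpath
        else os.environ["PROJECT_DIR"]
    )

    # Keep any existing PYTHONSTARTUP code, then append the chdir
    startup = os.environ.get("PYTHONSTARTUP")
    if startup and Path(startup).exists():
        python_startup_script = Path(startup).read_text() + "\n"
    else:
        python_startup_script = ""

    # IPython runs the PYTHONSTARTUP file when a kernel starts, so each kernel
    # chdirs into the run dir; repr() quotes the path safely
    Path("change-dir.py").write_text(
        f"{python_startup_script}import os\nos.chdir({working_dir!r})\n"
    )
    os.environ["PYTHONSTARTUP"] = str(Path("change-dir.py").resolve())

    # Pass argv as a list instead of going through a shell
    subprocess.run(["jupyter", "notebook", f"--port={args.port}"])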
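One possible variation on the script above, sketched here under the same assumptions (GUILD_HOME, RUN_ID and PROJECT_DIR provided by the environment), is to build the startup script and environment as pure functions and pass the result to subprocess.run(env=...) instead of mutating os.environ. The function names build_startup_script and build_env are hypothetical; the point is that the logic becomes testable without launching Jupyter.

```python
import os
from pathlib import Path
from typing import Dict, Optional


def build_startup_script(working_dir: str, existing_startup: Optional[str]) -> str:
    """PYTHONSTARTUP contents: any existing user startup code, then a chdir."""
    prefix = ""
    if existing_startup and Path(existing_startup).exists():
        prefix = Path(existing_startup).read_text() + "\n"
    # repr() quotes the path safely even if it contains quotes or backslashes
    return f"{prefix}import os\nos.chdir({working_dir!r})\n"


def build_env(base_env: Dict[str, str], project_dir: str, startup_path: str) -> Dict[str, str]:
    """Copy of base_env with project_dir prepended to PYTHONPATH and PYTHONSTARTUP set."""
    env = dict(base_env)  # leave the caller's mapping (e.g. os.environ) untouched
    parts = [project_dir]
    if env.get("PYTHONPATH"):
        parts.append(env["PYTHONPATH"])
    env["PYTHONPATH"] = os.pathsep.join(parts)
    env["PYTHONSTARTUP"] = startup_path
    return env
```

In the operation itself, the result would be used as subprocess.run(["jupyter", "notebook", f"--port={port}"], env=build_env(dict(os.environ), project_dir, startup_path)).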

We also looked at running experiments where the “main” file was a notebook, but it didn’t stick.

That’s an interesting script. How do the notebooks-of-interest end up in the run directory? Is that by way of upstream dependencies?

This reminds me of a guild open ... scenario, where you want to view a run with a particular program. In this case, the program is jupyter notebook:

guild open <run ID> --cmd 'jupyter notebook'

You can use the --path option to open a specific notebook (Guild’s command completion comes in handy for this as it looks into the specified run directory). E.g.

guild open --cmd 'jupyter notebook --port 5555' --path Untitled.ipynb
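If you want to script this rather than rely on shell completion, a small Python sketch can list the notebooks in a run directory and build the same guild open command. The helpers find_notebooks and guild_open_argv are hypothetical; only the --cmd and --path options come from the command shown above.

```python
from pathlib import Path
from typing import List


def find_notebooks(run_dir: Path) -> List[str]:
    """Notebook paths relative to run_dir, skipping Jupyter checkpoint copies."""
    return sorted(
        p.relative_to(run_dir).as_posix()
        for p in run_dir.rglob("*.ipynb")
        if ".ipynb_checkpoints" not in p.parts
    )


def guild_open_argv(run_id: str, notebook: str, port: int = 8888) -> List[str]:
    """argv equivalent to: guild open <run> --cmd 'jupyter notebook --port N' --path <nb>"""
    return [
        "guild", "open", run_id,
        # --cmd takes the whole program command line as a single argument
        "--cmd", f"jupyter notebook --port {port}",
        "--path", notebook,
    ]
```

The list could then be passed to subprocess.run(guild_open_argv(run_id, notebooks[0], 5555)). Building argv as a list avoids having to nest shell quotes around the --cmd value.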

We keep the notebooks in the project and sometimes commit them with the source code. We only use Guild because we want its resource resolution and artifact tracking. There’s quite likely a better solution that uses Guild’s resource resolution in a less involved way (and lets us skip having a separate run for artifacts, because the use case isn’t that common).