Multiple runs with different input files, algorithmically -- hopefully using Guild

Hi, I need to do multiple runs of my Keras model with different sets of input data files, by feeding different sets of files to my Keras DataGenerator.

My data files have varying initial filename prefixes within the same directory. So I want to set up runs with each set of files with the same starting filename prefixes… (e.g. all files with filenames starting with prefix=‘AMIGACarb_5S_Aug2011’, and then a run with all files with filenames starting with prefix=‘AMIGACarb_11C_Aug2011’, …).

Would I want to use Guild AI’s Python interface somehow to do this? I was going to write my own code to feed each selection of input files to my Keras DataGenerator – but I wondered how I’d manage that with Guild – and keep track of all the runs in an automated way.

Or would I want to generate my lists of files and then feed those lists into “input file flags” inside a “Guild file”? So I’d have to “build up a Guild file” dynamically that way, perhaps…

Thanks for any thoughts, and thanks for sharing Guild AI! :)

You can do this with Guild. Consider this example Guild file:


- model: my_model
  operations:
    run_keras_training:
      description: "My description"
      main: my_python_script
      label: "my_model:run_keras_training - data_name: ${data_name}"
      flags:
        data_suffix: "5S_Aug2011"
        data_name: "AMIGACarb_${data_suffix}"

Now data_name will resolve to AMIGACarb_5S_Aug2011. If your Python script takes a CLI argument data_name, you can use it to find the matching data files and feed them to your Keras DataGenerator.
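A minimal sketch of what such a script might look like, assuming the files sit in a `data` directory and that `data_name` is used as a filename prefix. The `find_data_files` helper and the `--data_dir` argument are hypothetical names for illustration, not part of Guild or Keras, and the DataGenerator call is left as a comment because its signature depends on your own code.

```python
import argparse
import glob
import os


def find_data_files(data_dir, data_name):
    """Return the sorted paths in data_dir whose filename starts with data_name."""
    pattern = os.path.join(glob.escape(data_dir), glob.escape(data_name) + "*")
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_name", default="AMIGACarb_5S_Aug2011")
    parser.add_argument("--data_dir", default="data")  # hypothetical extra argument
    # parse_known_args ignores any other arguments passed on the command line.
    args, _ = parser.parse_known_args(argv)

    files = find_data_files(args.data_dir, args.data_name)
    print(f"{len(files)} file(s) match prefix {args.data_name!r}")
    # Pass `files` to your own Keras DataGenerator here.
    return files


if __name__ == "__main__":
    main()
```

Keeping the file lookup in its own function means the same code can be reused from a notebook, with the prefix coming from a variable instead of the command line.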

To start several of these training runs at once, pass a list of values for the flag:

guild run my_model:run_keras_training data_suffix='[5S_Aug2011, 11C_Aug2011]' --label MULTIPLE_DATA_RUNS

This will generate a batch with two runs, one for each data_suffix value.
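If you would rather not type the list by hand, one option is to derive it from the filenames and build the same command in Python. This is only a sketch: `discover_suffixes` and `build_guild_command` are hypothetical helpers, and it assumes every filename looks like `AMIGACarb_<site>_<date>...` so that the suffix is the first two underscore-separated tokens after the stem.

```python
import os
import shlex


def discover_suffixes(filenames, stem="AMIGACarb_"):
    """Collect the distinct '<site>_<date>' suffixes that follow the stem."""
    suffixes = set()
    for name in filenames:
        base = os.path.splitext(os.path.basename(name))[0]
        if not base.startswith(stem):
            continue
        parts = base[len(stem):].split("_")
        if len(parts) >= 2:
            # Assumption: the suffix is the first two tokens, e.g. "5S_Aug2011".
            suffixes.add("_".join(parts[:2]))
    return sorted(suffixes)


def build_guild_command(suffixes, label="MULTIPLE_DATA_RUNS"):
    """Build the argument list for the batch run shown above."""
    flag_value = "[" + ", ".join(suffixes) + "]"
    return [
        "guild", "run", "my_model:run_keras_training",
        f"data_suffix={flag_value}",
        "--label", label,
    ]


if __name__ == "__main__":
    data_dir = "data"  # hypothetical location of the input files
    names = os.listdir(data_dir) if os.path.isdir(data_dir) else []
    cmd = build_guild_command(discover_suffixes(names))
    # Print the command for review; it could also be handed to subprocess.run.
    print(shlex.join(cmd))
```

Printing the command first makes it easy to check the generated list before launching anything.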

You can now compare your runs easily:

guild compare --label MULTIPLE_DATA_RUNS

Let me know if you have any questions.

Thanks for this example :). My code is currently in a Jupyter Notebook and I know that Guild can execute notebooks just fine.

What I’m wondering is, is there a “Guild way”, or other way, to pass in the CLI “data_suffix” argument that you mention above into my Jupyter Notebook? Normally I think command line arguments are tricky to pass into Jupyter notebooks. I’ve seen ways mentioned on StackOverflow like using the “papermill” Python package, and other ways.

Or maybe I should just generate my different data prefixes (which you called a “suffix”, but it’s really a string “prefix”), in Python, inside the Jupyter Notebook?

I’d just like to be able to use Guild’s “multi-batch” feature to compare runs.

But maybe Guild’s Python interface is the way to go here… I’ll have to think about this. Thanks for any thoughts.

======================

I wonder if this could be used to pass in the “data_suffix” into my .ipynb notebook – from “Jupyter Notebook Experiments”:

"Configuring Notebook Options

You can configure notebook options in a Guild file in the operation notebook attribute. Options are specified using --<option> [<value>] arguments along with the notebook path."

Could I use that with the “data_suffix” to pass each data_suffix value into the notebook?

So I could of course change my Jupyter notebook into a regular python .py script and then your method given for passing in my data file prefixes in a list passed to “guild run” would work fine…

But I like Guild’s Jupyter Notebook features! It’s cool what it will do – especially saving all the plots with the associated output scalars into each copy of the notebook… That would be very useful to me.

So that’s why I’m trying to figure out how I pass into “guild run…” that list of data file prefixes, i.e. with data_prefix=‘[5S_Aug2011, 11C_Aug2011]’ … into my .ipynb Jupyter Notebook file.

Did you see this part of the documentation: Jupyter Notebook Experiments?

Guild can pass arguments to a notebook, so it should work fine with what you are doing!
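As a rough sketch, and assuming Guild's notebook support treats plain top-level assignments in the notebook as flags (which is how I read the Jupyter Notebook Experiments page), the first cells of the notebook could look something like this. The variable names follow the earlier example; the cell layout is only an illustration.

```python
# Notebook cell 1: a plain top-level assignment that serves as the default value.
data_suffix = "5S_Aug2011"

# Notebook cell 2: derive the full filename prefix from the flag value.
data_name = f"AMIGACarb_{data_suffix}"
print(data_name)
```

Running the notebook with data_suffix='[5S_Aug2011, 11C_Aug2011]' should then give one run per value, each with its own copy of the notebook, but please check this against the docs for your Guild version.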