There are a couple of ways to do this, and both require the use of dependencies. A dependency is expressed for an operation using the `requires` attribute.
Here’s a simple case that defines a dependency on a single file:
- file: data.csv
You can change the path to this file when running the operation this way:
guild run train file=alt-data.csv
You can name the dependency this way:
- name: data
This lets you set the path using the `data` name (flag-like assignment):
guild run train data=alt-data.csv
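For context, here is a minimal sketch of how the named dependency above might sit in a Guild file. It assumes an operation named `train` (the name used in the commands here) and that dependencies are listed under the operation's `requires` attribute. Only the list entries themselves appear in the text above, so the surrounding layout is illustrative.

```yaml
# guild.yml (illustrative sketch, not the original file)
train:
  requires:
    # Named file dependency: `name` is the flag-like key used on the command line
    - file: data.csv
      name: data
```

With a definition along these lines, `guild run train data=alt-data.csv` swaps in the alternative file. Dropping the `name` line gives the unnamed form, which is overridden with `file=alt-data.csv`.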
You can also specify dependencies on operations. For example, you might prepare a data set by augmenting it, transforming it, saving it in an optimized format, etc. Your Guild file might then look like this:
- operation: prepare-data
- file: data.csv
When you run `train`, Guild looks for a previous run of `prepare-data` to satisfy the dependency. If it can’t find one, it quits with an error message.
By default, Guild picks the latest non-error (completed or terminated) run matching the specified operation name. You can override this by specifying the full ID for the run you want to use.
guild run train prepare-data=<some run ID>
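As a rough sketch, the two dependency entries for this case could be arranged as follows. The operation names `train` and `prepare-data` come from the text. The assignment of each entry to an operation is an assumption based on the description, since the surrounding lines of the original file are not shown.

```yaml
# guild.yml (illustrative sketch, not the original file)
train:
  requires:
    # Resolved from a previous prepare-data run
    - operation: prepare-data

prepare-data:
  requires:
    # Raw input consumed by the preparation step (assumed placement)
    - file: data.csv
```

Under this layout you would run `guild run prepare-data` first, then `guild run train`, optionally pinning a specific run with `prepare-data=<some run ID>`.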
You can use regular expression patterns for your operation names to specify multiple operation dependencies. For example:
- file: data-1.csv
- file: data-2.csv
- operation: prepare-data-.*
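One way these entries might fit together is sketched below. The operation names `prepare-data-1` and `prepare-data-2` are hypothetical, chosen only because they match the pattern `prepare-data-.*`. The original file may name or group them differently.

```yaml
# guild.yml (illustrative sketch; operation names are hypothetical)
prepare-data-1:
  requires:
    - file: data-1.csv

prepare-data-2:
  requires:
    - file: data-2.csv

train:
  requires:
    # Pattern matches a run of either preparation operation above
    - operation: prepare-data-.*
```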
This is how you use different data sets with different training runs.
As for different models, Guild supports “model” definitions in the full Guild file format. E.g.
- model: tree
- model: mlp
You can define dependencies the same way for these operations.
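A minimal sketch of what such a file might look like in the full format. The model names `tree` and `mlp` come from the text. The operation names (`train`, `test`), the `main` module names, and the file dependency are hypothetical placeholders for illustration.

```yaml
# guild.yml in full format (illustrative sketch; operations and modules are hypothetical)
- model: tree
  operations:
    train:
      main: tree_train        # hypothetical Python module
      requires:
        - file: data.csv
    test:
      main: tree_test         # hypothetical Python module

- model: mlp
  operations:
    train:
      main: mlp_train         # hypothetical Python module
      requires:
        - file: data.csv
    test:
      main: mlp_test          # hypothetical Python module
```

In this form an operation is addressed by its model, for example `guild run tree:train`.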
But a “model” in this case is just a namespace for operations; Guild doesn’t know anything about models. You could just as easily have defined four operations:
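A sketch of the flat alternative, assuming two operations per model. The operation names and `main` modules here are hypothetical, since the original listing is not shown.

```yaml
# guild.yml in operation-only format (illustrative sketch; all names are hypothetical)
tree-train:
  main: tree_train
tree-test:
  main: tree_test
mlp-train:
  main: mlp_train
mlp-test:
  main: mlp_test
```

Here the model name is simply folded into each operation name, which is all the namespace provided in the first place.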
Guild is really just a task runner in this respect. It doesn’t know or care anything about what you’re running. It just knows what to run and how. And it records what was run and its results. And it provides various tools to examine and work with these results. That’s all Guild does.