There are a couple of ways to do this, and both require the use of dependencies. A dependency is expressed for an operation using the requires attribute.
Here’s a simple case that defines a dependency on a single file:
train:
  requires:
    - file: data.csv
You can change the path to this file when running the operation this way:
guild run train file=alt-data.csv
You can name the dependency this way:
train:
  requires:
    - name: data
      file: data.csv
This lets you set the path using the data name (a flag-like assignment):
guild run train data=alt-data.csv
You can also specify dependencies on operations. For example, you might prepare a data set by augmenting it, transforming, saving in an optimized format, etc. Your Guild file might then look like this:
train:
  requires:
    - operation: prepare-data

prepare-data:
  requires:
    - file: data.csv
When you run train, Guild looks for a previous run of prepare-data to satisfy the dependency. If it can’t find one, it quits with an error message.
By default, Guild picks the latest non-error (completed or terminated) run matching the specified operation name. You can override this by specifying the full ID for the run you want to use.
guild run train prepare-data=<some run ID>
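The default selection rule described above can be pictured with a small Python sketch. This is not Guild's implementation: the Run record, its field names, and the select_run helper are hypothetical and only illustrate "latest completed or terminated run, unless a run ID is given".

```python
from dataclasses import dataclass

# Hypothetical run record for illustration; Guild's real run objects differ.
@dataclass
class Run:
    id: str
    operation: str
    status: str   # e.g. "completed", "terminated", "error"
    started: int  # start time as a sortable number

def select_run(runs, operation, run_id=None):
    """Return the run that would satisfy an operation dependency, or None."""
    if run_id is not None:
        # Explicit override: use exactly the run that was asked for.
        return next((r for r in runs if r.id == run_id), None)
    candidates = [
        r for r in runs
        if r.operation == operation and r.status in ("completed", "terminated")
    ]
    # The latest non-error run wins; None means the dependency is unmet.
    return max(candidates, key=lambda r: r.started, default=None)

runs = [
    Run("aaa111", "prepare-data", "completed", 1),
    Run("bbb222", "prepare-data", "error", 2),
    Run("ccc333", "prepare-data", "terminated", 3),
]
```

With this sample data, the default choice is the terminated run ccc333 (the newer error run is skipped), while passing run_id="aaa111" selects the older completed run instead.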
You can use regular expression patterns for your operation names to specify multiple operation dependencies. For example:
prepare-data-1:
  requires:
    - file: data-1.csv

prepare-data-2:
  requires:
    - file: data-2.csv

train:
  requires:
    - operation: prepare-data-.*
      name: data
This is how you use different data sets with different training runs.
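To see which operation names a pattern such as prepare-data-.* selects, here is a minimal Python sketch using the standard re module. It assumes the pattern has to match the whole operation name; how Guild applies the pattern internally is not shown here, and the list of names is made up for illustration.

```python
import re

# The dependency pattern from the Guild file above.
pattern = re.compile(r"prepare-data-.*")

# Hypothetical operation names to check against the pattern.
names = ["prepare-data-1", "prepare-data-2", "prepare-data2", "train"]

# Keep only names that the pattern matches in full.
matching = [name for name in names if pattern.fullmatch(name)]
print(matching)  # ['prepare-data-1', 'prepare-data-2']
```

Note that a name like prepare-data2 is not selected, because the pattern requires a hyphen after prepare-data.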
As for different models, Guild supports “model” definitions in the full Guild file format. E.g.
- model: tree
  operations:
    train: tree.train
    test: tree.test
- model: mlp
  operations:
    train: mlp.train
    test: mlp.test
You can define dependencies the same way for these operations.
But a “model” in this case is just a namespace for operations. Guild doesn’t know anything about models. You could just as easily have defined four operations: tree-train, tree-test, mlp-train, and mlp-test.
Guild is really just a task runner in this respect. It doesn’t know or care anything about what you’re running. It just knows what to run and how. And it records what was run and its results. And it provides various tools to examine and work with these results. That’s all Guild does.