You just trained your model over several epochs. The operation saves checkpoints along the way. It’s tempting to use the latest checkpoint, but that’s not always the most accurate one. Given a set of checkpoints, how does a downstream run select the right one?
You can use this technique:
- Save files using names that include the numeric selection criteria.
- Use select-min or select-max with an operation dependency to select the file using the selection criteria.
Include metrics in generated file names
If your selection criterion for a file is “validation accuracy”, save applicable files with the numeric validation accuracy in the name.
For example, the Keras checkpoint callback saves checkpoints during training. Here’s an example that saves model weights with a file name that includes the applicable epoch and corresponding validation accuracy:
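As a minimal sketch of such a file name template: Keras’s ModelCheckpoint callback accepts a filepath containing named formatting fields, which it fills in from the epoch number and the logged metrics. The metric name val_accuracy is an assumption (it depends on your Keras version and metrics configuration), and the helper checkpoint_name is hypothetical, shown only to illustrate how the template expands.

```python
# Template in the style of the `filepath` argument to Keras's ModelCheckpoint.
# Assumption: the validation metric is logged under the name "val_accuracy".
WEIGHTS_TEMPLATE = "weights.{epoch:02d}-{val_accuracy:.2f}.hdf5"


def checkpoint_name(epoch, val_accuracy):
    """Hypothetical helper: expand the template for one epoch."""
    return WEIGHTS_TEMPLATE.format(epoch=epoch, val_accuracy=val_accuracy)


print(checkpoint_name(4, 0.9231))  # weights.04-0.92.hdf5
```

Using a fixed number of decimal places keeps the embedded values comparable across files.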
When you run over five epochs, you end up with five files.
By chance, in this case the most “accurate” set of weights is from epoch 4, not epoch 5. While this is a contrived example, it’s common in training runs for validation accuracy to decline at a certain point.
Select files with min and max patterns
You can select files from an upstream run using the select-min and select-max resource source attributes. These apply a pattern to file names and select the single file name that has the min or max value for the pattern’s captured group.
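The selection idea can be sketched in plain Python. This is an illustration, not Guild’s implementation: the function select_by_pattern is hypothetical, the file names and accuracy values are made up, and it assumes the first captured group is compared as a number.

```python
import re


def select_by_pattern(filenames, pattern, pick=max):
    """Return the name whose first captured group is numerically largest
    (pick=max) or smallest (pick=min); None if nothing matches."""
    regex = re.compile(pattern)
    candidates = []
    for name in filenames:
        match = regex.search(name)
        if match:
            # Assumption: the captured group is compared as a number.
            candidates.append((float(match.group(1)), name))
    if not candidates:
        return None
    return pick(candidates, key=lambda item: item[0])[1]


# Made-up file names for illustration only.
names = [
    "weights.01-0.71.hdf5",
    "weights.02-0.85.hdf5",
    "weights.03-0.90.hdf5",
    "weights.04-0.93.hdf5",
    "weights.05-0.91.hdf5",
]
pattern = r"weights\.\d+-0\.(\d+)\.hdf5"

print(select_by_pattern(names, pattern))            # weights.04-0.93.hdf5
print(select_by_pattern(names, pattern, pick=min))  # weights.01-0.71.hdf5
```

Because the group captures only the digits after the decimal point, the comparison is meaningful only when every file name uses the same number of decimal places.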
Here’s a downstream operation test that uses select-max to select the weights file with the highest accuracy.

    test:
      requires:
        - operation: train
          select-max: weights\.\d+-0\.(\d+)\.hdf5
If you want to use a consistent name for the selected file, use rename:

    test:
      requires:
        - operation: train
          select-max: weights\.\d+-0\.(\d+)\.hdf5
          rename: weights.+\.hdf5 weights.hdf5
Note that in the case of a renamed file, the link is maintained to the original generated file so you can resolve the source.
As a best practice, leave names unchanged and modify your test script to discover the target file by inspecting a directory. Guild does not support passing selected files as flags.
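A test script might discover the selected file along these lines. This is a sketch under assumptions: the helper find_weights is hypothetical, and it assumes the dependency resolves exactly one weights.*.hdf5 file into the directory being inspected.

```python
import glob
import os


def find_weights(directory="."):
    """Return the path of the single weights file found in `directory`."""
    # glob.escape guards against special characters in the directory name.
    search = os.path.join(glob.escape(directory), "weights.*.hdf5")
    matches = sorted(glob.glob(search))
    if len(matches) != 1:
        raise RuntimeError(
            "expected exactly one weights file in %r, found %d"
            % (directory, len(matches))
        )
    return matches[0]


# Usage inside a test script (the model object is assumed to exist):
# model.load_weights(find_weights())
```

Failing loudly when zero or several files match makes a misconfigured dependency visible immediately instead of silently testing the wrong weights.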
The problem of selecting the “best” file for an operation is hard if you don’t otherwise provide some information about the file. You could resolve all upstream files and rely on the downstream operation to select “best”. That’s a reasonable strategy, but it places the burden on the downstream operation. If you know the selection criteria values ahead of time, it’s more efficient to encode those values in the generated file names. Then Guild can select “best” by applying a max or min filter using a file name pattern.