Guild steps and pipeline - reuse same run

garrett · October 28, 2020, 4:01pm

I understand now — thank you for the clarification!

Guild is not really set up to do something like this. In Guild, a run, once completed, is informally considered read-only. Guild does not currently enforce this read-only state, but I think it should. The thinking is that, once a run is completed, it’s set and should not later be changed. Future releases of Guild will likely formally support this via these mechanisms:

- Set read-only file status for the run directory and run files
- Generate a digest for the read-only run
- Support checking a run against the digest to detect changes
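The three mechanisms above can be illustrated outside of Guild. The following Python is a minimal sketch under assumptions: the helper names `make_read_only`, `run_digest`, and `verify_run` are hypothetical and not part of Guild’s API, and hashing each file’s relative path plus a SHA-256 of its contents is just one reasonable digest scheme.

```python
import hashlib
import os
import stat
import tempfile


def make_read_only(run_dir):
    """Clear the write bits on every file under run_dir (hypothetical helper)."""
    for root, _dirs, files in os.walk(run_dir):
        for name in files:
            path = os.path.join(root, name)
            mode = os.stat(path).st_mode
            os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


def run_digest(run_dir):
    """Return a SHA-256 hex digest covering each file's relative path and contents."""
    paths = []
    for root, _dirs, files in os.walk(run_dir):
        for name in files:
            paths.append(os.path.join(root, name))
    h = hashlib.sha256()
    # Sort by relative path so the digest does not depend on directory walk order.
    for path in sorted(paths, key=lambda p: os.path.relpath(p, run_dir)):
        rel = os.path.relpath(path, run_dir).replace(os.sep, "/")
        h.update(rel.encode("utf-8") + b"\0")
        with open(path, "rb") as f:
            h.update(hashlib.sha256(f.read()).digest())
    return h.hexdigest()


def verify_run(run_dir, expected_digest):
    """Return True if the run's files still match the digest recorded at completion."""
    return run_digest(run_dir) == expected_digest


if __name__ == "__main__":
    run_dir = tempfile.mkdtemp()
    with open(os.path.join(run_dir, "foo.txt"), "w") as f:
        f.write("model output\n")
    digest = run_digest(run_dir)  # record this when the run completes
    make_read_only(run_dir)
    print(verify_run(run_dir, digest))  # True while the files are untouched
```

Note that read-only file modes only discourage accidental edits (the owner or root can still change them), which is why the digest check is a useful complement.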

These are all important considerations for reproducibility and auditability.

However, the patching scenario that you describe is quite common — and generation of a runnable artifact is a good example. Another example might be model compression, quantization, etc.

From Guild’s point of view, these patch operations should be separate runs. This keeps the upstream runs immutable and separates any newly generated artifacts. If the downstream operation is meant to modify an upstream file, it should use a copy dependency and modify its own copy of the upstream file.

```yaml
upstream: {}     # generates some file foo.txt
downstream:      # compresses foo.txt
  requires:
    - op: upstream
      select: foo.txt
      target-type: copy
```

In Guild 0.7.x, the default target type is link. To copy, you need to explicitly use the copy target type, as in the example above. This will change in 0.8 so that copy is the default; if you want to link, you’ll need to specify link. In that case, the link will be read-only, again following the rationale above.
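To illustrate why the target type matters, here is a minimal Python sketch of a downstream run bringing in an upstream file either by copying it or by linking to it. This is not Guild’s implementation; `resolve_dependency` is a hypothetical helper. Only the copy gives the downstream run a file it can safely modify; a link still points into the upstream run, so writing through it would change the upstream run.

```python
import os
import shutil
import tempfile


def resolve_dependency(src_path, run_dir, target_type="copy"):
    """Place src_path in run_dir as an independent copy or a symlink (hypothetical helper)."""
    dest = os.path.join(run_dir, os.path.basename(src_path))
    if target_type == "copy":
        shutil.copy2(src_path, dest)  # the downstream run owns this file
    elif target_type == "link":
        os.symlink(os.path.abspath(src_path), dest)  # still refers to the upstream file
    else:
        raise ValueError("unknown target type: %r" % target_type)
    return dest


if __name__ == "__main__":
    upstream = tempfile.mkdtemp()
    downstream = tempfile.mkdtemp()
    src = os.path.join(upstream, "foo.txt")
    with open(src, "w") as f:
        f.write("original\n")
    local = resolve_dependency(src, downstream, target_type="copy")
    with open(local, "w") as f:
        f.write("patched\n")  # modifies only the downstream copy
    with open(src) as f:
        print(f.read(), end="")  # the upstream file still reads "original"
```

The sketch does not make linked files read-only; approximating that would mean clearing the write bits on the upstream file, as in a read-only run.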

Now, all this said, Guild does support a --restart option, which is specifically designed to re-run an operation from within an existing run directory. It’s really intended for runs with terminated or error status, but it works just as well with completed runs. The use case it addresses is the common one of restarting a run that stopped early or failed, e.g. to train more or to fix a bug without having to start over from scratch.

For your case, I would first consider the Guild approach I describe above, where patches are simply additional runs. Think of this like a copy-on-write file system, where changes are implemented as additional transformations rather than in-place edits. Docker images, for example, work this way.
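The copy-on-write idea can be sketched as a separate patch run that only reads the upstream file and writes a new, transformed artifact into its own directory, along with a record of what it did. In this Python sketch, `compress_run`, the `run.json` metadata file, and the `level` flag are hypothetical illustrations rather than Guild conventions, and gzip stands in for whatever patch (compression, quantization, packaging) the real operation performs.

```python
import gzip
import json
import os
import shutil
import tempfile


def compress_run(upstream_file, level=9):
    """Create a new run directory holding a gzip-compressed copy of upstream_file.

    The upstream file is only read, never modified. The new run records its
    input and flags so the patch operation stays auditable.
    """
    run_dir = tempfile.mkdtemp(prefix="run-")
    out_name = os.path.basename(upstream_file) + ".gz"
    out_path = os.path.join(run_dir, out_name)
    with open(upstream_file, "rb") as src, gzip.open(out_path, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst)
    # Hypothetical metadata record for the patch run.
    meta = {"op": "downstream", "input": upstream_file,
            "flags": {"level": level}, "output": out_name}
    with open(os.path.join(run_dir, "run.json"), "w") as f:
        json.dump(meta, f)
    return run_dir


if __name__ == "__main__":
    upstream_dir = tempfile.mkdtemp(prefix="run-")
    foo = os.path.join(upstream_dir, "foo.txt")
    with open(foo, "w") as f:
        f.write("some upstream output\n" * 100)
    downstream_dir = compress_run(foo, level=6)
    print(sorted(os.listdir(downstream_dir)))  # ['foo.txt.gz', 'run.json']
```

Because the upstream directory is never written to, its digest stays valid, and the patch itself remains a first-class record rather than a silent in-place edit.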

If you strongly prefer to edit the run files in place, you still need a second run. You can link to the files that you want to modify and then delete the patch run afterward. However, I think this is not ideal. The patch is a meaningful operation, which I think you should record. The second run formally captures the patch operation, including the source code used, flags, results, etc. If you delete this run, you lose that record.
