We propose the following changes to the steps support in Guild:
- When restarting a pipeline run, its steps should not be started as new runs but rather restarted in place. This should be the equivalent of running:
  guild run --restart <step run ID>
- When restarting a pipeline, Guild requires an additional --restart-failed option to clarify the user's intent.
Rationale for the second change:
- Restarting a pipeline where all steps have completed warrants additional confirmation by the user due to the costly nature of a pipeline and the effects of terminating a pipeline run that was accidentally restarted.
- Restarting a pipeline with failed runs could mean “restart all” or “restart failed” — we want the user to make this explicit.
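The intent check described above could be sketched as follows. This is only an illustration under stated assumptions: RestartMode, steps_to_restart, and the status strings are hypothetical names, not Guild's API. The only option named in this proposal is --restart-failed; the "all" mode is included just to show why an explicit choice is needed.

```python
from enum import Enum

# Assumption: statuses that count as "failed" for restart purposes.
FAILED_STATUSES = {"error", "terminated"}


class RestartMode(Enum):
    """Hypothetical restart modes; only --restart-failed is named in the proposal."""

    UNSPECIFIED = "unspecified"
    FAILED = "failed"
    ALL = "all"


def steps_to_restart(step_status, mode):
    """Return the step run IDs that should be restarted in place.

    step_status maps a step run ID to its status string.
    """
    if mode is RestartMode.UNSPECIFIED:
        # Refuse to guess between "restart all" and "restart failed".
        raise ValueError("pipeline restart requires an explicit restart option")
    if mode is RestartMode.ALL:
        return list(step_status)
    return [
        run_id
        for run_id, status in step_status.items()
        if status in FAILED_STATUSES
    ]
```

Each returned ID would then be restarted in place, in the same way as a restart of that individual step run, rather than by creating a new run.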
This is a breaking change. Note, however, that the current implementation of steps is not useful.
This proposal is under development.
See above (TODO move details here).
Alternative options and default values
TODO: Maybe outline different spellings above??
- Changes to
- Guild should avoid creating new runs on restart. This might mean it initializes steps up front on create and reuses those linked dirs on restart (the problem of applying flag changes to these runs still exists, but this would avoid the new step issue we see now).
- Changes to steps in the Guild file are not applied to the run being restarted (how to handle new step defs??)
- Maybe address Guild’s poor handling of operation flag config, i.e. the lack of pass-through ability
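The idea of initializing steps up front and reusing the linked dirs on restart could look roughly like the sketch below. The directory layout and the function names init_steps and linked_step_dirs are assumptions made for illustration, not Guild's actual implementation.

```python
import os
import uuid


def init_steps(pipeline_dir, step_names, runs_dir):
    """On create: make one run dir per step and link it from the pipeline run dir."""
    created = {}
    for name in step_names:
        run_dir = os.path.join(runs_dir, uuid.uuid4().hex)
        os.makedirs(run_dir)
        # The link name identifies the step; the target is the step run dir.
        os.symlink(run_dir, os.path.join(pipeline_dir, name))
        created[name] = run_dir
    return created


def linked_step_dirs(pipeline_dir):
    """On restart: look up the existing step run dirs instead of creating new ones."""
    linked = {}
    for name in sorted(os.listdir(pipeline_dir)):
        path = os.path.join(pipeline_dir, name)
        if os.path.islink(path):
            linked[name] = os.readlink(path)
    return linked
```

On restart, only linked_step_dirs would be called, so no new step runs appear. How flag changes and new step definitions are applied is left open here, as it is in the list above.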