RFC: Auto-delete batch runs on success

I’d like to propose a change to Guild: batch runs should be deleted automatically on success.

Guild uses separate operations to run batches. A batch is any run that generates runs. Generated runs are often referred to as trials.

This command uses a batch run to generate two trials:

guild run train.py x=[1,2]

After this completes, you see three runs, not two.

guild runs
[1:ecad12a6]  train.py   2021-02-22 18:20:46  completed  noise=0.1 x=2
[2:d3146520]  train.py   2021-02-22 18:20:46  completed  noise=0.1 x=1
[3:ab289970]  train.py+  2021-02-22 18:20:45  completed  

The earliest run, train.py+ (listed last), is the batch run. Guild uses separate runs for batches for various reasons, the most important being that a batch operation can make decisions about how to run trials. Batch operations provide support for grid search, random search, and sequential optimization.
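To make the grid search case concrete, here is a minimal Python sketch of how a batch could expand flag values into trials. It is not Guild's implementation. The function name expand_grid is hypothetical, and it assumes that any flag given as a list is a search dimension while scalar flags are passed through unchanged.

```python
from itertools import product


def expand_grid(flags):
    """Return one flag dict per trial (hypothetical helper, not Guild's API).

    List values are treated as search dimensions; scalar values are
    repeated unchanged in every trial.
    """
    names = list(flags)
    choices = [v if isinstance(v, list) else [v] for v in flags.values()]
    # The Cartesian product of all choices gives one combination per trial.
    return [dict(zip(names, combo)) for combo in product(*choices)]


# Mirrors the example above: noise keeps its single value, x=[1,2] is the grid.
trials = expand_grid({"noise": 0.1, "x": [1, 2]})
for trial in trials:
    print(trial)
```

With the flags from the example, this yields two trials, one per value of x, which matches the two generated runs in the listing.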

In practice, these batch runs don’t provide much value after they’re completed. I find them mostly annoying. In fact, several Guild commands intentionally ignore these runs.

For the proposed change, Guild would delete these batch runs whenever they complete successfully. Auto-deleted batches would include the + operation (used for grid search) and skopt:random (used for random search). Sequential optimizer runs would not be auto-deleted.

As a part of this change, the run command would get a --keep-batch option, which would let you override this auto-delete behavior if needed.
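The proposed rule can be summarized as a small decision function. This is only a sketch of the behavior described here, not code from Guild: the names AUTO_DELETE_BATCH_OPS and should_auto_delete are hypothetical, the keep_batch argument stands in for the proposed --keep-batch option, and skopt:gp is used as an assumed example of a sequential optimizer.

```python
# Batch operations that would be auto-deleted under this proposal
# (hypothetical constant, named for illustration only).
AUTO_DELETE_BATCH_OPS = {"+", "skopt:random"}


def should_auto_delete(batch_op, status, keep_batch=False):
    """Return True if a finished batch run would be deleted.

    batch_op   -- name of the batch operation, e.g. "+" for grid search
    status     -- run status as shown by `guild runs`, e.g. "completed"
    keep_batch -- stands in for the proposed --keep-batch option
    """
    if keep_batch:
        # The user explicitly asked to keep the batch run.
        return False
    # Only successful grid and random search batches are deleted.
    return status == "completed" and batch_op in AUTO_DELETE_BATCH_OPS


print(should_auto_delete("+", "completed"))                   # grid search batch
print(should_auto_delete("+", "completed", keep_batch=True))  # user override
print(should_auto_delete("skopt:gp", "completed"))            # sequential optimizer
```

Under these assumptions, only the first call returns True. A batch that does not complete successfully is also kept, since it may be needed to diagnose the failure.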

A preview of this change will be available in an upcoming pre-release, but I’d like to get thoughts on this proposal in the meantime.

Thanks for your input!
