Batch eval runs over batch train runs via required resource?


I was curious if there’s an idiomatic Guild way to run a batch of runs that are dependent on a batch of prior runs. To be concrete, suppose I run a batch of train runs. Also suppose I have an evaluation operation called eval_op that requires a train run (so the model checkpoint can be loaded in). It would be neat if something like this worked:

guild run eval_op train=[<run id 1>, <run id 2>, ...]

Instead, I have to launch each eval run individually with its own run ID, and I lose the organization that comes from batching them together. Full disclosure on context here: I did some hyperparameter optimization and I want to do some additional evaluation over each model.

Perhaps I missed something in the docs! Or if not, perhaps this could be a convenient feature?

Ah, to answer my own question here: I missed the batch file capability when going through the docs (see the Runs page).

The original syntax you specified is correct. In that case Guild will run eval_op once for each specified run ID. The only issue I see there is the use of spaces within the square brackets: unquoted, the shell splits the list at each space into separate arguments.

Something like this will do what you want, where the entries for the train flag list value are run IDs, or partial run IDs:

guild run eval_op train=[abc,def,fed,cba]

You would have to copy/paste the specific run IDs for this.
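If the run IDs are already in a shell variable or script, a small helper can build the list argument without spaces. This is only a sketch: guild_list is a hypothetical function name, not part of Guild, and abc, def, fed, cba are the placeholder IDs from the example above.

```shell
# Hypothetical helper: join run IDs into Guild's list syntax with no spaces.
guild_list() {
  local IFS=,              # "$*" joins arguments with the first character of IFS
  printf '[%s]\n' "$*"
}

# Build the flag value from placeholder run IDs and preview the command.
train_arg="train=$(guild_list abc def fed cba)"
echo "guild run eval_op $train_arg"
```

Drop the echo (and keep the quotes around the argument) to actually launch the batch.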

There’s a feature enhancement coming up that will let you run something like this:

guild run eval_op train=`guild select -Fo train --started today` # future syntax

The idea here is that select returns a list of run IDs that match the spec, which are used in turn to drive the batch of eval ops.

You can, as you point out, also use batch files.
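For the batch file route, a rough sketch might look like the following. It assumes the train requirement can be set from a batch file column the same way it is set on the command line, which I have not verified for your operation. The file name evals.csv and the run IDs are placeholders.

```shell
# evals.csv is a hypothetical file name; abc, def, fed, cba are placeholder run IDs.
# The header row names the flag; each later row is one eval_op trial.
printf '%s\n' train abc def fed cba > evals.csv

# Then, with Guild installed (not executed here):
#   guild run eval_op @evals.csv
```

This keeps the list of train runs in a file you can check in or regenerate, rather than in your shell history.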

A slightly weirder approach is to stage the operations that you want to run using the --stage option and use a queue to run them when ready.

guild run eval_op train=abc --stage  # repeat for each train run

Start a queue to run these in series and exit when complete:

guild run queue run-once=yes
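The staging step can be wrapped in a loop so you don't retype the command for each train run. A minimal sketch, assuming bash: stage_evals and GUILD_CMD are hypothetical names introduced here so the loop can be previewed with echo before anything is staged, and the run IDs are placeholders.

```shell
# Stage one eval_op run per train run ID passed as an argument.
# GUILD_CMD is a hypothetical override; it defaults to the real guild command.
stage_evals() {
  for run_id in "$@"; do
    # --yes skips the confirmation prompt for each staged run
    ${GUILD_CMD:-guild} run eval_op train="$run_id" --stage --yes
  done
}

# Preview only: print the commands that would be run, without calling Guild.
GUILD_CMD="echo guild" stage_evals abc def fed cba
```

Once the preview looks right, call stage_evals without GUILD_CMD set and then start the queue as shown above.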

Ah! I bet I did put a space. I’ll try again next time. I did end up instead using the queue solution combined with staging as you described. Thank you!

If you want to include spaces, just quote the argument:

guild run eval_op train='[abc, def, ...]'