Batch eval runs over batch train runs via required resource?

ghwatson · February 3, 2021, 7:53pm

Hello!

I was curious if there’s an idiomatic Guild way to run a batch of runs that are dependent on a batch of prior runs. To be concrete, suppose I run a batch of train runs. Also suppose I have an evaluation operation called eval_op that requires a train run (so the model checkpoint can be loaded in). It would be neat if something like this worked:

guild run eval_op train=[<run id 1>, <run id 2>, ...]

However, I currently have to launch each eval run individually with its run ID, and I lose the organization that comes from batching them together. Full disclosure on context here: I did some hyperparameter optimization and I want to do some additional evaluation over each model.

Perhaps I missed something in the docs! Or if not, perhaps this could be a convenient feature?

ghwatson · February 3, 2021, 10:56pm

Ah, to answer my own question here: I missed the batch file capability when going through the docs (see Runs).

garrett · February 3, 2021, 11:44pm

The original syntax you specified is correct. In that case Guild will run eval_op once for each specified run ID. The only issue I see there is the use of spaces within the square brackets.

Something like this will do what you want, where the entries for the train flag list value are run IDs, or partial run IDs:

guild run eval_op train=[abc,def,fed,cba]

You would have to copy/paste the specific run IDs for this.
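If the run IDs are already sitting in a shell variable, the list value can be assembled rather than typed by hand. This is only a sketch: the IDs and the `run_ids` and `train_flag` variable names are placeholders, and the command is printed rather than executed so it can be checked first.

```shell
# Hypothetical run IDs; replace with the (partial) IDs of your own train runs.
run_ids="abc def fed cba"

# Join the IDs with commas and no spaces so the flag stays a single shell word.
train_flag="train=[$(printf '%s' "$run_ids" | tr ' ' ',')]"

# Print the command instead of running it.
echo "guild run eval_op $train_flag"
```

Once the printed command looks right, it can be pasted back into the terminal.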

There’s a feature enhancement coming up that will let you run something like this:

guild run eval_op train=`guild select -Fo train --started today` # future syntax

The idea here is that select returns a list of run IDs that match the spec, which are used in turn to drive the batch of eval ops.

As you point out, you can also use batch files.

A slightly weirder approach is to stage the operations that you want to run using the --stage option and use a queue to run them when ready.

guild run eval_op train=abc --stage  # repeat for each train run

Start a queue to run these in series and exit when complete:

guild run queue run-once=yes
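The stage-then-queue steps can be scripted with a small loop. The sketch below only prints the commands; the run IDs, the `run_ids` variable, and the `stage_commands` function name are placeholders for illustration, not anything Guild provides.

```shell
# Hypothetical run IDs; replace with the IDs of your own train runs.
run_ids="abc def fed cba"

# Print one staging command per train run, then the queue command.
stage_commands() {
  for id in $run_ids; do
    echo "guild run eval_op train=$id --stage"
  done
  echo "guild run queue run-once=yes"
}

stage_commands
```

After reviewing the output, the same lines can be run by hand or piped to a shell. Guild may ask for confirmation on each run, so an unattended pipe might need extra handling.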

ghwatson · February 3, 2021, 11:57pm

Ah! I bet I did put a space. I’ll try again next time. I ended up using the queue solution combined with staging, as you described. Thank you!

garrett · February 4, 2021, 12:00am

If you want to include spaces, just quote the argument:

guild run eval_op train='[abc, def, ...]'
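The reason quoting matters is shell word splitting: without quotes, each space starts a new argument, so the command never receives the list as one value. A small illustration, using a hypothetical `count_args` helper that just reports how many arguments it was given:

```shell
# Hypothetical helper: prints the number of arguments it received.
count_args() { echo "$#"; }

# Unquoted: the shell splits the list at each space into separate arguments.
unquoted=$(count_args train=[abc, def, fed])

# Quoted: the whole list arrives as a single argument.
quoted=$(count_args 'train=[abc, def, fed]')

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

Some shells also treat unquoted square brackets as glob patterns, which is another reason to quote the value.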
