With the upcoming Slurm integration, I am interested in a good way of using Guild + Slurm to parallelize heavy computation across data.
Say, for example, I have a preprocessing step wrapped in Guild that takes a large video file as input and spits out a processed video. For this example, say I have three such videos. Currently I would do something like this in Guild:
guild run model:preprocess video_file_path='[path1, path2, path3]'
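For context, the preprocess operation here might be defined in a Guildfile along these lines (a sketch; the `preprocess_video` entry point is hypothetical):

```yaml
- model: model
  operations:
    preprocess:
      # Hypothetical entry point; stands in for whatever script does the actual work
      main: preprocess_video
      flags:
        video_file_path:
          description: Path to an input video file
          required: yes
```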
Optionally, I can use the Dask scheduler to run in parallel on one machine. I imagine I would be able to do the same using Slurm, which would be awesome.
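If I understand the Dask scheduler correctly, that would look roughly like this on one machine (a sketch; I'm going from memory on the flag names, so take them with a grain of salt):

```
# Stage the batch trials instead of running them immediately
guild run model:preprocess video_file_path='[path1, path2, path3]' --stage-trials

# Start a Dask scheduler that picks up the staged trials and runs them in parallel
guild run dask:scheduler workers=3 run-once=yes
```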
My one issue with this is that these aren’t really three logically separate experiments, but essentially just one experiment.
I was wondering if there is a way to merge such a batch run into a single experiment. Maybe something like:
guild merge RUN_1 RUN_2 RUN_3
Or something similar. Maybe there is someone out there with a better suggestion on how to handle this.
Is there any way, when specifying a pipeline, to have one operation resolve its dependency on a batch run instead of on an individual run that the batch has produced?
Essentially I would like to do something like this:
`guild run model:preprocess video_file_path='[path1, path2, path3]'`
`guild run model:train model_preprocess_op=<BATCH_RUN_ID>`
The above works (only if you use the full run ID, by the way), but I would like to specify that in a Guild pipeline.
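Something along these lines in the Guildfile, I imagine (I'm guessing at the syntax here; `train` and its entry point are just placeholders, and `model_preprocess_op` mirrors the flag above):

```yaml
- model: model
  operations:
    train:
      main: train  # hypothetical entry point
      requires:
        # How do I point this at the batch run rather than at a single trial?
        - operation: preprocess
          name: model_preprocess_op  # matches the flag used on the command line above
    pipeline:
      steps:
        - run: preprocess
          flags:
            video_file_path: '[path1, path2, path3]'
        - run: train
```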
There’s a bug in Guild that prevents what should work from working. I created a GitHub issue to track this. The details on what I’m seeing are in the issue, along with a workaround that might help in your case.
You should be able to specify a batch requirement using `<op name>+` in the dependency, but this isn’t working. As a workaround, you can use the `select` command with command substitution:
guild run summary batch=`guild select --operation op+`
This will create a new experiment with links/copies of the joined runs.
This would make it easy to, for example, add a new experiment to a set of experiments used for some kind of summary operation or, in the case of this topic, to process more data after the first batch run and then add it to the same batch run.
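Applied to the example at the top of this topic, that workaround would be something like:

```
guild run model:train model_preprocess_op=`guild select --operation model:preprocess+`
```

`guild select` prints the full run ID, which also gets around having to copy it by hand.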