Running cases in parallel

Is there a way to run a batch in parallel? I wrote a function that I have been using with guild, but it is taking a long time to run so I would like to run a massive batch in parallel. Something like this:

import multiprocessing as mp
import tensorflow as tf

def func(case):
    # Limit TensorFlow's threading so parallel workers don't oversubscribe CPUs
    tf.config.threading.set_intra_op_parallelism_threads(3)
    tf.config.threading.set_inter_op_parallelism_threads(3)
    ...
    return data

pool = mp.Pool(processes=4)
datas = pool.map(func, cases)

Ideally the cases here would come from my batch, and then Guild would save the results of each case to its own folder. Is there any way to do this currently?
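For reference, the multiprocessing pattern sketched above can be made self-contained roughly like this (the func body and cases list here are placeholders standing in for the real per-case work, not Guild APIs):

```python
import multiprocessing as mp

def func(case):
    # Placeholder for the real per-case computation
    return case * case

if __name__ == "__main__":
    cases = [1, 2, 3, 4]
    # Run up to 4 cases concurrently in separate worker processes
    with mp.Pool(processes=4) as pool:
        datas = pool.map(func, cases)
    print(datas)  # [1, 4, 9, 16]
```

Note the `if __name__ == "__main__":` guard, which is required on platforms that spawn (rather than fork) worker processes.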

You can use queues to do this.

Start by staging the batch trials:

guild run <your batch spec> --stage-trials

Next, start N queues in the background. Each queue will run the next available staged trial. E.g. to start 10 queues in the background:

for _ in $(seq 10); do guild run queue --background -y; done

You can monitor the runs with guild runs. You should see ~10 running operations at any given time.

Note that the queues continue to run after the trials are completed. They'll pick up any subsequently staged trials. You can stop the queues by running:

guild stop -o queue

Queues are runs like any other, so they will show up in your runs list. You can delete them by running:

guild runs rm -o queue

This is admittedly a bit painful; simpler support for parallel runs is on the near-term road map (Guild 0.8), e.g. guild run <batch spec> --threads 10 or similar. Until then, queues are your best bet.
