Is there a way to run a batch in parallel? I wrote a function that I have been using with Guild, but it takes a long time to run, so I would like to run a large batch in parallel. Something like this:
import multiprocessing as mp
import tensorflow as tf

def func(case):
    # Limit TF threading so parallel workers don't oversubscribe the CPU
    tf.config.threading.set_intra_op_parallelism_threads(3)
    tf.config.threading.set_inter_op_parallelism_threads(3)
    …
    return data

pool = mp.Pool(processes=4)
datas = pool.map(func, cases)
Ideally the cases here would come from my batch, and then Guild would save the results of each case to its own folder. Is there any way to do this currently?
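Yes, you can do this with Guild queues. First, stage your trials instead of running them directly; staged trials wait until a queue picks them up. A minimal sketch, assuming a hypothetical operation named train with a case flag (a list of flag values expands into one trial per value):

guild run train case=[1,2,3,4] --stage -y

Each trial gets its own run directory, so the results of each case are saved separately.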
Next, start N queues in the background. Each queue will run the next available staged trial. E.g. to start 10 queues in the background:
for _ in `seq 10`; do guild run queue --background -y; done
You can monitor the runs with guild runs. You should see ~10 running operations at any given time.
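If you want a live view, one option is to re-run the listing periodically with the standard watch utility (not Guild-specific):

watch -n 5 guild runs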
Note that the queues continue to run after the trials are completed; they will start any subsequently staged trials. You can stop the queues by running:
guild stop -o queue
Queues are like any other run, so they will show up in your runs list. You can delete them by running:
guild runs rm -o queue
This is admittedly a bit painful, and simpler support for parallel runs is on the near-term roadmap (Guild 0.8), e.g. guild run <batch spec> --threads 10 (or similar). Until then, queues are your best bet.