Dask Scheduler not utilizing all available resources

Hey all,

I’ve been trying to get the Dask scheduler to work with my Guild runs. Let’s say I have 2 GPUs and I’d like to put at most 2 runs on each GPU.

According to the guides (Parallel processing with Dask scheduler), I’ve implemented two sets of commands. The first set is two commands that stage the runs across the 2 GPUs. They look something like this:

guild run model:train param1_to_sweep='[10,20,30]' \
                      param2_to_sweep='[1,2,3,4,5]' \
                      --label my_hp_runs \
                      --optimizer random \
                      --trials 5 \
                      --tag dask:GPU0=1 \
                      --stage-trials \
                      --gpus 0

guild run model:train param1_to_sweep='[10,20,30]' \
                      param2_to_sweep='[1,2,3,4,5]' \
                      --label my_hp_runs \
                      --optimizer random \
                      --trials 5 \
                      --tag dask:GPU1=1 \
                      --stage-trials \
                      --gpus 1
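
Before starting the scheduler I sanity-check the staging step with plain Guild CLI commands (the index argument to guild runs info just picks a run from the list):

guild runs --staged   # all 10 staged trials should show up here
guild runs info 1     # one run's metadata, including its dask:GPUx=1 tag and label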

After the runs are staged, I spin up the scheduler with resources that should allow 2 runs per GPU:

guild run dask:scheduler run-once=yes \
                         workers=10 \
                         resources='GPU0=2 GPU1=2' \
                         dashboard-address=8890
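
While the scheduler drains the queue, I keep an eye on where runs actually land. Nothing fancy here: this assumes nvidia-smi is available on the box, and the second command is just Guild’s standard filter for in-progress runs:

watch -n 1 nvidia-smi   # live per-device GPU utilization
guild runs --running    # runs the scheduler is currently executing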

I’m able to open the dashboard and can see the GPU0 and GPU1 resources available. The trouble is that Dask only puts 2 runs on GPU0 and none on GPU1. To be clearer about the schedule of runs that actually happens, I’ll label each of the staged runs below:

run id  0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
----------------------------------------------
GPU     0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1

Assuming all the runs take the same amount of time, they execute in the following ‘sets’:

Set  | run ids
---------------
0    | 0,1 - The initial two runs go on GPU0
1    | 2,3 - GPU0 runs 0 and 1 ended, so 2 more take their place
2    | 4,5,6 - GPU0 runs 2 and 3 end; the one remaining GPU0 run executes (4) and 2 runs finally go on GPU1
3    | 7,8 - Similar behavior to set 1
4    | 9 - The remaining run in the GPU1 queue

The expected behavior would be the following sets:

Set  | run ids
---------------
0    | 0,1,5,6 - 2 runs on each GPU
1    | 2,3,7,8 - 2 runs on each GPU
2    | 4,9 - 1 run on each GPU; both queues finish
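
For what it’s worth, I could presumably sidestep this with one scheduler per GPU (untested sketch below, reusing the same flags as above), but the whole point of the dask:GPUx tags was to let a single scheduler manage both queues:

# Hypothetical workaround, not what I actually ran
guild run -y dask:scheduler run-once=yes workers=10 resources='GPU0=2' &
guild run -y dask:scheduler run-once=yes workers=10 resources='GPU1=2'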

Does anyone know why this could be happening, or have I just set up the Dask scheduler wrong?