Assigning runs to GPUs

I am working on a system with 3 GPUs and have a set of 14 runs that I want to queue up. I want to assign each run to a single GPU, so that at any given time only 3 runs are active, and once a GPU finishes a run, the next run in the queue is started.

Is what I am asking for possible, and what is the best way to go about it? I have tried both queues and dask, and can’t seem to do what I’m describing. I keep getting runs assigned across GPUs.

I use a very similar workflow. Check out my reply here: Distributing runs on a multi-gpu machine - #3 by mtmccann, and let me know if it works for you.

Hi,

Thanks for your response! I did see that post. Maybe I implemented it incorrectly; could I please get some more details on what you did?

First I created the queues using your code:
for i in {1,2,3}; do guild run --background --yes queue gpus=$i; done

Now when I run a job, do I still need to specify a GPU? I did not, and saw that my runs were still getting distributed across GPUs. I ran this:
guild run --stage-trials train.py

I ran it separately for each case (I have multiple flags that differ between runs, so I can’t just create the batch in one line), but my runs were still getting split across GPUs. Do I need to explicitly assign each job to a GPU? If so, doesn’t that defeat the benefit of the queue, where I just want any of my queued-up jobs to be picked up by the next free GPU?
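
For reference, the separate staging commands looked roughly like this (the flag names below are placeholders for my actual flags):

guild run --stage-trials train.py lr=0.01 dropout=0.2
guild run --stage-trials train.py lr=0.001 dropout=0.5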

I’m not sure what you mean by “my runs were still getting distributed across GPUs”. Does that mean one run was using multiple GPUs, or that each run was on a different GPU?

I would recommend using standard queues with gpus set for each of your GPUs:

guild run queue gpus=[0,1,2] --background

Here’s a script I used to check the behavior:

# test_cuda.py

import os
import random
import time

from torch import cuda

id = None  # optional run identifier flag, set per staged trial (e.g. id=5)
cuda_timeout = 10  # seconds to wait for CUDA to become available

if id is not None:
    print(f"test_cuda {id}")

print(f"CUDA_VISIBLE_DEVICES: {os.getenv('CUDA_VISIBLE_DEVICES', '<unset>')}")

timeout = time.time() + cuda_timeout
cur = None
while time.time() < timeout:
    try:
        cur = cuda.current_device()
    except RuntimeError:
        # CUDA not ready yet; wait and retry until the timeout
        time.sleep(1)
    else:
        break
if cur is None:
    raise SystemExit("ERROR: CUDA not available")

# List the devices visible to this process, marking the current one
for i in range(cuda.device_count()):
    name = cuda.get_device_name(i)
    print(f"  {i}: {name} {' *' if i == cur else ''}")

# Simulate work
time.sleep(10 + random.random() * 30)

To test this, I ran this command (with the queues running as per above):

guild run test_cuda.py id=range[1:17] --stage-trials

Each run executes to completion on a particular GPU, as provided by the queue that picks up the staged operation. You can monitor the runs using guild runs -Fo test_cuda.py.

To observe what’s going on from the queue standpoint, start each queue in a separate terminal, each with its own gpus flag (i.e. 0, 1, 2), and omit --background. There you can see which runs each queue picks up and starts.
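
For example, assuming the three GPUs are numbered 0 through 2, that would look like this, one command per terminal:

guild run queue gpus=0
guild run queue gpus=1
guild run queue gpus=2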

You can check the output of any of these runs using guild cat --output <run>.

You should see that each test run sees only the GPU associated with the queue that runs it. This is how you can serialize runs per GPU. The trick is that each queue sets CUDA_VISIBLE_DEVICES for its runs according to its gpus flag. It’s that simple; nothing fancier.
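
If you want to convince yourself of the CUDA_VISIBLE_DEVICES behavior outside of Guild, you can set the variable by hand when running the test script (assuming your machine has a GPU 1):

CUDA_VISIBLE_DEVICES=1 python test_cuda.py

The script should report CUDA_VISIBLE_DEVICES: 1 and list a single device, which shows up as index 0 inside the process but maps to physical GPU 1.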

If this test doesn’t work for you, post the behavior you’re seeing.