Use with slurm


I have been using Guild with Slurm directly, but there is currently a problem when calling guild compare. The NAS is mounted at /home, so the .guild folder is shared with the child nodes. Guild cannot look up the PID of a process running on a child node, so it displays those runs as terminated, which is slightly inconvenient.

I looked at the remote settings, but I think a remote passes the job to a remote server to run instead of invoking it locally. Does anyone have a good solution for this?

This won’t be easy to solve without making a change to Guild itself. Guild needs to know that a run is managed by another host/node. This is going to be an issue with any shared volume.
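To illustrate why the status comes out wrong, here is a minimal sketch of a host-aware liveness check. It is not Guild's actual code: the function names and the idea of storing a hostname next to the PID are assumptions made for the example. The point is that a PID only means something on the host that started the process, so a check from another node should report "unknown" rather than "terminated".

```python
import os
import socket


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists on the local host (POSIX)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def run_status(lock_host: str, lock_pid: int) -> str:
    """Hypothetical status check: lock_host and lock_pid would be recorded
    by the node that started the run (an assumption, not Guild's format)."""
    if lock_host != socket.gethostname():
        # A local PID lookup says nothing about a process on another node.
        return "unknown (managed by %s)" % lock_host
    return "running" if pid_is_alive(lock_pid) else "terminated"
```

With only the PID recorded, the check on the login node has no way to tell the two cases apart, which is why the runs show up as terminated.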

Could you open an issue on GitHub for this? That way we can formally track its progress.

Sure thing. It currently doesn’t affect my usage much, but it would be great if Guild had closer integration with the Slurm API.

Slurm is on the roadmap and is likely the next remote target. I’ll update here with progress on this.
