Hi, I have concurrent runs on multiple remotes (say 4 runs on 4 different remotes). Is there a way to view the status of all runs from a single interface? I would expect guild runs to show them, but for some reason the remote runs appear as “terminated” (while they are in fact running) and switch to “completed” when they finish.
I am working on a cluster that has shared drives, in case that could be taken advantage of.
Guild has an issue showing “running” status on shared drives when the underlying processes are owned by another system. Unfortunately there’s no good workaround for that other than using Guild’s remote copy/sync capability. However, that’s quite inefficient, as it requires whole copies of the runs just to get an accurate status.
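To illustrate why this can happen, here is a rough sketch of a PID-based status check. This is not Guild's actual code: the function names, the arguments, and the exact status rules are assumptions made up for the example. The idea is that a recorded process ID only means something on the host that started the process, so a liveness check made from a different machine fails even though the run is still going. The exit status, by contrast, is just data on the shared drive, which would explain why the run flips to “completed” once it finishes.

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists on the LOCAL host.

    POSIX only: signal 0 performs the existence check without
    delivering a signal. Do not use this on Windows.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process on this machine
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def infer_status(lock_pid, exit_status, is_alive=pid_alive):
    """Hypothetical status rule, for illustration only.

    lock_pid:    PID recorded in the run directory, or None
    exit_status: exit code recorded when the run ended, or None
    is_alive:    liveness check, injectable so it can be simulated
    """
    if exit_status is not None:
        # Readable from any host that can see the shared drive.
        return "completed" if exit_status == 0 else "error"
    if lock_pid is not None and is_alive(lock_pid):
        return "running"
    # No exit status and no matching local process.
    return "terminated"


# Simulate viewing a run from a host where the recorded PID does not exist.
print(infer_status(12345, None, is_alive=lambda pid: False))
# Same run, viewed from the host that owns the process.
print(infer_status(12345, None, is_alive=lambda pid: True))
```

Under these assumptions, the first call reports "terminated" for a run that is actually alive elsewhere, which matches the symptom described in the question.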
I’ll bump the priority on this item as it comes up a fair amount, especially when working with Slurm and other HPC environments that use shared drives.
I opened a new issue on GitHub to explicitly track the resolution of this problem. If you watch that issue you’ll get updates when it’s resolved.