After running `guild runs stop X -r server`, the processes are still running on the remote and GPU memory has not been released, even though guild reports the run as terminated.
I think it may be PyTorch’s DataLoader worker processes that are still running, but I’m not sure. I think I saw something about this before, but I couldn’t find it here or on GitHub. Has anyone else experienced this issue?
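
In case it helps anyone reproduce this, here is a rough sketch of how the leftover processes could be listed on the remote. It is not part of guild. The helper names `parse_ps` and `orphaned_python` are made up for illustration. The sketch assumes a Linux remote where `ps -eo pid,ppid,args` is available, and it assumes orphaned workers get reparented to PID 1, which may not hold on systems that use a per-user subreaper or inside some containers.

```python
import subprocess


def parse_ps(output):
    """Parse `ps -eo pid,ppid,args` output into (pid, ppid, command) tuples."""
    rows = []
    for line in output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, ppid, args = parts
        if pid.isdigit() and ppid.isdigit():
            rows.append((int(pid), int(ppid), args.strip()))
    return rows


def orphaned_python(rows):
    """Return processes whose parent is PID 1 and whose command mentions python.

    Assumption: a worker that outlived its parent was reparented to PID 1.
    """
    return [row for row in rows if row[1] == 1 and "python" in row[2].lower()]


if __name__ == "__main__":
    # Run this on the remote machine after stopping the run.
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,args"],
        capture_output=True,
        text=True,
        check=True,
    )
    for pid, _ppid, cmd in orphaned_python(parse_ps(result.stdout)):
        print(pid, cmd)
```

The PIDs this prints can be compared against the process list that `nvidia-smi` shows to see whether they are the ones still holding GPU memory.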