Pause run

As the title says: is there a way to pause a running run?
I know that it is possible to stop them with guild runs stop, but I would like to pause a run so that I can continue running it later.
Is it possible?
I’m asking because I’m using a shared server and sometimes I need to let some other people use it.

I would use guild runs stop and handle the interrupt in your code. Guild sends a SIGINT first, so you’ll get a clear signal to handle. See these docs for more info.
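As a rough illustration of handling the interrupt, here is a minimal Python sketch. It assumes your operation is a Python script with a step-based loop; StopRequested, train, do_step and on_stop are hypothetical names made up for this example and are not part of Guild. The handler only sets a flag, so the loop can finish the current step and stop at a safe point.

```python
import signal


class StopRequested:
    """Signal handler that just remembers a stop was requested."""

    def __init__(self):
        self.flag = False

    def __call__(self, signum, frame):
        self.flag = True


def train(total_steps, stop, do_step, on_stop):
    """Run steps until finished or until a stop is requested.

    Returns the number of steps completed.
    """
    step = 0
    while step < total_steps and not stop.flag:
        step += 1
        do_step(step)  # one unit of real work would go here
    if stop.flag:
        on_stop(step)  # e.g. save progress before exiting
    return step


if __name__ == "__main__":
    stop = StopRequested()
    # Replace Python's default SIGINT behaviour (raising KeyboardInterrupt)
    # with the flag-setting handler.
    signal.signal(signal.SIGINT, stop)
    done = train(1000, stop, lambda s: None, lambda s: print(f"stopped at step {s}"))
    print(f"completed {done} steps")
```

If you don't install a handler at all, Python raises KeyboardInterrupt on SIGINT, so wrapping the loop in try/except KeyboardInterrupt is another option.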

You can then use guild run --restart RUN_ID to restart the run.

In your code, you need to save progress routinely for this to make sense. Most real operations already do this (e.g. routinely writing to and flushing TF event logs). When your code starts, it needs to look at the local files and restore its state accordingly.
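To make the save-and-restore idea concrete, here is a minimal Python sketch. It assumes the state fits in a small JSON file and that the file is written somewhere that survives the restart (for example, the run directory). The file name checkpoint.json and the functions load_state, save_state and run are hypothetical names for this example, and the "work" is just a running sum standing in for real training.

```python
import json
import os
import tempfile

CHECKPOINT = "checkpoint.json"  # hypothetical file name


def load_state(path=CHECKPOINT):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_state(state, path=CHECKPOINT):
    """Write to a temp file, then atomically replace the checkpoint.

    This avoids leaving a half-written file if the process dies mid-write.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(total_steps, path=CHECKPOINT, save_every=100):
    """Resume from the checkpoint (if any) and work up to total_steps."""
    state = load_state(path)
    while state["step"] < total_steps:
        # Placeholder for real work: add the current step index to a sum.
        state = {
            "step": state["step"] + 1,
            "total": state["total"] + state["step"],
        }
        if state["step"] % save_every == 0:
            save_state(state, path)
    save_state(state, path)
    return state
```

With this shape, a first session that gets through 150 steps and a later session asked for 300 steps together do the same work as one uninterrupted session of 300 steps, because the second call starts from the saved step instead of from zero.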

Pausing a process really isn’t a mainstream thing, since processes can be killed for any number of reasons (out of memory, power failure, bugs in code, etc.). So when you make progress on a job, write something to disk so that you can pick up where you left off if the process is killed.

While this is a bit of a pain initially, it’s the standard engineering pattern for long-running processes. Once you start using this technique, it’ll feel natural, and it will feel weird when you don’t use it.

That’s what I suspected.
Thank you for the answer.