I just ran a test using a long-running operation that looks like this:
I start the run this way:
guild run test -r my-remote
I can safely Ctrl-c the session, which disconnects from the remote operation. I can also explicitly kill the underlying ssh command. Either way, the run continues on the remote server. Guild only relies on the ssh connection to start the run, not to keep it running. Guild is technically “watching” the run after it starts to avoid the problem you’re mentioning. The watching is just a log tail. You can kill it without affecting the run itself.
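As a rough analogy only (this is not Guild's implementation, and the loop below is a made-up stand-in for a real operation), here is a small shell sketch of the same idea: a process is started detached and writes to a log, a separate tail process watches that log, and killing the watcher leaves the process untouched.

```shell
#!/bin/sh
# Analogy only: NOT how Guild starts or watches runs.
# The "run" here is a hypothetical three-step loop.
log=$(mktemp)

# Start the "run" detached, with all output going to a log file.
nohup sh -c 'for i in 1 2 3; do echo "step $i"; sleep 1; done' \
    >"$log" 2>&1 </dev/null &
run_pid=$!

# "Watch" the run by tailing its log in a separate process.
tail -f "$log" &
watch_pid=$!

sleep 1

# Kill only the watcher. The run keeps going.
kill "$watch_pid"

# Wait for the run to finish, then show everything it logged.
wait "$run_pid"
cat "$log"
```

The log ends up complete even though the watcher was killed partway through, which is the same relationship described above between the ssh session and the remote run.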
Note that when I run this on a remote, the run does not appear in any local runs list until I explicitly pull the run.
When I view the runs on the remote, I see it running — even after I kill the ssh connection.
guild runs -r my-remote
[1:62af7e9e] gpkg.anonymous-cbedc848/test 2020-11-27 12:27:41 running
When I pull the run, I get the current run at the point of the pull. When I list runs locally, I see that it’s running along with the remote name.
guild pull my-remote 62af7e9e
[1:62af7e9e] gpkg.anonymous-cbedc848/test 2020-11-27 12:27:41 running (my-remote)
In this case, Guild reflects the status at the time of the pull. Guild does not automatically sync the status in the background (Guild doesn’t use long-running agents unless you explicitly start them). To get the latest from the remote, pull the run again. There is also a sync command that’s convenient for syncing local runs with their remote counterparts. Unfortunately that command is fubar’d in the 0.7.0 release. That’s fixed for 0.7.1 though. You just run guild sync, and any local runs that are still running are synced with the current remote status.
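If you're on a release where sync is broken, repeating the pull by hand is one way to refresh the local status. Here is a minimal sketch of that. The refresh_run helper and the GUILD variable are hypothetical names of mine, not part of Guild. The only Guild command it uses is the guild pull invocation shown above. Pull may ask for confirmation on each call, so check guild pull --help for a non-interactive option before leaving this unattended.

```shell
#!/bin/sh
# Hypothetical helper, not part of Guild: repeat the pull shown above
# a fixed number of times with a delay between attempts.
# GUILD can be overridden to point at a different executable.
GUILD=${GUILD:-guild}

refresh_run() {
    remote=$1   # remote name, e.g. my-remote
    run=$2      # run ID, e.g. 62af7e9e
    times=$3    # how many pulls to perform
    delay=$4    # seconds to sleep between pulls
    i=0
    while [ "$i" -lt "$times" ]; do
        # Same command as in the post: guild pull <remote> <run>
        "$GUILD" pull "$remote" "$run" || return 1
        i=$((i + 1))
        if [ "$i" -lt "$times" ]; then
            sleep "$delay"
        fi
    done
    return 0
}

# Example (commented out so the sketch has no side effects when sourced):
# refresh_run my-remote 62af7e9e 10 60
```

Each pull only captures the status at that moment, so the local list is still a snapshot. The loop just takes snapshots more often.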
From my end, aside from the broken sync command (which you don’t need anyway), this is working as expected. To help track down the issue, could you identify the stage where it breaks down for you?