Hi,
My institution has recently changed the configuration of our remote workstations.
Connections now go through a jump host, where we cannot use an SSH key pair. I have a proxy configured, so I connect to the workstation manually with ‘ssh [workstation]’. The jump host requires a password on every connection, followed by an app authentication. The workstation has SSH key authentication set up with my local machine, so I only have to log in to the jump host. That’s the policy, and it cannot be changed.
I have successfully managed to run guild check on that remote. I have configured a training script, config files, etc., so that it all runs smoothly locally.
However, when I try to run the train operation on the remote, I get the following errors:
Initializing remote run
Password:
Copying package
Password:
Connection timed out during banner exchange
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(235) [sender=3.1.3]
Traceback (most recent call last):
File "/home/bleporowski/anaconda3/envs/marvel/bin/guild", line 8, in <module>
sys.exit(main())
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/main_bootstrap.py", line 40, in main
_main()
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/main_bootstrap.py", line 66, in _main
guild.main.main()
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/main.py", line 33, in main
main_cmd.main(standalone_mode=False)
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/click_util.py", line 213, in fn
return fn0(*(args + (Args(**kw),)))
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/commands/run.py", line 649, in run
run_impl.main(args)
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/commands/run_impl.py", line 1514, in main
_dispatch_op(S)
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/commands/run_impl.py", line 1610, in _dispatch_op
_dispatch_op_cmd(S)
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/commands/run_impl.py", line 1797, in _dispatch_op_cmd
_confirm_and_run(S)
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/commands/run_impl.py", line 1874, in _confirm_and_run
_run(S)
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/commands/run_impl.py", line 2075, in _run
_run_remote(S)
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/commands/run_impl.py", line 2082, in _run_remote
remote_impl_support.run(_remote_args(S))
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/commands/remote_impl_support.py", line 125, in run
run_id = remote.run_op(**_run_kw(args))
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/remotes/ssh.py", line 243, in run_op
remote_run_dir = self._init_remote_run(tmp.path, opspec, restart)
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/remotes/ssh.py", line 265, in _init_remote_run
self._copy_package_dist(package_dist_dir, remote_run_dir)
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/remotes/ssh.py", line 330, in _copy_package_dist
ssh_util.rsync_copy_to(
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/site-packages/guild/remotes/ssh_util.py", line 129, in rsync_copy_to
subprocess.check_call(cmd)
File "/home/bleporowski/anaconda3/envs/marvel/lib/python3.8/subprocess.py", line 364, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['rsync', '-vr', '-e', "ssh -oConnectTimeout=10 -o 'ProxyCommand ssh -oConnectTimeout=100 -W %h:%p [user]@[jumphost]'", '/tmp/guild-remote-stage-ahx9az7p/', '[user]@[workstation]:~/anaconda3/envs/time-gop/.guild/runs/5d5d24d410c648f897630ef102538a1e/.guild/job-packages/']' returned non-zero exit status 255.
I’m curious about two things:
- Why do I have to log in twice, once after the ‘Initializing remote run’ log line, and then again after the ‘Copying package’ log line?
- In guild/config.yml I have set a timeout of 100 seconds on my remote for both the jump host and the second-step connection. However, the trace suggests that the timeout from guild/config.yml is not being read properly?
This is the guild/config.yml:
remotes:
  [remote-name]:
    type: ssh
    host: [workstation]
    proxy: ssh -oConnectTimeout=100 -W %h:%p [user]@[jump host]
    connect-time: 100
    user: [user]
    conda-env: ~/anaconda3/envs/time-gop
    init: source ~/anaconda3/etc/profile.d/conda.sh | guild -H ~/projects/protime-gop
The obvious reason would be that the connection times out, as per the error log. However, the timeout value in the config doesn’t seem to change the value actually passed to the remote command.
Have I made a mistake while creating my guild/config.yml? Or is it a bug? Or maybe there is some other reason?
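To make the mismatch concrete, here is a small Python sketch (my own, not part of Guild) that pulls the ConnectTimeout values out of the ssh transport string from the rsync command in the error above. The names `rsync_ssh_cmd` and `connect_timeout` are just ones I made up for illustration, and the placeholders are left as in the log.

```python
import re

# The ssh transport string passed to rsync via -e, copied from the
# CalledProcessError above (placeholders kept as in the log).
rsync_ssh_cmd = (
    "ssh -oConnectTimeout=10 "
    "-o 'ProxyCommand ssh -oConnectTimeout=100 -W %h:%p [user]@[jumphost]'"
)

# Split into the outer ssh options and the nested ProxyCommand part.
outer, _, proxy = rsync_ssh_cmd.partition("ProxyCommand")


def connect_timeout(fragment):
    """Return the ConnectTimeout value in an ssh command fragment, or None."""
    match = re.search(r"ConnectTimeout=(\d+)", fragment)
    return int(match.group(1)) if match else None


timeouts = {
    "workstation (outer ssh)": connect_timeout(outer),
    "jump host (ProxyCommand)": connect_timeout(proxy),
}
print(timeouts)
```

So the 100 seconds I wrote into the proxy string reaches the jump host hop, but the outer connection to the workstation still runs with 10 seconds, which is not the value I set in the remote config.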