Guild supports training on remote systems by way of a remote facility. To run an operation remotely:
- Define a remote in user configuration
- Specify the remote name using the --remote option when running an operation
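As a rough illustration of the second step, the sketch below builds the command line for running an operation on a named remote. The helper name `remote_run_argv` and the operation name `train` are hypothetical, and the sketch assumes `--remote` is accepted after the operation name on `guild run`.

```python
import shlex


def remote_run_argv(operation, remote):
    """Build the argument list for running an operation on a named remote."""
    # `--remote` is the option described above. The operation name is
    # whatever your project defines; "train" below is only a placeholder.
    return ["guild", "run", operation, "--remote", remote]


argv = remote_run_argv("train", "remote-gpu")
print(shlex.join(argv))  # prints: guild run train --remote remote-gpu
```

To actually launch the run, the list could be passed to `subprocess.run(argv, check=True)` on a machine where Guild is installed and the remote is defined.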
For a complete reference on remote configuration, see Remotes Reference.
Define a Remote
Remotes are defined in user configuration. Below is an example of an SSH remote named remote-gpu:
remotes:
  remote-gpu:
    type: ssh
    host: gpu001.mydomain.com
    user: ubuntu
    private-key: ~/.ssh/gpu001.pem
Guild supports the following remote types:
|ssh||Connect to a remote server over SSH. Use this type to train on remote servers on-premises or on any cloud vendor. Guild does not support starting remotes of this type.|
|ec2||Connect to a remote EC2 host over SSH. This remote type supports starting and stopping the remote.|
|s3||Copy runs to and from S3. This remote type does not support runs but can be used for backup and restore.|
For a complete list of remote types, including examples, see Remotes Reference.
Remotes can be listed, checked for status, and, if supported by the remote type, started and stopped.
Remote management commands:
|guild remotes||List available remotes.|
|guild remote status||Show status for a remote.|
|guild remote start||Start a remote. Not all remote types can be started.|
|guild remote stop||Stop a remote. Not all remote types can be stopped.|
A remote must be available before it can be used in a remote command. Check a remote using guild remote status. If a remote is not available and can be started, use guild remote start to start it first. Note that some remote types cannot be started or stopped. Refer to Remotes Reference for details on each remote type.
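The check-then-start sequence described above could be scripted roughly as follows. This is a sketch under two assumptions not stated in the text: that `guild remote status` exits with a nonzero code when the remote is unavailable, and that `guild remote start` can run without interactive input. The function name `ensure_remote_available` and the injectable `run` parameter are hypothetical conveniences.

```python
import subprocess


def ensure_remote_available(remote, run=subprocess.run):
    """Check a remote and try to start it if the status check fails.

    Returns True if the remote is available or was started, else False.
    """
    status = run(["guild", "remote", "status", remote])
    if status.returncode == 0:
        return True
    # Assumption: a nonzero exit code means the remote is not available.
    # Remote types that cannot be started are expected to fail here too.
    started = run(["guild", "remote", "start", remote])
    return started.returncode == 0
```

Calling `ensure_remote_available("remote-gpu")` before a remote command keeps the availability check in one place. The `run` parameter exists only so the logic can be exercised without invoking Guild.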
To apply a command to a remote, use the --remote option. For example, to run guild check on a remote named remote-gpu (see example above), run:
guild check --remote remote-gpu
Not all remote types support every command. For example, the s3 remote type does not support the run command. Refer to Remotes Reference for details on which remote commands are supported for a particular remote type.
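One way to guard against unsupported combinations in a script is a small lookup keyed by remote type. The sketch below is illustrative only: the `UNSUPPORTED` table and the `remote_argv` helper are hypothetical, the s3/run entry is the only pair taken from the text, and the remote name `backup` is a placeholder. It applies to commands that accept the `--remote` option after the command name.

```python
# Commands known to be unsupported, keyed by remote type. Only the s3/run
# entry comes from the text above; extend it from Remotes Reference.
UNSUPPORTED = {"s3": {"run"}}


def remote_argv(command, remote, remote_type):
    """Build `guild <command> --remote <remote>`, rejecting known-unsupported pairs."""
    if command in UNSUPPORTED.get(remote_type, set()):
        raise ValueError(f"'{command}' is not supported by {remote_type} remotes")
    # Multi-word commands such as "runs info" are split into separate arguments.
    return ["guild", *command.split(), "--remote", remote]


print(remote_argv("check", "remote-gpu", "ssh"))
try:
    remote_argv("run", "backup", "s3")
except ValueError as err:
    print(err)
```

Failing early in the script gives a clearer message than waiting for the remote command itself to fail.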
Guild commands that support remotes:
|check||Check Guild on the remote|
|run||Run an operation on a remote|
|stop||Stop runs in progress on a remote|
|watch||Connect to a remote run in progress and watch its output|
|runs list||List runs on a remote|
|runs info||Show information about a remote run|
|ls||List remote run files|
|diff||Diff remote runs|
|cat||Show remote run file or output|
|label||Apply a label to one or more remote runs|
|runs delete||Delete remote runs|
|runs restore||Restore deleted runs on a remote|
|runs purge||Purge deleted runs on a remote|
|pull||Copy remote runs to the local environment|
|push||Copy local runs to the remote|
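The two copy commands at the end of the table can be wrapped in the same way. This sketch assumes they are `guild pull` and `guild push` and that each takes the remote name as a positional argument rather than through `--remote`. The helper name `copy_runs_argv` is hypothetical.

```python
def copy_runs_argv(direction, remote):
    """Build a command that copies runs from ("pull") or to ("push") a remote."""
    if direction not in ("pull", "push"):
        raise ValueError("direction must be 'pull' or 'push'")
    # Assumption: the remote name is passed as a positional argument.
    return ["guild", direction, remote]


print(copy_runs_argv("pull", "remote-gpu"))
print(copy_runs_argv("push", "remote-gpu"))
```

With an s3 remote, the same pair of commands would cover the backup and restore use mentioned earlier: push to back up local runs, pull to restore them.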