Guild supports training on remote systems by way of its remote facility. To run an operation remotely:

  • Define a remote in user configuration
  • Specify the remote name using the --remote option when running an operation

For a complete reference on remote configuration, see Remotes Reference.

Define a Remote

Remotes are defined in user configuration. Below is an example of an SSH remote named remote-gpu:

    remotes:
      remote-gpu:
        type: ssh
        host: gpu001.example.com  # placeholder; replace with your server's hostname
        user: ubuntu
        private-key: ~/.ssh/gpu001.pem

Guild supports the following remote types:

  • ssh: Connect to a remote server over SSH. Use this type to train on on-premises servers or on servers hosted by any cloud vendor. Guild does not support starting ssh remotes.
  • ec2: Connect to a remote EC2 host over SSH. This remote type supports the remote start and stop commands, given EC2-specific configuration for the remote.
  • s3: Copy runs to and from S3. This remote type does not support the run command but can be used for backup and restore.

For a complete list of remote types, including examples, see Remotes Reference.

Manage Remotes

Remotes can be listed, checked for status, and, if supported by the remote type, started and stopped.

Remote management commands:

  • guild remotes: List available remotes.
  • guild remote status: Show status for a remote.
  • guild remote start: Start a remote. Not all remote types can be started.
  • guild remote stop: Stop a remote. Not all remote types can be stopped.

A remote must be available before it can be used in a remote command. Check a remote using guild remote status. If a remote is not available and its type supports starting, use guild remote start to start it first. Refer to Remotes Reference for details on each remote type.
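The check-then-start sequence described above can be sketched as a small shell function. This is a minimal sketch rather than anything Guild provides: ensure_remote is a hypothetical helper name, and the sketch assumes that guild remote status exits with a non-zero status when the remote is not available, which may differ by remote type.

```shell
# Hypothetical helper (not a Guild command): make sure a remote is
# available, starting it only if the status check fails.
# Assumption: `guild remote status` exits non-zero for an unavailable remote.
ensure_remote() {
    remote="$1"
    if guild remote status "$remote"; then
        echo "$remote is available"
    else
        echo "$remote is not available, attempting to start it"
        guild remote start "$remote"
    fi
}
```

For example, ensure_remote remote-gpu checks the remote defined earlier. For a remote type that cannot be started, such as ssh, the start step is expected to fail, and the server has to be brought up by other means.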

Remote Commands

To apply a command to a remote, use the --remote option. For example, to run guild check on a remote named remote-gpu (see the example above), run:

    guild check --remote remote-gpu

Not all remote types support every command. For example, the s3 remote type does not support the run command. Refer to Remotes Reference for details on which remote commands are supported for a particular remote type.
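Because the remote is named on each command, a typical session repeats the same --remote value across several commands. The sketch below wraps two of them in a function. It is illustrative only: run_on_remote is a hypothetical helper, and the sketch assumes that --remote is accepted as an option of each individual command.

```shell
# Hypothetical wrapper: run an operation on a remote, then list the
# runs on that remote. Both commands receive the same --remote value.
# Assumption: --remote is an option of each command (run, runs).
run_on_remote() {
    remote="$1"
    operation="$2"
    guild run "$operation" --remote "$remote" &&
        guild runs --remote "$remote"
}
```

For example, run_on_remote remote-gpu train, where train is a placeholder for an operation defined in your own project. The call fails on a remote type that does not support run, such as s3.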

Guild commands that support remotes:

  • check: Check Guild on the remote
  • run: Run an operation on a remote
  • stop: Stop runs in progress on a remote
  • watch: Connect to a remote run in progress and watch its output
  • runs: List runs on a remote
  • runs info: Show information about a remote run
  • ls: List remote run files
  • diff: Diff remote runs
  • cat: Show remote run file or output
  • label: Apply a label to one or more remote runs
  • runs delete: Delete remote runs
  • runs restore: Restore deleted runs on a remote
  • runs purge: Purge deleted runs on a remote
  • pull: Copy remote runs to the local environment
  • push: Copy local runs to the remote
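The pull and push commands are what make a copy-only remote, such as an s3 remote, usable for backup and restore. The sketch below shows one way this might look. backup_runs and restore_runs are hypothetical helper names, and the sketch assumes that pull and push take the remote name as a positional argument rather than through --remote; check the command help to confirm.

```shell
# Hypothetical helpers: copy runs between the local environment and a remote.
# Assumption: push and pull take the remote name as a positional argument.

# Copy local runs to the named remote (backup).
backup_runs() {
    guild push "$1"
}

# Copy runs from the named remote to the local environment (restore).
restore_runs() {
    guild pull "$1"
}
```

For example, backup_runs s3-backup copies local runs to a remote named s3-backup, a placeholder name for a remote you have defined in user configuration.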