Overview
A Guild run is a computation that generates a result you’re interested in saving. We often refer to runs as experiments. Runs can perform any type of task including:
- Train models
- Prepare data sets
- Evaluate and test trained models
- Analyze and summarize sets of runs
- Optimize model size and performance
- Deploy models
The term run refers to one of two things:
- An in-process run, represented by an operating system process
- A file system artifact associated with a run process
Runs play a central role in systematic model improvement. By capturing experiment details, you establish a series of baseline measurements against which to compare future experiments. You maintain a record that can be used to check progress and make informed next steps.
Runs serve as a unit of reproducibility. By automating and capturing experiments in your workflow, you provide a smooth path for others to rerun your work and compare results.
Run Artifacts
Guild saves runs on standard file systems. Guild is different in this respect from experiment tracking systems that save experiment results in databases or exotic file systems…
Runs are stored under a runs directory located in Guild home. To show where Guild saves runs, refer to the guild_home attribute shown by guild check. Each run is saved in a unique subdirectory. For more information, see Run Directory below.
Run Directory
A run directory is a unique directory created for a run process. Each run directory is a persistent record of a run, including files generated by the operation script and metadata describing the run.
Run directories are stored together in the runs subdirectory of Guild home.
Run directories are named using a Guild-generated unique identifier. This value corresponds to the full run ID. Run IDs are globally unique so that runs can be told apart across systems and over time.
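To make this layout concrete, the sketch below generates a random 32-character hex ID with Python's uuid module and builds a run directory path from it. It is an illustration only, not Guild's own code. It assumes UUID-based IDs and uses ~/.guild purely as an example Guild home. Use guild check to see the actual location on your system.

```python
import os
import uuid


def new_run_id():
    # Assumption: a random UUID rendered as 32 hex characters. Collisions
    # are vanishingly unlikely, even across machines.
    return uuid.uuid4().hex


def run_dir_for(guild_home, run_id):
    # Each run gets its own subdirectory, named after the full run ID,
    # under the "runs" directory in Guild home.
    return os.path.join(guild_home, "runs", run_id)


run_id = new_run_id()
# ~/.guild is only an example location for Guild home.
print(run_dir_for(os.path.expanduser("~/.guild"), run_id))
```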
Generated Files
When Guild starts a run, it changes the current directory to the unique run directory. Files written to relative paths during the operation are therefore saved in the run directory. In this way, a run "captures" script output.
Guild additionally saves run metadata in the run directory. This information is shown when you run guild runs info.
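The effect of changing the current directory can be reproduced in plain Python. In the sketch below, a temporary directory stands in for a run directory. It is a simulation only, since Guild performs the directory change for you. The script writes results.txt using a relative path, and the file ends up inside that directory. The file name and contents are hypothetical.

```python
import os
import tempfile


def write_results(path="results.txt"):
    # A relative path resolves against the current working directory.
    with open(path, "w") as f:
        f.write("accuracy: 0.9\n")
    return os.path.abspath(path)


# Simulate what Guild does when it starts a run: change into the run
# directory before the script executes.
run_dir = tempfile.mkdtemp(prefix="run-")
prev_cwd = os.getcwd()
os.chdir(run_dir)
try:
    saved = write_results()
finally:
    os.chdir(prev_cwd)  # restore the original working directory
print(saved)
```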
Resource Links
Guild supports required resources for runs, which are files that an operation needs to run successfully. For example, a model training operation may require a data set prepared by another operation. These dependencies are defined in a Guild file.
Resource links may include:
- Files from other runs
- Downloaded files
- Project files
When preparing a run directory, Guild resolves required resources by locating the appropriate files (downloading them if needed) and creating symbolic links to those files in the run directory. These links are part of the run artifact.
For more information on defining and using required resources, see Dependencies.
Run Metadata
Run metadata consists of:
- Flags
- Snapshot of the source code used by the operation
- OS process information such as command, environment, process ID, and exit code
- Run output (content written to standard output and standard error during the operation)
Run metadata are stored in files under a Guild-generated .guild subdirectory of the run directory. You can list these files for a run using guild ls as follows:
guild ls -a -p .guild [RUN]
Guild provides various commands to show run metadata. See Get Run Information below for details.
Start a Run
Start a run using guild run. The run command is a multi-feature command that supports the following actions:
- Run a Python script
- Run an operation defined in a Guild file
- Start a batch of runs
- Stage runs for deferred execution
- Restart a stopped run
- Start a run from a prototype
- Show help for an operation
- Test an operation
Run a Python Script
In Get Started, you run a mock training script named train.py.
guild run train.py
Example of running a Python script directly. In this case, train.py is a file in the current directory.
This is often a good place to start because it doesn’t require additional configuration. For more control over a run, see Run an Operation below.
Note While Guild supports other languages, Guild currently only supports direct execution of Python scripts. To run a script in a different language, define an operation. See Run an Operation below for more information.
For more information on Guild’s default behavior, see Default Behavior - Python Scripts.
Tip Guild's support for direct script execution is a convenience to get started quickly. Without additional information, Guild makes various assumptions that may not hold true for your script. Consider using operations for more control over how Guild runs your script. See Run an Operation below for more information.
Auto-Detect Python Script Flags
To run a Python script without additional configuration, Guild inspects the script to determine script flags, which provide information to the script.
If the script uses argparse, Guild assumes the script accepts flags as command line options. Otherwise, Guild assumes that the script defines flags as global variables.
For example, Guild detects a single flag learning_rate from this code:
import argparse
parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float, default=0.01)
args = parser.parse_args()
print("learning rate is %f" % args.learning_rate)
Guild detects use of argparse and assumes that command options are used as flags.
Guild detects learning_rate from this code:
learning_rate = 0.01
print("learning rate is %s" % learning_rate)
When argparse is not used, Guild considers global variables that do not start with _ and that are assigned a constant value (string, number, or boolean) as flags.
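The two detection rules above can be approximated with Python's ast module. The sketch below is an illustration under stated assumptions, not Guild's actual implementation. If the module imports argparse, it collects add_argument("--name", default=...) calls. Otherwise, it collects module-level NAME = <constant> assignments whose names don't start with _. The helper names are hypothetical.

```python
import ast


def uses_argparse(tree):
    # True if the module imports argparse anywhere.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import) and any(a.name == "argparse" for a in node.names):
            return True
        if isinstance(node, ast.ImportFrom) and node.module == "argparse":
            return True
    return False


def argparse_flags(tree):
    # Collect options declared as parser.add_argument("--name", default=...).
    flags = {}
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "add_argument"
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)
                and node.args[0].value.startswith("--")):
            default = None
            for kw in node.keywords:
                if kw.arg == "default" and isinstance(kw.value, ast.Constant):
                    default = kw.value.value
            flags[node.args[0].value[2:]] = default
    return flags


def global_flags(tree):
    # Module-level NAME = <str/number/bool constant>, NAME not starting with "_".
    # Note: negative literals such as -0.01 parse as UnaryOp and are skipped here.
    flags = {}
    for node in tree.body:
        if (isinstance(node, ast.Assign)
                and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and not node.targets[0].id.startswith("_")
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, (str, int, float, bool))):
            flags[node.targets[0].id] = node.value.value
    return flags


def detect_flags(source):
    tree = ast.parse(source)
    return argparse_flags(tree) if uses_argparse(tree) else global_flags(tree)


print(detect_flags("learning_rate = 0.01\n_seed = 1\nname = 'x'\n"))
# {'learning_rate': 0.01, 'name': 'x'}
```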
Auto-Detect Python Script Scalars
Guild logs the numeric values that the script prints as scalars when they use this format:
<key>: <value>
<key> must not contain spaces, and <value> must be parsable as a number. Output lines must not contain leading spaces.
For example, the following script implicitly logs the scalars loss and accuracy:
def train_model():  # placeholder for real training; returns (loss, accuracy)
    return 0.123, 0.876
loss, accuracy = train_model()
print("loss: %f" % loss)
print("accuracy: %f" % accuracy)
By default, Guild logs numeric values printed as KEY: VALUE as scalars.
This behavior can be modified by defining output-scalars for an operation in a Guild file. See Scalars for more information.
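To see which output lines qualify under the rules above, here is a rough parser. It is a sketch that assumes a regular expression approximating the documented format and is not Guild's actual default pattern. A line matches when it starts with a key containing no spaces, followed by a colon, whitespace, and a value that parses as a number. Indented lines and non-numeric values are ignored.

```python
import re

# Key (no spaces or colons) at the very start of the line, then ": value".
# The leading ^ rejects lines that begin with whitespace.
SCALAR_LINE = re.compile(r"^([^\s:]+):\s+(\S+)\s*$")


def parse_scalars(output):
    scalars = {}
    for line in output.splitlines():
        m = SCALAR_LINE.match(line)
        if not m:
            continue
        key, value = m.groups()
        try:
            scalars[key] = float(value)
        except ValueError:
            continue  # value isn't numeric, so it isn't treated as a scalar
    return scalars


out = "epoch 1 done\nloss: 0.25\n  accuracy: 0.9\naccuracy: 0.875\nstatus: ok\n"
print(parse_scalars(out))  # {'loss': 0.25, 'accuracy': 0.875}
```

When the parsed values don't match what you expect, guild run --test-output-scalars (described below) shows how Guild itself interprets the output.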
Run an Operation
Run an operation by specifying the operation name. To list the operations available for a project, use guild ops. To include installed operations (operations defined in installed packages), use -i with the ops command.
To start an operation, pass its name to guild run. Refer to the command help for options.
Run a Batch
To run a batch, specify a flag value that implicitly starts a batch run. Start a batch explicitly by specifying --optimize or --optimizer with guild run.
See Batches below for more information.
Stage a Run
You can stage a run to start later. To stage a run, use --stage with guild run.
Start a staged run using --start RUN-ID with run.
Staged runs are often used with queues to schedule runs. See Queues for more information.
To stage multiple runs at once using flag list values, specify --stage-trials instead of --stage. See Batches below for information on generating multiple trials using flag list values.
Restart a Run
Restart a run using --restart RUN-ID with run.
Restarting is useful when a run terminates early and has a checkpoint to resume from.
Note Your operation code must support restarts from a checkpoint. This requires specific coding to handle restarts. You generally do this by looking for a checkpoint file and restarting from the checkpoint if it's present. When Guild restarts a run, the run has access to any checkpoints it wrote previously.
By default, Guild uses the source code associated with the run when restarting. To force Guild to use the current source code, specify the --force-sourcecode option along with --restart. Use this when you want to apply a fix to a terminated run.
Guild uses the run flag values when restarting. You can redefine flag values as needed when restarting a run.
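The restart pattern described in the note above can be sketched as follows. This is a minimal illustration under assumptions. The checkpoint file name checkpoint.json and its JSON contents are hypothetical, and real training code would normally save model state with its framework's own checkpoint API. Because a restarted run reuses its run directory, a checkpoint written by the earlier attempt is still there to resume from. Your operation's main script would call train().

```python
import json
import os

CHECKPOINT = "checkpoint.json"  # hypothetical name, relative to the run directory


def load_checkpoint():
    # On a restart, resume from the saved state if a checkpoint exists.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"epoch": 0, "loss": None}


def save_checkpoint(state):
    # Write to a temporary file, then rename it over the checkpoint, so an
    # interruption mid-write doesn't leave a corrupt checkpoint behind.
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT)


def train(epochs=5):
    state = load_checkpoint()
    # Skip epochs that finished before the run was terminated.
    for epoch in range(state["epoch"], epochs):
        state["loss"] = 1.0 / (epoch + 1)  # stand-in for real training
        state["epoch"] = epoch + 1
        save_checkpoint(state)
        print("epoch: %i" % state["epoch"])
    return state
```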
Run From a Prototype
To start a new run using an existing run as a prototype, use --proto RUN-ID with run.
By default, Guild uses the source code associated with the prototype run. To force Guild to use the current source code, specify the --force-sourcecode option along with --proto.
Show Operation Help
Show help for an operation by specifying the --help-op option with run.
Command help is generated from the Guild file.
Test an Operation
There are various ways to test an operation. The first way is to simply run it. Consider implementing the 10 Second Rule for your operations so you can run the full code path in less than 10 seconds.
Test Source Code
Show what Guild copies as source code for an operation by specifying the --test-sourcecode option with run. Guild shows the rules used to include and exclude source code files, along with the files that are selected and those that are skipped.
Test Output Scalars
Use --test-output-scalars to test operation output scalar configuration. You can test a file, or test standard input using the filename -.
Test output scalars for a run by piping run output to the run command:
guild cat --output | guild run --test-output-scalars -
You can type input per line for evaluation using this command:
guild run --test-output-scalars -
When you're done evaluating input, enter a blank line or press Ctrl-C.
Test Flags
Flag detection, flag imports, and flag definitions are often a process of refinement as you develop an operation. Use the --test-flags option with run to show how Guild processes flags for an operation.
Stage a Run in a Directory
Use --stage with --run-dir to stage a run in a directory. Guild initializes the run as it normally would. This lets you inspect the run directory and even run the operation directly with Python. This is often an effective way to troubleshoot issues. When you've resolved any issues, run the operation normally.
Run on a Remote System
To run an operation on a remote, use the --remote option with guild run.
You must define the specified remote in user configuration.
See Remotes for information on defining and using remotes.
Manage Runs
Run Filters
Run-related commands support a common set of run filter options. Use these options to limit the command to runs that match the specified filters.
The table below lists the common run filter options.
Option | Description |
---|---|
-l, --label VAL | Filter runs with labels matching VAL. |
-U, --unlabeled | Filter only runs without labels. |
-M, --marked | Filter only marked runs. |
-N, --unmarked | Filter only unmarked runs. |
-R, --running | Filter only runs that are still running. |
-C, --completed | Filter only completed runs. |
-E, --error | Filter only runs that exited with an error. |
-T, --terminated | Filter only runs terminated by the user. |
-P, --pending | Filter only pending runs. |
-G, --staged | Filter only staged runs. |
-S, --started RANGE | Filter only runs started within RANGE. See above for valid time ranges. |
-D, --digest VAL | Filter only runs with a matching source code digest. |
Commands that support these filters include cat, compare, export, import, label, ls, mark, open, publish, pull, push, runs diff, runs info, runs purge, runs restore, runs rm, runs, select, tensorboard, and view.
List Runs
List runs using guild runs. This is an alias for the full command guild runs list.
guild runs
By default, Guild shows the latest 20 runs. You can show more runs using one or more occurrences of the -m option. Each time you specify -m, Guild shows 20 more runs.
Show all runs using the -a option.
You can filter runs using various filter options. See runs list for details.
You can list runs in an export archive by specifying the directory with the -A, --archive option.
Get Run Information
Use guild runs info to show run information. This command outputs a number of run attributes.
By default Guild shows information for the latest run. Specify a run index or run ID to show information for a different run.
To export runs info as JSON, use the --json option.
To list run files, use guild ls.
To open a file from a local run using an associated program, use guild open.
To open a new command line shell for a run, use --shell with guild open. The shell is configured with the run directory as the current directory.
To show the contents of a run file, use guild cat.
Compare Runs
Guild supports various methods for comparing runs.
The table below describes when to use each command to compare runs.
Guild Command | When to Use |
---|---|
compare | Compare run flags and scalars in a tabular view. The command is terminal-based and works the same when run remotely and locally. See Guild Compare for details. |
view | Graphical (web-based) application for exploring and comparing runs. See Guild View for details. |
tensorboard | Use TensorBoard to compare run scalars, images, hyperparameters, and other logged summaries. See TensorBoard for details. |
runs diff | Use a diff program to compare runs. |
Delete Runs
Use guild runs rm or guild runs delete to delete runs. These commands are identical. Use the form you prefer.
By default, when you delete a run, Guild moves the run to a trash directory in case you want to restore it. Use guild runs restore to restore deleted runs. List deleted runs using guild runs -d.
You can permanently delete deleted runs using guild runs purge. This operation cannot be undone. Purging runs frees disk space.
Tip Avoid the temptation to permanently delete runs until you need to. Deleted runs are surprisingly useful sources of information. When they’re permanently deleted, the information is gone.
Use guild check --space to see space used by runs. When purging runs to free disk space, consider deleting old runs using the -S option (short form of --started). For example, purge runs older than 30 days with guild runs purge -S "before 30 days ago".
Export and Import Runs
Use guild export to move or copy runs to a local directory.
Use --move with export to move runs out of the environment into a directory. This is a good method for organizing runs over time. Use different export directories to organize and archive runs. This keeps your environment free of older runs that you aren't working with. You can import these runs back into the environment as needed.
Use guild import to move or copy runs from a directory to your environment.
Copy Runs to and from a Remote System
Use guild push to copy runs from your local environment to a remote. This is useful for backing up runs to a server or to a cloud service like S3. You can also use this method to share runs with colleagues or deploy models.
Use guild pull to copy runs from a remote to your local environment. Use this method to restore archived runs or get runs from colleagues.
Labels
A label is used to describe a run. Run labels are shown in various contexts including runs lists and run comparisons.
To show the label for a specific run, use guild runs info. The label is shown as the label attribute.
Use labels to filter runs. See Run Filters above for commands that support the label filter. For example, to show runs with labels containing “best”, run:
guild runs -l best
Set a Label for a New Run
By default, Guild generates a label using the label template configured for an operation. If an operation doesn't specify a label template, Guild generates a default label with flag assignments.
Set a different label using the --label option with guild run.
You can alternatively tag a run with a string that's prepended to the default label. Specify the tag with the --tag option.
Modify a Run Label
Modify the label for one or more runs using guild label. When modifying a label, you have several options.
The table below lists ways of modifying labels.
Action | Command |
---|---|
Replace labels | guild label --set LABEL |
Prepend a value to labels | guild label --tag VALUE (alias for guild label --prepend VALUE) |
Append a value to labels | guild label --append VALUE |
Remove a value from labels | guild label --untag VALUE (alias for guild label --remove VALUE) |
Clear labels | guild label --clear |
By default, Guild applies the label command to the latest run. To apply it to different runs, specify one or more RUN arguments. To label all runs matching a filter, use the : argument.
Batches
A batch run is a run that generates other runs. A run generated by a batch is referred to as a trial.
The following command starts a batch. It uses the default batch operation to generate two trials — one for each flag value specified:
guild run train lr='[0.01,0.1]'
The following command also starts a batch. It generates 20 runs using randomly selected values from the log-uniform distribution over the specified range:
guild run train lr=loguniform[1e-5:1e-1]
Both examples start batches implicitly based on the specified flag values. Start a batch explicitly by specifying either --optimize or --optimizer. Optimizers are batch operations. They generate one or more trials.
For more information, see Hyperparameter Optimization.
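The two kinds of flag values above generate trials in different ways. The sketch below illustrates both ideas and is not Guild's implementation. List values are expanded into one trial per value. The sketch assumes that several list flags combine as a Cartesian product (grid). A log-uniform search space is sampled uniformly in log space, so each order of magnitude in the range is equally likely. The function names are hypothetical.

```python
import itertools
import math
import random


def grid_trials(flags):
    # List values are expanded; scalar values stay fixed across trials.
    # Assumption: multiple list flags combine as a Cartesian product.
    names = list(flags)
    value_lists = [v if isinstance(v, list) else [v] for v in flags.values()]
    return [dict(zip(names, combo)) for combo in itertools.product(*value_lists)]


def loguniform(low, high, rng=random):
    # Uniform in log space: 1e-5..1e-4 is as likely as 1e-2..1e-1.
    return math.exp(rng.uniform(math.log(low), math.log(high)))


def random_trials(name, low, high, count=20, seed=None):
    # count=20 mirrors the 20 runs described above for lr=loguniform[1e-5:1e-1].
    rng = random.Random(seed)
    return [{name: loguniform(low, high, rng)} for _ in range(count)]


print(grid_trials({"lr": [0.01, 0.1], "epochs": 10}))
# [{'lr': 0.01, 'epochs': 10}, {'lr': 0.1, 'epochs': 10}]
```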
Batch Files
Trials are also generated when you specify a batch file for a run. Batch files are specified using @PATH, where PATH is the path to a file containing trial flags.
Batch files must be defined using one of the following formats:
- CSV
- YAML
- JSON
Sample CSV formatted batch file:
lr,batch_size,dropout
0.1,100,0.1
0.1,100,0.2
0.01,100,0.1
0.01,100,0.2
Sample YAML formatted batch file:
- lr: 0.1
  batch_size: 100
  dropout: 0.1
- lr: 0.1
  batch_size: 100
  dropout: 0.2
- lr: 0.01
  batch_size: 100
  dropout: 0.1
- lr: 0.01
  batch_size: 100
  dropout: 0.2
Sample JSON formatted batch file:
[{"lr": 0.1, "batch_size": 100, "dropout": 0.1},
 {"lr": 0.1, "batch_size": 100, "dropout": 0.2},
 {"lr": 0.01, "batch_size": 100, "dropout": 0.1},
 {"lr": 0.01, "batch_size": 100, "dropout": 0.2}]
You can override flag values when using batch files.
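To show how the sample files above map to trials, here is a sketch that reads a CSV or JSON batch file into one flag dictionary per trial using only the standard library. It is an illustration, not how Guild reads batch files. Guild handles parsing and type conversion itself. YAML is omitted because it needs a third-party parser such as PyYAML. The CSV header row supplies the flag names. CSV cells are strings, so the sketch converts numeric-looking values.

```python
import csv
import json
import os


def _coerce(s):
    # CSV values are strings; convert to int or float where possible.
    for cast in (int, float):
        try:
            return cast(s)
        except ValueError:
            pass
    return s


def load_batch_file(path):
    # Return a list of flag dicts, one per trial.
    ext = os.path.splitext(path)[1].lower()
    with open(path, newline="") as f:
        if ext == ".csv":
            return [{k: _coerce(v) for k, v in row.items()} for row in csv.DictReader(f)]
        if ext == ".json":
            return json.load(f)
    raise ValueError("unsupported batch file: %s" % path)
```

Both sample files above yield the same four trials, starting with {"lr": 0.1, "batch_size": 100, "dropout": 0.1}.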