Guild File Cheatsheet

Guild File Format

Operation-only format (use for getting started and for simple projects):

train:
  main: mnist_train

test:
  main: mnist_test

Full format (use for multiple models, named resources, config reuse):

- model: mnist
  operations:
    train:
      main: mnist_train

    test:
      main: mnist_test

Operations

Python Based Operations

One operation, runs Python module, imports all detected flags:

train:
  main: mnist_train
  flags-import: all

Module train located in package mnist:

train:
  main: mnist.train
  flags-import: all

Module train located in subdirectory src:

train:
  main: src/train
  flags-import: all

Module with arguments (flag args are appended to argument list):

train:
  main: main --train
  flags-import: [lr, dropout]

Control Python command using exec:

train-debug:
  exec: python -v -m mnist_train

Other Languages

Run an R script:

train:
  exec: Rscript train.r

R script with arguments:

train:
  exec: Rscript train.r --learning-rate ${learning-rate}
  flags:
    learning-rate: 0.1

Pass all flag arguments:

train:
  exec: Rscript train.r ${flag_args}
  flags:
    learning-rate: 0.1
    batch-size: 100

Flags

Flag Imports

Import all detected flags:

train:
  flags-import: all

Import some flags:

train:
  flags-import:
    - learning_rate
    - batch_size
    - dropout

Import all but some flags:

train:
  flags-import: all
  flags-import-skip:
    - num_classes
    - log_dir

Disable flag import:

train:
  flags-import: off

Flags Interface - Python Modules

Auto-detect flags interface:

train:
  flags-import: all

Disable auto-detect - always use global variables:

train:
  flags-dest: globals
  flags-import: all

Disable auto-detect - use command line arguments:

train:
  flags-dest: args
  flags-import: all

Save flags to params global dict:

train:
  flags-dest: global:params
  flags-import: all

Flags Interface - Other Languages

Pass all flags as command line arguments:

train:
  exec: echo ${flag_args}
  flags:
    msg: hello

Pass individual flags as arguments:

train:
  exec: echo ${msg}
  flags:
    msg: hello
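
The two substitution styles above can be previewed with a small sketch. This is not Guild's implementation: the render_exec and flag_args helpers are hypothetical, and the --NAME VALUE form and alphabetical ordering used for ${flag_args} are assumptions made for illustration:

```python
import re
import shlex


def flag_args(flags):
    """Render flags as a flat option list (assumed form: --NAME VALUE, sorted)."""
    args = []
    for name in sorted(flags):
        args += ["--" + name, str(flags[name])]
    return args


def render_exec(template, flags):
    """Replace ${NAME} references in an exec template with flag values."""

    def resolve(match):
        name = match.group(1)
        if name == "flag_args":
            return " ".join(shlex.quote(arg) for arg in flag_args(flags))
        return shlex.quote(str(flags[name]))

    return re.sub(r"\$\{([^}]+)\}", resolve, template)


print(render_exec("echo ${msg}", {"msg": "hello"}))
print(render_exec("echo ${flag_args}", {"msg": "hello"}))
```

Referencing a single flag passes only its value, while ${flag_args} passes every flag together with its option name.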

Flag Definitions

Single-value flag definitions (values are defaults):

train:
  flags:
    learning-rate: 0.1
    batch-size: 100

Provide flag help using description:

train:
  flags:
    learning-rate:
      description: Learning rate used for training
      default: 0.1
    batch-size:
      description: Batch size used for training
      default: 100

Require a flag value:

train:
  flags:
    data:
      description: Location of data file
      required: yes

Limit values to a set of choices:

train:
  flags:
    optimizer:
      choices:
       - adam
       - sgd
       - rmsprop
      default: sgd

Provide help for choices with description:

train:
  flags:
    optimizer:
      choices:
       - value: adam
         description: Adam optimizer
       - value: sgd
         description: Stochastic gradient descent optimizer
       - value: rmsprop
         description: RMSProp optimizer
      default: sgd

Require a choice:

train:
  flags:
    dropout:
      choices: [0.1, 0.2, 0.3]
      required: yes

Allow other values (choices used in help text):

train:
  flags:
    dropout:
      choices: [0.1, 0.2, 0.3]
      allow-other: yes

Use type to check values:

train:
  flags:
    data:
      type: existing-path
      default: data
    data-digest:
      type: string
      required: yes
    learning-rate:
      type: float
      default: 0.1

Use an alternative argument name (applies to command line option name and global variable name):

train:
  flags:
    learning-rate:
      arg-name: lr

Use arg-name to set nested values with the global:NAME interface:

train:
  flags-dest: global:params
  flags:
    learning-rate:
      arg-name: train.lr
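
With the configuration above, the learning-rate value is expected to land at params["train"]["lr"] in the script. The sketch below illustrates how a dotted name maps to a nested dict entry. The set_nested helper and the contents of params are hypothetical and are not Guild's own code:

```python
# Hypothetical script-side dict targeted by `flags-dest: global:params`.
params = {"train": {"lr": 0.1, "epochs": 10}}


def set_nested(target, dotted_name, value):
    """Assign value at the nested location named by a dotted path."""
    *parents, leaf = dotted_name.split(".")
    for key in parents:
        # Create intermediate dicts as needed.
        target = target.setdefault(key, {})
    target[leaf] = value


# Equivalent of running with learning-rate=0.01 and arg-name train.lr.
set_nested(params, "train.lr", 0.01)
print(params["train"]["lr"])
```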

Dependencies

Inline vs Named Resources

Inline resource:

train:
  requires:
    - file: data.csv

Named resource (requires full format Guild file):

- operations:
    train:
      requires: data

    test:
      requires: data

  resources:
    data:
      - file: data.csv

Required Project Files

Basic project file dependency (creates a link in run dir to project file):

train:
  requires:
    - file: data.csv

Copy file, rather than link:

train:
  requires:
    - file: data.csv
      target-type: copy

Create resolved dependency under data dir:

train:
  requires:
    - file: data.csv
      target-path: data

Rename target file:

train:
  requires:
    - file: train.csv
      rename: train.csv data.csv

Ensure file contents:

train:
  requires:
    - file: data.csv
      target-type: copy
      sha256: 5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03
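
The sha256 value is the hex digest of the file contents. A minimal sketch for computing it with the Python standard library follows (the file_sha256 helper name is hypothetical):

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large data files are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling file_sha256("data.csv") returns the string to paste into the sha256 attribute.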

Require unpacked archive contents (archive types are unpacked by default):

train:
  requires:
    - file: data.tar.gz

Don’t unpack archive:

train:
  requires:
    - file: data.tar.gz
      unpack: no

Require project subdirectory:

train:
  requires:
    - file: datasets/mnist

Required Network Files

Yann LeCun’s MNIST dataset:

train:
  requires:
    - url: http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
      sha256: 440fcabf73cc546fa21475e81ea370265605f56be210a4024d2ca8f203523609
    - url: http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
      sha256: 3552534a0a558bbed6aed32b30c495cca23d567ec52cac8be1a0730e8010255c
    - url: http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
      sha256: 8d422c7b0a1c1c79245a5bcf07fe86e33eeafee792b84584aec276f5a2dbc4e6
    - url: http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
      sha256: f7ae60f92e00ec6debd23a6088c31dbd2371eca3ffa0defaefb259924204aec6

Use a named resource to create dataset files in run subdirectory:

- operations:
    cnn:
      requires: mnist-data
    lr:
      requires: mnist-data

  resources:
    mnist-data:
      target-path: mnist-idx-data
      sources:
        - url: http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
          sha256: 440fcabf73cc546fa21475e81ea370265605f56be210a4024d2ca8f203523609
        - url: http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
          sha256: 3552534a0a558bbed6aed32b30c495cca23d567ec52cac8be1a0730e8010255c
        - url: http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
          sha256: 8d422c7b0a1c1c79245a5bcf07fe86e33eeafee792b84584aec276f5a2dbc4e6
        - url: http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
          sha256: f7ae60f92e00ec6debd23a6088c31dbd2371eca3ffa0defaefb259924204aec6

Required Operation Files

Require all files generated by prepare-data operation:

train:
  requires:
    - operation: prepare-data

prepare-data: {}

Require only *.hdf5 files generated by prepare-data (select uses regular expression):

train:
  requires:
    - operation: prepare-data
      select: .+\.hdf5

Require model.ckpt from any operation that starts with train-:

train:
  requires:
    - operation: ^train-
      select: model\.ckpt
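
Because select is a regular expression rather than a glob, it can help to preview a pattern against a list of paths before using it. The sketch below assumes the pattern is matched from the start of each run-relative path; Guild's exact matching rules may differ. The select_files helper and the file names are hypothetical:

```python
import re


def select_files(pattern, paths):
    """Return the paths that the regular expression matches from the start."""
    regex = re.compile(pattern)
    return [path for path in paths if regex.match(path)]


# Hypothetical files generated by an upstream run.
run_files = ["data/train.hdf5", "data/test.hdf5", "notes.txt", "model.ckpt"]

print(select_files(r".+\.hdf5", run_files))
print(select_files(r"model\.ckpt", run_files))
```

A glob such as *.hdf5 is not a valid regular expression, so the dot must be escaped and the wildcard written as .+ or .* instead.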

Output Scalars

Default capture pattern (shown here for reference):

train:
  output-scalars: '(\key): (\value)'

Use a different pattern to capture scalars:

train:
  output-scalars: ' - (\key): (\value)'

Disable output scalars:

train:
  output-scalars: no

Use keys to define patterns for each supported scalar (step is a special scalar key used to track global step for scalars):

train:
  output-scalars:
    step: 'Training epoch (\step)'
    loss: 'Validation loss: (\value)'

Use named capture groups to specify the key for a particular pattern:

train:
  output-scalars: 'epoch (?P<step>\step) - train loss (?P<loss>\value) - val loss (?P<val_loss>\value)'

Use a list of specs as needed to define output scalars:

train:
  output-scalars:
    - 'Epoch (?P<step>\step)'
    - loss: 'loss: (\value)'
      acc: 'accuracy: (\value)'
    - '(\key)=(\value)'
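
The patterns in this section are regular expressions in which \key, \value, and \step act as placeholders. To test a pattern against sample output before a run, the placeholders can be expanded by hand, as in the sketch below. The expansions in PLACEHOLDERS are approximations chosen for illustration, not Guild's actual definitions, and the expand and capture helpers are hypothetical:

```python
import re

# Assumed stand-ins for the placeholder expansions (approximations only).
PLACEHOLDERS = {
    r"\key": r"[^ \t]+",
    r"\value": r"[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?",
    r"\step": r"[0-9]+",
}


def expand(pattern):
    """Replace placeholders with concrete regex fragments and compile."""
    for name, regex in PLACEHOLDERS.items():
        pattern = pattern.replace(name, regex)
    return re.compile(pattern)


def capture(pattern, line, key=None):
    """Return the scalars captured from one line of output as a dict."""
    match = expand(pattern).search(line)
    if match is None:
        return {}
    named = match.groupdict()
    if named:
        # Named groups supply the scalar keys.
        return {name: float(value) for name, value in named.items()}
    if key is not None:
        # Key given by the spec; the single group is the value.
        return {key: float(match.group(1))}
    # Two unnamed groups: the first is the key, the second is the value.
    found_key, value = match.groups()
    return {found_key: float(value)}


print(capture(r"(\key): (\value)", "loss: 0.5"))
print(capture(r"Validation loss: (\value)", "Validation loss: 0.42", key="loss"))
```

Patterns must be written as raw strings in Python, otherwise \v in \value is read as a vertical tab.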

Source Code

Copy only Python and YAML files - operation level config:

train:
  sourcecode:
    - '*.py'
    - '*.yml'

Copy only Python and YAML files - model level config (applies by default to all model operations):

- model: cnn
  sourcecode:
    - '*.py'
    - '*.yml'

Extend default rules to include additional files matching a pattern:

train:
  sourcecode:
    - include: '*.png'

Extend default rules to exclude files matching a pattern:

train:
  sourcecode:
    - exclude: '*.csv'

Exclude a directory (improves performance for directories containing many files):

train:
  sourcecode:
    - exclude:
        dir: data

Use a different source code root:

train:
  sourcecode:
    root: ../src

Use a different source code root with modified rules:

train:
  sourcecode:
    root: ../src
    select:
      - '*.py'
      - '*.yml'

Copy the source code to the run directory root:

train:
  sourcecode:
    dest: .

Copy the source code to a src subdirectory in the run directory:

train:
  sourcecode:
    dest: src
    select:
      - '*.py'
      - '*.yml'

Disable source code:

train:
  sourcecode: no

Optimizers

Define optimizers for an operation (use to change default settings):

train:
  optimizers:
    gp:
      kappa: 1.5
      xi: 0.1
    forest:
      kappa: 1.5
      xi: 0.1

Define default optimizer (applies when --optimize used with run command):

train:
  optimizers:
    forest:
      default: yes

Define alternative optimizers that use the same algorithm:

train:
  optimizers:
    gp-1:
      algorithm: gp
      kappa: 1.5
    gp-2:
      algorithm: gp
      kappa: 1.8