Summary operations


This is a proposal for a feature that lets users run operations that use one or more Guild runs as inputs. Consider the case where a user wants to analyze a set of runs to calculate average performance and to select the best-performing model. The summary operation needs to know which runs to analyze and needs easy access to those runs to perform its work.

This proposal is awaiting feedback.


It’s often useful to perform analysis on Guild-generated runs. There are a number of common use cases:

  • Generate a report on the models generated from a particular data set
  • Select a production candidate from many possible models
  • Generate a new model from a set of models or a set of datasets, which are represented by one or more Guild runs

It’s possible for users to manually scan runs using either guild.ipy or the yet-to-be-released API guild._api. However, this requires tedious and potentially error-prone programming.

Guild should make this process as easy as possible.

Proposed Approach

Guild should introduce a new operation type: “summary operation”. A summary operation is a standard Guild operation that requires a set of runs. This requirement is expressed as a Guild dependency.

The operation dependency type should be extended to support multiple runs by way of a multi-run source attribute.

op: guild.pass

    - operation: op
      multi-run: yes

When an operation dependency is multi-run, Guild resolves the dependency by selecting and linking to each matching run. Links are created in the summary operation run directory by default, or under target-path as specified in the dependency source. Links are named using the full run ID and link to the corresponding directory under $GUILD_HOME/runs.
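
The linking step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Guild's implementation: link_runs is a hypothetical helper, a temporary directory stands in for $GUILD_HOME/runs, and the platform is assumed to allow symbolic links.

```python
import os
import tempfile
from pathlib import Path


def link_runs(run_ids, runs_home, target_dir):
    """Create one symlink per run in target_dir, named by full run ID."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for run_id in run_ids:
        link = target_dir / run_id
        # Each link points at the run's directory under runs_home.
        os.symlink(Path(runs_home) / run_id, link, target_is_directory=True)
        links.append(link)
    return links


# Demo against a throwaway directory standing in for $GUILD_HOME/runs.
tmp = Path(tempfile.mkdtemp())
runs_home = tmp / "runs"
for run_id in ("aaa111", "bbb222"):
    (runs_home / run_id).mkdir(parents=True)

for link in link_runs(["aaa111", "bbb222"], runs_home, tmp / "summary"):
    print(link.name, "->", os.readlink(link))
```

Passing a different target_dir corresponds to setting target-path on the dependency source.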


For multi-run dependencies, Guild generates a guild-runs.json file in the same directory as the linked runs. This file contains likely-useful metadata for each linked run.

// guild-runs.json - located alongside the linked runs in summary op run dir
[
  { "id": "xxxyyy",
    "dir": "./xxxyyy",
    "status": "completed",
    "flags": {...},
    "scalars": {...}
  }, ...
]
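
As a sketch of how a summary operation might consume this file, the following Python computes an average and picks the best run, matching the use case from the introduction. It assumes the file holds a JSON array of run objects with the fields shown above. The scalar name "loss", the "lower is better" rule, and the helper names load_runs and summarize are hypothetical.

```python
import json
from pathlib import Path
from statistics import mean


def load_runs(path="guild-runs.json"):
    """Load run metadata; assumes the file holds a JSON array of run objects."""
    return json.loads(Path(path).read_text())


def summarize(runs, scalar="loss"):
    """Return (mean scalar value, ID of the run with the lowest value)."""
    # Only completed runs that report the scalar take part in the summary.
    scored = [
        (run["scalars"][scalar], run["id"])
        for run in runs
        if run["status"] == "completed" and scalar in run["scalars"]
    ]
    if not scored:
        raise ValueError(f"no completed runs report scalar {scalar!r}")
    _, best_id = min(scored)
    return mean(value for value, _ in scored), best_id


# In-memory sample shaped like guild-runs.json (values are made up).
sample = [
    {"id": "aaa", "dir": "./aaa", "status": "completed",
     "flags": {}, "scalars": {"loss": 0.4}},
    {"id": "bbb", "dir": "./bbb", "status": "completed",
     "flags": {}, "scalars": {"loss": 0.2}},
    {"id": "ccc", "dir": "./ccc", "status": "error",
     "flags": {}, "scalars": {"loss": 0.1}},
]
average, best = summarize(sample)
print(f"average loss: {average:.3f}, best run: {best}")
```

Inside a real summary run, summarize(load_runs()) would read the generated file from the run directory instead of the in-memory sample.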

Run selection

As a part of this proposal, the operation dependency type will be extended to support a select attribute. select is a query-like expression Guild uses to resolve the required runs. This is an extension of the operation attribute value currently used, which only supports operation name selection.

The select attribute can be used to test a run using criteria for run attributes, flag values, and scalars. The select specification will support boolean expressions.

    - operation: op
      multi-run: yes
      select: label contains 'red' and completed

NOTE: The select feature will also be made available in the guild select command.
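
To make the intended semantics concrete, here is a deliberately small Python sketch of evaluating a select expression against run metadata. The grammar is an assumption, since this proposal does not define one: only and-joined terms are handled, a term is either <attr> contains '<text>' or a bare word compared against the run status, and matches is a hypothetical helper.

```python
import re

# Matches terms of the form: <attr> contains '<text>'
CONTAINS = re.compile(r"^(\w+)\s+contains\s+'([^']*)'$")


def matches(run, select):
    """Return True if run satisfies every and-joined term in select.

    Limitation of this sketch: quoted text must not itself contain ' and '.
    """
    for term in re.split(r"\s+and\s+", select.strip()):
        m = CONTAINS.match(term)
        if m:
            attr, text = m.groups()
            if text not in str(run.get(attr) or ""):
                return False
        elif run.get("status") != term:
            # Bare word: treated here as a test on the run status.
            return False
    return True


runs = [
    {"id": "a", "label": "red apples", "status": "completed"},
    {"id": "b", "label": "red apples", "status": "error"},
    {"id": "c", "label": "green pears", "status": "completed"},
]
selected = [r["id"] for r in runs
            if matches(r, "label contains 'red' and completed")]
print(selected)
```

A full implementation would also need or, not, grouping, and comparisons on flags and scalars, which this sketch leaves out.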

Command line selection

A user may specify a select spec for a multi-run dependency by assigning a value that starts with where to the dependency name, using a flag-like assignment:

guild run summary op="where label contains 'red' and completed"

Alternatively, runs may be specified directly as a comma- or space-delimited list of full or partial run IDs.

guild run summary op="abcd1234 defa5678 bcde9012"
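
The two command-line forms could be told apart along these lines. This is a sketch, not Guild's argument handling: parse_run_spec and resolve_ids are hypothetical helpers, and treating a partial run ID as a prefix that must match exactly one run is an assumption.

```python
import re


def parse_run_spec(value):
    """Classify a dependency value as a select expression or a run ID list."""
    value = value.strip()
    if value.startswith("where "):
        return ("select", value[len("where "):].strip())
    # Otherwise split on commas and/or whitespace, dropping empty parts.
    return ("ids", [part for part in re.split(r"[,\s]+", value) if part])


def resolve_ids(prefixes, known_ids):
    """Expand full or partial run IDs; each must match exactly one known run."""
    resolved = []
    for prefix in prefixes:
        found = [run_id for run_id in known_ids if run_id.startswith(prefix)]
        if len(found) != 1:
            raise ValueError(f"{prefix!r} matches {len(found)} runs")
        resolved.append(found[0])
    return resolved


print(parse_run_spec("where label contains 'red'"))
print(parse_run_spec("abcd1234, defa5678 bcde9012"))
```

Rejecting ambiguous or unknown prefixes up front fits the preview step, since every run has to be resolved before the operation starts.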

Summary op preview

Guild will fully resolve the runs to link before starting the operation and show the selected runs in a preview.

You are about to run summary
  The following runs are selected:
    [63d8c402]  op  2022-05-10 09:51:19  completed  
    [52a07a44]  op  2022-05-10 09:51:18  completed  
    [c8d00fb7]  op  2022-05-10 09:51:17  completed  
    [65895e44]  op  2022-05-10 09:51:16  completed  
    [ca4f560e]  op  2022-05-10 09:51:15  completed  
    [b0567025]  op  2022-05-10 09:50:49  completed  
Continue? (Y/n)

Alternative Approaches

Deprecate operation in favor of run and multi-run dependencies

The current operation dependency is arguably misspelled. Strictly speaking, a downstream operation requires a run. We might consider renaming this dependency type accordingly.

upstream: guild.pass

    - run: upstream

The run attribute here would be the select expression. In this case, the spec is shorthand for:

    - run: op = upstream

In the case of multi-run, the configuration would be:

    - multi-run: upstream

This distinction at the top level is defensible, considering the differences in how run and multi-run sources are resolved. For a run, a single run is selected and its contents (i.e. the files inside the run directory) are resolved within the downstream run directory. For multi-run, the top-level run directories themselves are resolved as links within the downstream (summary op) run directory. This difference is arguably better highlighted by making run and multi-run dependencies distinct at the top level (as opposed to implicit when an attribute is set to true).

The operation dependency type would remain supported indefinitely but would be officially deprecated in favor of run and multi-run.

This spelling has some advantages over the above proposal:

  • Clarifies the operation dependency as a single run dependency
  • Makes the distinction between run and multi-run clearer
  • Removes the need for a separate select attribute

Cons to this approach:

  • Existing documentation must be updated
  • Standard cost of maintaining a deprecated setting (docs, implementation, and tests)