Summary operations


This is a proposal for a feature that lets users run operations that use one or more Guild runs as inputs. Consider the case where a user wants to analyze a set of runs to calculate average performance and to select the best performing model. The summary operation needs to know what runs to analyze and have easy access to those runs to perform its work.

This proposal is awaiting feedback


It’s often useful to perform analysis on Guild-generated runs. There are a number of common use cases:

  • Generate a report on the models generated from a particular data set
  • Select a production candidate from many possible models
  • Generate a new model from a set of models or a set of datasets, which are represented by one or more Guild runs

It’s possible for users to manually scan runs using either guild.ipy or the yet-to-be-released API guild._api. However, this requires tedious and potentially error-prone programming.

Guild should make this process as easy as possible.

Proposed Approach

Guild should introduce a new operation type: “summary operation”. A summary operation is a standard Guild operation that requires a set of runs. This requirement is expressed as a Guild dependency.

The operation dependency type should be extended to support multiple runs by way of a multi-run source attribute.

op: guild.pass

    - operation: op
      multi-run: yes

When an operation dependency is multi-run, Guild resolves the dependency by selecting and linking to each matching run. Links are created in the summary operation run directory by default, or under target-path as specified in the dependency source. Links are named using the full run ID and link to the corresponding directory under $GUILD_HOME/runs.
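The linking step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Guild's implementation: `link_runs` is a hypothetical helper, the run IDs are made up, and temporary directories stand in for `$GUILD_HOME/runs` and the summary op run directory.

```python
import os
import tempfile


def link_runs(run_dirs, dest_dir):
    """Create one symlink per run in dest_dir, named after the full run ID.

    Assumes each run directory's base name is its full run ID, as under
    $GUILD_HOME/runs. Returns the list of created link paths.
    """
    os.makedirs(dest_dir, exist_ok=True)
    links = []
    for run_dir in run_dirs:
        run_id = os.path.basename(os.path.normpath(run_dir))
        link = os.path.join(dest_dir, run_id)
        os.symlink(os.path.abspath(run_dir), link, target_is_directory=True)
        links.append(link)
    return links


# Throwaway directories standing in for $GUILD_HOME/runs (hypothetical IDs)
guild_home = tempfile.mkdtemp()
run_ids = ["63d8c402aaaa", "52a07a44bbbb"]
run_dirs = []
for run_id in run_ids:
    path = os.path.join(guild_home, "runs", run_id)
    os.makedirs(path)
    run_dirs.append(path)

# Stands in for the summary op run directory (or its target-path)
summary_run_dir = tempfile.mkdtemp()
created = link_runs(run_dirs, summary_run_dir)
print(sorted(os.listdir(summary_run_dir)))
```

Passing a subdirectory of the run directory as `dest_dir` would correspond to setting target-path in the dependency source.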


For multi-run dependencies, Guild generates a guild-runs.json file in the same directory as the linked runs. This file contains likely-useful metadata for each linked run.

// guild-runs.json - located alongside the linked runs in summary op run dir
[
  { "id": "xxxyyy",
    "dir": "./xxxyyy",
    "status": "completed",
    "flags": {...},
    "scalars": {...}
  }, ...
]
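To make the motivating use case concrete (average performance plus best model), here is a rough sketch of what a summary operation script might do with this file. It rests on assumptions the proposal does not settle: that guild-runs.json holds a JSON list of run objects, that `scalars` maps a scalar name to a single number, and that a scalar called `loss` exists where lower is better. `load_runs` and `summarize` are hypothetical names.

```python
import json
import os
import tempfile


def load_runs(summary_dir):
    """Read guild-runs.json from the summary op run directory."""
    with open(os.path.join(summary_dir, "guild-runs.json")) as f:
        return json.load(f)


def summarize(runs, scalar):
    """Return (mean, best_run) over completed runs that logged `scalar`.

    Assumes lower values are better (e.g. a loss).
    """
    scored = [
        r for r in runs
        if r["status"] == "completed" and scalar in r["scalars"]
    ]
    if not scored:
        return None, None
    mean = sum(r["scalars"][scalar] for r in scored) / len(scored)
    best = min(scored, key=lambda r: r["scalars"][scalar])
    return mean, best


# Write a made-up guild-runs.json so the sketch runs on its own
summary_dir = tempfile.mkdtemp()
sample = [
    {"id": "aaa", "dir": "./aaa", "status": "completed",
     "flags": {"lr": 0.1}, "scalars": {"loss": 0.5}},
    {"id": "bbb", "dir": "./bbb", "status": "completed",
     "flags": {"lr": 0.01}, "scalars": {"loss": 0.25}},
    {"id": "ccc", "dir": "./ccc", "status": "error",
     "flags": {"lr": 1.0}, "scalars": {"loss": 0.1}},
]
with open(os.path.join(summary_dir, "guild-runs.json"), "w") as f:
    json.dump(sample, f)

mean_loss, best_run = summarize(load_runs(summary_dir), "loss")
print(mean_loss, best_run["id"], best_run["dir"])
```

The `dir` value of the best run points at the linked run directory, so the script can go on to read model files from it.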

Run selection

As a part of this proposal, the operation dependency type will be extended to support a select attribute. select is a query-like expression Guild uses to resolve the required runs. This is an extension of the operation attribute value currently used, which only supports operation name selection.

The select attribute can be used to test a run using criteria for run attributes, flag values, and scalars. The select specification will support boolean expressions.

    - operation: op
      multi-run: yes
      select: label contains 'red' and completed

NOTE: The select feature will also be made available in the guild select command.
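The select grammar is not defined in this proposal, so the following is only a toy evaluator to show the kind of test a select spec expresses. It assumes terms joined by `and`, where each term is `<attr> contains '<text>'`, `<attr> = <value>`, or a bare status name. The `matches` function, the status set, and the run dictionaries are all hypothetical.

```python
import re

# Assumed set of status names that may appear as bare terms
STATUSES = {"completed", "error", "terminated", "running", "pending"}


def _term_matches(run, term):
    """Evaluate a single select term against a run dictionary."""
    m = re.fullmatch(r"(\w+)\s+contains\s+'([^']*)'", term)
    if m:
        attr, text = m.groups()
        return text in str(run.get(attr, ""))
    m = re.fullmatch(r"(\w+)\s*=\s*(\S+)", term)
    if m:
        attr, value = m.groups()
        return str(run.get(attr)) == value
    if term in STATUSES:
        return run.get("status") == term
    raise ValueError(f"unsupported select term: {term!r}")


def matches(run, select):
    """Return True if the run satisfies every and-joined term.

    Limitation of this sketch: quoted text must not itself contain ' and '.
    """
    terms = re.split(r"\s+and\s+", select.strip())
    return all(_term_matches(run, term.strip()) for term in terms)


runs = [
    {"id": "a1", "op": "op", "label": "red apple", "status": "completed"},
    {"id": "b2", "op": "op", "label": "red pear", "status": "error"},
    {"id": "c3", "op": "op", "label": "green", "status": "completed"},
]
selected = [
    r["id"] for r in runs
    if matches(r, "label contains 'red' and completed")
]
print(selected)
```

A real implementation would need a proper parser to support `or`, negation, and comparisons on flag and scalar values.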

Command line selection

A user may specify a select spec for a multi-run dependency by assigning it to the dependency name in a flag-like assignment and prefixing the spec with where:

guild run summary op="where label contains 'red' and completed"

Alternatively, runs may be specified explicitly as a comma- or space-delimited list of full or partial run IDs.

guild run summary op="abcd1234 defa5678 bcde9012"
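The two command line forms above could be told apart roughly as follows. This is a sketch under assumptions: `parse_dep_value` and `resolve_ids` are hypothetical helpers, the run IDs are invented, and treating a partial ID as a prefix that must match exactly one run is my reading of "partial run IDs", not something the proposal states.

```python
import re


def parse_dep_value(value):
    """Classify a flag-like dependency value as a select spec or an ID list."""
    value = value.strip()
    if value.startswith("where "):
        return "select", value[len("where "):].strip()
    # Accept commas, whitespace, or a mix of both as delimiters
    return "ids", [part for part in re.split(r"[,\s]+", value) if part]


def resolve_ids(prefixes, known_ids):
    """Expand full or partial run IDs; each must match exactly one run."""
    resolved = []
    for prefix in prefixes:
        hits = [run_id for run_id in known_ids if run_id.startswith(prefix)]
        if len(hits) != 1:
            raise ValueError(f"{prefix!r} matches {len(hits)} runs, expected 1")
        resolved.append(hits[0])
    return resolved


kind, spec = parse_dep_value("where label contains 'red' and completed")
_, prefixes = parse_dep_value("abcd1234, defa5678 bcde9012")

# Made-up full run IDs standing in for the contents of $GUILD_HOME/runs
known = [
    "abcd1234ffff0000",
    "defa5678ffff0000",
    "bcde9012ffff0000",
    "bcdf0000ffff0000",
]
full_ids = resolve_ids(prefixes, known)
print(kind, spec)
print(full_ids)
```

Failing on an ambiguous or unknown prefix, rather than silently picking a run, keeps the preview shown before the operation starts trustworthy.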

Summary op preview

Guild will fully resolve the runs to link before starting the operation and show the selected runs in a preview.

You are about to run summary
  The following runs are selected:
    [63d8c402]  op  2022-05-10 09:51:19  completed  
    [52a07a44]  op  2022-05-10 09:51:18  completed  
    [c8d00fb7]  op  2022-05-10 09:51:17  completed  
    [65895e44]  op  2022-05-10 09:51:16  completed  
    [ca4f560e]  op  2022-05-10 09:51:15  completed  
    [b0567025]  op  2022-05-10 09:50:49  completed  
Continue? (Y/n)

Alternative Approaches

Deprecate operation in favor of run and multi-run dependencies

The current operation dependency is arguably misspelled. Strictly speaking, a downstream operation requires a run. We might consider renaming this dependency type accordingly.

upstream: guild.pass

    - run: upstream

The run attribute here would be the select expression. In this case, the spec is shorthand for:

    - run: op = upstream

In the case of multi-run, the configuration would be:

    - multi-run: upstream

This distinction at the top-level is defensible, considering the differences in the way run and multi-run sources are resolved. For a run, a single run is selected and its contents — i.e. the files inside the run directory — are resolved within the downstream run directory. For multi-run, the top-level run directories themselves are resolved by links within the downstream (summary op) run directory. This difference is arguably better highlighted by making run and multi-run dependencies distinct at the top-level (as opposed to when an attribute is set to true).

The operation dependency type would be forever supported but officially deprecated in favor of run and multi-run.

This spelling has some advantages over the above proposal:

  • The term “run dependency” is more accurate than “operation dependency” — “operation” is even inaccurate in cases where a select spec omits an operation name, should that be supported (it technically could be, e.g. where the select looked for runs containing certain files, independent of the operation name)
  • Clarifies the operation dependency as a single run dependency
  • Makes the distinction between run and multi-run clearer
  • Removes the need for a separate select attribute

Cons to this approach:

  • Cognitive shift/lift needed by users might not be justified by the benefits
  • Updated documentation
  • Standard cost of maintaining a deprecated setting (docs, implementation, and tests)

Hi Garrett,

I am desperately looking forward to seeing this function implemented! I have one operation for generating datasets with different parameters and another for collecting them. I have to manually specify the guild_home/runs directory when running the “collect” operation. Therefore, this command would help me a lot.

The current operation dependency is arguably misspelled. Strictly speaking, a downstream operation requires a run.

I agree with you, and that’s why I would vote for the second approach with run dependencies.

Note on use case here… summary ops might be used to modify target runs - e.g. to write useful summary info per run. Guild should probably support this formally. Guild should note what files were added/changed/deleted as a result of this summary op. The summary op should be a record of what happened.

The runs should maintain a change history.

This notion should apply to comments, label changes, and tag changes as well.
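One way Guild could note which files a summary op added, changed, or deleted in a target run is to snapshot the run directory before and after the op and compare the two. The sketch below shows that comparison only. `snapshot` and `diff_snapshots` are hypothetical names, and hashing whole files with SHA-256 is just one possible way to detect changes.

```python
import hashlib
import os
import tempfile


def snapshot(run_dir):
    """Map each file path (relative to run_dir) to a SHA-256 content digest."""
    digests = {}
    for root, _dirs, files in os.walk(run_dir):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            digests[os.path.relpath(path, run_dir)] = digest
    return digests


def diff_snapshots(before, after):
    """Report files added, deleted, or changed between two snapshots."""
    common = before.keys() & after.keys()
    return {
        "added": sorted(set(after) - set(before)),
        "deleted": sorted(set(before) - set(after)),
        "changed": sorted(p for p in common if before[p] != after[p]),
    }


def write(path, text):
    with open(path, "w") as f:
        f.write(text)


# A temporary directory standing in for a target run directory
run_dir = tempfile.mkdtemp()
write(os.path.join(run_dir, "a.txt"), "original")
write(os.path.join(run_dir, "b.txt"), "to be removed")
before = snapshot(run_dir)

# Simulate a summary op that modifies the target run
write(os.path.join(run_dir, "a.txt"), "modified")
os.remove(os.path.join(run_dir, "b.txt"))
write(os.path.join(run_dir, "summary.txt"), "per-run summary info")

changes = diff_snapshots(before, snapshot(run_dir))
print(changes)
```

Storing a record like `changes` with the summary op run would make that run the record of what happened to each target.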

Idea from Julia’s comment on fairness metrics…

Guild could provide something (plugin-defined built-in ops?) that applies higher-level analysis to runs/trained models. E.g. is something “fair”? Just run something like:

guild run fairness:check <run ID or other run identifier> --checks aaa,bbb,ccc

Fairness here could be any variety of test or analysis. --checks is a list of named fairness checks to perform.

The idea here of applying an operation to one or more runs is something we’ve only walked up to (to date) but not implemented.

This proposal needs to address what a summary operation looks like in a pipeline (stepped operation). I think running a summary operation in a pipeline should, by default, apply to runs generated by the pipeline and not include runs outside the pipeline.