Guild merge

Summary

This is a proposal for a new guild merge command. The merge command copies files from a run to a project directory. It’s used to synchronize a project with a specific run.

This is useful when a project is modified over time to generate multiple runs and one of those runs is considered “most successful” or “selected” (for whatever reason) and should be promoted to the project itself.

This proposal is awaiting feedback.

Problem

Guild is useful for generating experiments that track both operation results and the source code associated with the experiment. This lets users casually experiment with their project code without needing to commit their changes to source control or otherwise synchronize source code with specific experiments.

Over the course of experimenting, a user may determine that a specific run (not necessarily the latest run) contains the source code and (potentially) other run files that should be reflected in the project. Without a feature to support this, the user must manually copy those files back to the project. This is cumbersome and error-prone.

Guild should make this process as simple as “merge this run into my project” and should do the right thing, providing the appropriate safeguards to ensure that project state is not corrupted.

Proposed Approach

Guild will provide a new command guild merge.

guild merge --help
Usage: guild merge [OPTIONS] [RUN]

  Merge run files into a project.

Options:
  <run select options - omitted for brevity>
  -s, --skip-sourcecode    Don't copy run source code.
  -d, --skip-dependencies  Don't copy project-local dependencies.
  -g, --skip-generated     Don't copy run-generated files.
  -x, --exclude PATTERN    Exclude a file or pattern (may be used multiple
                           times).

  -t, --target-dir PATH    Directory to merge run files to (required if
                           project directory cannot be determined for the
                           run).

  -m, --skip-summary       Don't generate a run summary.
  --summary-name NAME      Name used for the run summary. Use '${run_id}' in
                           the name to include the run ID.

  -y, --yes                Don't prompt before copying files.
  --help                   Show this message and exit.

This command identifies a single run using standard run filters and performs the following steps:

  • Copy run source code to project
  • Copy project-local file dependencies present in run to project
  • Copy generated files to project
  • Generate a run summary file in the project

Each of these steps is optional and may be skipped by specifying the applicable --skip-xxx command option.
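The file selection implied by the steps and options above can be sketched as follows. This is a minimal illustration, not Guild's implementation: the `RunFile` type, the category names, and the `select_files` and `summary_filename` helpers are hypothetical, and the sketch assumes that `--exclude` patterns are glob-style and that `${run_id}` is expanded like a `string.Template` placeholder.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from string import Template

# Hypothetical file categories, mirroring the copy steps listed above.
SOURCECODE = "sourcecode"
DEPENDENCY = "dependency"
GENERATED = "generated"


@dataclass(frozen=True)
class RunFile:
    path: str      # path relative to the run directory
    category: str  # one of the categories above


def select_files(run_files, skip=(), exclude=()):
    """Return the run files to copy, honoring skipped categories and exclude patterns."""
    selected = []
    for f in run_files:
        if f.category in skip:
            continue  # corresponds to a --skip-xxx option
        if any(fnmatch(f.path, pattern) for pattern in exclude):
            continue  # corresponds to --exclude PATTERN (glob semantics assumed)
        selected.append(f)
    return selected


def summary_filename(name, run_id):
    """Expand '${run_id}' in a summary name (Template semantics assumed)."""
    return Template(name).safe_substitute(run_id=run_id)


run_files = [
    RunFile("train.py", SOURCECODE),
    RunFile("data/config.json", DEPENDENCY),
    RunFile("model.ckpt", GENERATED),
    RunFile("logs/events.out", GENERATED),
]

# Similar in spirit to: guild merge --skip-dependencies --exclude 'logs/*'
to_copy = select_files(run_files, skip={DEPENDENCY}, exclude=["logs/*"])
print([f.path for f in to_copy])                        # ['train.py', 'model.ckpt']
print(summary_filename("summary-${run_id}.md", "abc123"))  # summary-abc123.md
```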

Safeguards

Guild takes various steps to ensure that the project is not corrupted by this command.

  • Fail if a replaced file is listed as modified (or comparable) by the project VCS (e.g. git)
  • Create a backup of all replaced files that can be later restored
  • Provide an interactive mode that prompts the user for each replacement, displaying a visual difference of the replacement
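As a rough sketch of how these three safeguards might fit together, the helpers below check replaced files against VCS status, back up a file before it is replaced, and render a diff for an interactive prompt. All function names are hypothetical. The status check assumes git and parses the text produced by `git status --porcelain` in a simplified way (quoted paths are not handled, and only the new path of a rename is considered).

```python
import difflib
import shutil
from pathlib import Path


def modified_paths(porcelain_output):
    """Parse `git status --porcelain` (v1) output into a set of changed paths."""
    paths = set()
    for line in porcelain_output.splitlines():
        if len(line) > 3:
            # Each line is two status characters, a space, then the path.
            # Renames are shown as "old -> new"; keep the new path only.
            paths.add(line[3:].split(" -> ")[-1])
    return paths


def check_not_modified(replaced, porcelain_output):
    """Raise if any file that would be replaced is reported as changed by the VCS."""
    conflicts = sorted(set(replaced) & modified_paths(porcelain_output))
    if conflicts:
        raise RuntimeError("modified in project: " + ", ".join(conflicts))


def backup_file(project_dir, rel_path, backup_dir):
    """Copy a project file into backup_dir, preserving its relative path."""
    src = Path(project_dir) / rel_path
    dest = Path(backup_dir) / rel_path
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return dest


def preview_diff(project_text, run_text, rel_path):
    """Return a unified diff showing what the replacement would change."""
    return "".join(difflib.unified_diff(
        project_text.splitlines(keepends=True),
        run_text.splitlines(keepends=True),
        fromfile=f"project/{rel_path}",
        tofile=f"run/{rel_path}",
    ))


status = " M train.py\n?? notes.txt\n"
print(sorted(modified_paths(status)))  # ['notes.txt', 'train.py']
print(preview_diff("lr = 0.1\n", "lr = 0.01\n", "train.py"))
```

Keeping the status parsing separate from the call to the VCS makes the check easy to test and leaves room for other VCS backends.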

Mapping run files to project paths

To implement this feature correctly, Guild must know where a run-local file is located in a project directory.

  1. Guild must know where the run source code is located and the source code root in the project.
  2. Guild must know the original project path for each resolved dependency file.

Requirement 1 may be possible to meet today if Guild saves the project source code root in the run.

Requirement 2 cannot be met today because Guild does not store where resolved project-local files originate from in the project.

Any mapping information saved with a run becomes stale if the project structure changes after the run is generated, for example when files are moved or renamed.
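To make the two requirements concrete, the sketch below assumes a run could store a small manifest when it is initialized. The manifest, its keys, and the `project_path` helper are hypothetical and do not exist in Guild today. The only detail taken from the current design is that run source code lives under `.guild/sourcecode`.

```python
from pathlib import PurePosixPath

# Hypothetical manifest saved with a run. It records the information
# needed by requirements 1 and 2 above.
manifest = {
    # Requirement 1: project-relative root of the source code.
    "sourcecode_root": "src",
    # Requirement 2: run-relative path -> original project-relative path.
    "dependencies": {
        "config.json": "data/config.json",
    },
}

SOURCECODE_DIR = PurePosixPath(".guild/sourcecode")


def project_path(run_path, manifest):
    """Map a run-relative path to a project-relative path, or None if unknown."""
    run_path = PurePosixPath(run_path)
    try:
        rel = run_path.relative_to(SOURCECODE_DIR)
    except ValueError:
        # Not a source code file: fall back to the dependency mapping.
        return manifest["dependencies"].get(str(run_path))
    return str(PurePosixPath(manifest["sourcecode_root"]) / rel)


print(project_path(".guild/sourcecode/train.py", manifest))  # src/train.py
print(project_path("config.json", manifest))                 # data/config.json
print(project_path("model.ckpt", manifest))                  # None
```

A generated file such as `model.ckpt` has no recorded origin in this sketch, so a merge would need a separate rule for it, such as copying it to the same relative path in the project.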

Alternative Approaches

Require a VCS for merge

With this alternative, guild merge would fail under these conditions:

  • The project (or directory specified by --target-dir) is not empty AND not under a VCS
  • There are any uncommitted changes

Provided these conditions are met, Guild would copy files from the run to the target directory. Any copied files that are NOT already committed to the VCS must not exist in the target. That is, Guild will never replace a file that cannot be restored from the VCS.

By default, Guild will leave the changed files uncommitted to the VCS. This lets the user check the changes and commit them explicitly.

Guild MAY provide an option to auto-commit the changes with a required message (or by way of a message editor).
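The decision logic for this alternative is small enough to sketch directly. The functions below are hypothetical and take the target state as plain values. How that state is obtained is left open; with git, one possible assumption is that the set of tracked files comes from `git ls-files` and that uncommitted changes are detected with `git status --porcelain`.

```python
def check_preconditions(target_is_empty, under_vcs, has_uncommitted_changes):
    """Return the reasons the merge must fail (an empty list means it may proceed)."""
    errors = []
    if not target_is_empty and not under_vcs:
        errors.append("target is not empty and not under a VCS")
    if has_uncommitted_changes:
        errors.append("target has uncommitted changes")
    return errors


def unrestorable_files(to_copy, existing, tracked):
    """Files that exist in the target but are not committed to the VCS.

    Replacing any of these would lose content that cannot be restored,
    so the merge should refuse to copy them.
    """
    return sorted(p for p in to_copy if p in existing and p not in tracked)


# A committed, clean project: the merge may proceed.
print(check_preconditions(False, True, False))  # []

# 'notes.txt' exists in the target but was never committed.
print(unrestorable_files(
    to_copy=["train.py", "notes.txt", "model.ckpt"],
    existing={"train.py", "notes.txt"},
    tracked={"train.py"},
))  # ['notes.txt']
```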

Copy-project-as-is (with optional exclusions) run directory init

This alternative is more radical in that it changes the way Guild initializes a run directory.

Guild currently copies project source code to .guild/sourcecode under the run directory. This leaves the run directory initially free of any files. Guild then resolves dependencies by copying or linking them in the run directory. Dependencies are explicitly defined in the Guild file.

Advantages of the current approach:

  • Source code is separated from required and generated files
  • Files that are not used by the run don’t appear in the run directory

Disadvantages:

  • Users must learn to specify dependencies explicitly: project files they rely on when running their code in the project directory are not available in the run directory by default
  • Causes a file mapping problem (see above) — we cannot simply rely on a directory-to-directory comparison

We might introduce this as a package deal:

  1. Use this scheme when running scripts directly
  2. Add support for configuring this copy-project-as-is behavior in the Guild file
  3. Only support the merge command for runs generated by this copy-project-as-is method

With point 3, we can assume that a run directory mirrors the project. This makes merging trivial — we simply copy over the directory structure from the run to the project directory (either with backup of the project or requiring a fully committed project under VCS).
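Under that assumption, a merge with backup could look roughly like the sketch below. The `mirror_merge` function is hypothetical. It also assumes that run metadata stays under a `.guild` directory in the run and should not be copied to the project.

```python
import shutil
import tempfile
from pathlib import Path


def mirror_merge(run_dir, project_dir, backup_dir):
    """Copy each run file to the same relative path in the project.

    Project files that would be replaced are first copied to backup_dir.
    Returns the relative paths that were copied.
    """
    run_dir, project_dir, backup_dir = Path(run_dir), Path(project_dir), Path(backup_dir)
    copied = []
    for src in sorted(run_dir.rglob("*")):
        rel = src.relative_to(run_dir)
        if not src.is_file() or rel.parts[0] == ".guild":
            continue  # skip directories and run metadata (assumed location)
        dest = project_dir / rel
        if dest.exists():
            saved = backup_dir / rel
            saved.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(dest, saved)  # keep the replaced version
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)
        copied.append(rel.as_posix())
    return copied


with tempfile.TemporaryDirectory() as tmp:
    run, project, backup = (Path(tmp) / name for name in ("run", "project", "backup"))
    (run / ".guild").mkdir(parents=True)
    (run / ".guild" / "meta.txt").write_text("run metadata\n")  # hypothetical file
    (run / "train.py").write_text("lr = 0.01\n")
    project.mkdir()
    (project / "train.py").write_text("lr = 0.1\n")

    print(mirror_merge(run, project, backup))        # ['train.py']
    print((project / "train.py").read_text(), end="")  # lr = 0.01
    print((backup / "train.py").read_text(), end="")   # lr = 0.1
```

The variant that requires a fully committed project under a VCS would drop the backup step and rely on the VCS to restore replaced files.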