Projects and Guild Home

Summary

This proposal outlines “project-aware” default locations for Guild home.

A project is a directory that Guild infers as a top-level container for user code. Projects generally correspond to Git repositories, though not necessarily.

Users work with projects to maintain source code for various applications — e.g. deployable software, model development, data management, etc. By convention, Guild runs are commonly tied to projects.

Guild should store project-related runs within the project directory by default, rather than under the user’s home directory (i.e. ~/.guild).

[This is a breaking change.]

This proposal is under development

Problem

Guild runs are commonly generated for project work. For example, a team of engineers and data scientists may develop a model, which requires source code, model configuration, prepared data, and tooling to generate trained models and compare outcomes. This work is performed in the context of a project — i.e. a top-level directory that contains the source code used for the work.

Guild stores runs under Guild Home, which is ~/.guild by default. To accommodate project workflows in Python, Guild changes the default for Guild Home when a virtual environment is activated. For example, if a virtual environment is located in ~/my-project/.venv, the default Guild Home is ~/my-project/.venv/.guild when that environment is activated. This is the same for both traditional Python virtual environments (those created using virtualenv or the venv standard Python module) and those created using conda.
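The current default described above can be sketched roughly as follows. This is an illustration only, not Guild's actual implementation: the function name is hypothetical, and the sketch assumes that an activated environment is detected through the VIRTUAL_ENV and CONDA_PREFIX environment variables (which virtualenv/venv and conda set on activation) and that GUILD_HOME, when set, takes precedence. The order in which the two environment types are checked is also an assumption.

```python
import os
from pathlib import Path


def current_default_guild_home(env=None):
    """Return the Guild home implied by the current (pre-proposal) scheme.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env

    # Assumption: an explicit GUILD_HOME always wins.
    explicit = env.get("GUILD_HOME")
    if explicit:
        return Path(explicit)

    # Assumption: activated environments are detected via these variables,
    # checked in this order.
    for var in ("VIRTUAL_ENV", "CONDA_PREFIX"):
        prefix = env.get(var)
        if prefix:
            return Path(prefix) / ".guild"

    # No activated environment: fall back to ~/.guild.
    return Path.home() / ".guild"
```

Under these assumptions, the same project yields a different Guild home depending on which environment, if any, is active when a run starts.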

The spirit of this modified behavior — using the location of activated virtual environments to store runs — is motivated by a desire to consolidate runs for a particular project, as separate virtual environments are commonly used per project in Python workflows.

This scheme presents a few issues:

  • Virtual environments are not strictly aligned with project work. A user may opt not to use a virtual environment when running project code, or might use multiple virtual environments. In these cases Guild stores runs in different locations: ~/.guild when no environment is activated, and a separate location for each environment that is. This forces users to set GUILD_HOME as a workaround to consolidate runs in a single project location.

  • Virtual environments are VM constructs and do not typically store ‘var’ style data (i.e. routinely updated data). A user or engineer who’s unaware of Guild’s default behavior might unintentionally delete runs when cleaning an environment.

  • Languages that don’t use virtual environments (e.g. R) can’t signal to Guild that runs should be located in a project.

Proposed Approach

This proposal entails two changes:

  • A file layout heuristic for inferring a user project
  • A heuristic for inferring Guild home using the current directory and file layout, including the existence of a project

Identifying a project

Guild will infer that a directory is a project if it contains a file designating it as such. Project-designating files must match at least one pre-designated project file pattern. A pattern consists of a glob pattern and a file type designation. Glob patterns are applied to the file’s path relative to the project directory. A file type designation is one of: file, dir, or any.

Below is a working list of such patterns:

Pattern            File Type   Description
.guild             dir         Explicit indicator of a Guild home location (repository)
.Rprofile          file        R session init file
renv/activate.R    file        R session activation file (TODO: is this needed given .Rprofile?)
pyproject.toml     file        PEP 518 build dependency file
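As a minimal sketch of how such patterns might be applied, the following checks a directory against the working list above. The names PROJECT_PATTERNS and is_project are hypothetical, and using Python's glob module for matching is an assumption; the proposal does not specify an implementation.

```python
import glob
import os

# (glob pattern relative to the candidate directory, file type designation)
PROJECT_PATTERNS = [
    (".guild", "dir"),
    (".Rprofile", "file"),
    ("renv/activate.R", "file"),
    ("pyproject.toml", "file"),
]


def _type_matches(path, file_type):
    """Check a matched path against a file type designation."""
    if file_type == "file":
        return os.path.isfile(path)
    if file_type == "dir":
        return os.path.isdir(path)
    return os.path.exists(path)  # "any"


def is_project(dirname, patterns=PROJECT_PATTERNS):
    """Return True if dirname contains a file matching any project pattern."""
    # Escape the directory so glob characters in its name are taken literally.
    root = glob.escape(dirname)
    for pattern, file_type in patterns:
        for path in glob.iglob(os.path.join(root, pattern)):
            if _type_matches(path, file_type):
                return True
    return False
```

For example, a directory containing a regular file named pyproject.toml is treated as a project, while a directory that only contains a subdirectory with that name is not, because the file type designation does not match.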

Backward Compatibility

XXX

Issues to resolve

Incrementally changing projects

Consider a simple directory with the file test.py. When the user runs guild run test.py from that directory, Guild will look for Guild home in the parent directories, up to and including the home directory or root. If it fails to find a Guild home, it will use ~/.guild. Let’s say that’s the case and the first run lands in ~/.guild/runs.
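The lookup described above might be sketched as follows. The function name find_guild_home is hypothetical, and two details are assumptions rather than settled behavior: that the starting directory itself is checked before its parents, and that a Guild home is recognized by a .guild directory.

```python
from pathlib import Path


def find_guild_home(start, home=None):
    """Walk up from start looking for a .guild directory.

    Stops after checking the user's home directory (or the filesystem
    root if start is outside home) and falls back to ~/.guild.
    """
    home = (Path(home) if home is not None else Path.home()).resolve()
    current = Path(start).resolve()

    # Assumption: the starting directory is checked along with its parents.
    for candidate in (current, *current.parents):
        guild_dir = candidate / ".guild"
        if guild_dir.is_dir():
            return guild_dir
        if candidate == home:
            break  # do not search above the home directory

    return home / ".guild"
```

In the scenario above no .guild directory exists between the script's directory and the home directory, so the sketch returns ~/.guild and the run lands in ~/.guild/runs.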

What happens when the user adds pyproject.toml to the directory? I can imagine one of two things. Option 1: Guild sees pyproject.toml

  • Is a Git repo (e.g. directory .git/objects) not sufficient to identify 90% of all projects? The issue of course is that projects are often started without initializing the repo.

Performance

As Guild home is used for nearly every command, its resolution must be efficient. E.g. the implementation cannot rely on Python entry points (e.g. to delegate project file inference to plugins) as the underlying support for this feature is remarkably inefficient (e.g. 200ms just to import pkg_resources).

Alternative Approaches

Do nothing

TODO

Rely exclusively on .guild directory to signify Guild home

TODO - note option of guild init and any required change to expose this to the user