This proposal seeks to address issues related to source code location for a run.
This is NOT a breaking change.
This proposal is awaiting feedback.
Guild currently copies project source code to .guild/sourcecode by default. This presents two problems:
Users can be surprised that source code does not exist where they expect it to be.
Some Guild language providers cannot reasonably run an operation when source code does not maintain the same directory structure relative to the current directory.
Guild separates source code from other run-related files (inputs and outputs). This is by design, with the intent of keeping the run directory “clean”. It was believed that there’s some value in starting with an empty directory and then requiring explicit configuration of that directory. In practice, this requirement is often onerous and forces Guild to implement various “smart” workarounds to help the user. The result encourages explicit configuration, but at the cost of surprising behavior and complexity.
There is no technical reason for this separation. In fact, the scheme presents technical challenges, as in the case of R language support.
We propose to remove the distinction between “source code location” and the run directory by changing the default location for source code to . (i.e. the run directory root). We would note in documentation that the sourcecode.dest operation attribute should be used to support legacy run formats.
We would modify run file list commands to include all files by default and provide options to filter by source code, inputs, and generated files.
Here are some examples of the modified commands:
guild ls # shows all files in run dir, which would include source code
guild ls --sourcecode # shows only source code files
guild ls --deps # shows only resolved dependencies
guild ls --generated # shows only generated files
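The filtering behind these options can be sketched in a few lines of Python. This is only an illustration, not Guild's implementation: it assumes Guild records which run files were copied as source code and which were resolved as dependencies, and treats everything else as generated. The names `classify` and `list_run_files` are hypothetical.

```python
def classify(all_files, sourcecode, deps):
    """Map each run file to 'sourcecode', 'deps', or 'generated'.

    Assumption: `sourcecode` and `deps` are sets of run-relative paths
    recorded when the run was initialized. Anything else in the run
    directory is considered generated by the operation.
    """
    kinds = {}
    for path in all_files:
        if path in sourcecode:
            kinds[path] = "sourcecode"
        elif path in deps:
            kinds[path] = "deps"
        else:
            kinds[path] = "generated"
    return kinds


def list_run_files(all_files, sourcecode, deps, only=None):
    """Return sorted run files, optionally restricted to one kind."""
    kinds = classify(all_files, sourcecode, deps)
    return sorted(p for p, k in kinds.items() if only is None or k == only)
```

With `only=None` the function mirrors the proposed default of listing every file; passing `"sourcecode"`, `"deps"`, or `"generated"` corresponds to the three filter options.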
Guild View would similarly be changed to support filtering by source code, dependencies and generated files.
Guild would continue to maintain support for alternative source code destinations and provide thorough test coverage for these. New runs would default to the new scheme. Users who prefer the old scheme could specify .guild/sourcecode in their Guild file as needed.
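As a rough sketch of how the default and the legacy override might interact, the function below resolves the source code directory for a run. It assumes the operation configuration is available as a nested dict with an optional `sourcecode.dest` entry; the function name and the dict layout are hypothetical and are not taken from Guild's code.

```python
from pathlib import Path

DEFAULT_DEST = "."                   # proposed default: the run directory root
LEGACY_DEST = ".guild/sourcecode"    # current default, kept as an opt-in


def sourcecode_dir(run_dir, op_config):
    """Return the directory that source code is copied to for a run.

    Assumption: `op_config` is a dict that may contain
    {"sourcecode": {"dest": "<run-relative path>"}}.
    """
    dest = op_config.get("sourcecode", {}).get("dest", DEFAULT_DEST)
    return Path(run_dir) / dest
```

An operation with no explicit destination resolves to the run directory itself, while one that sets the destination to `LEGACY_DEST` keeps the current layout.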
The notion of “source code destination” would be considered a legacy topic but the distinction would be maintained in code.
Guild would continue to support sourcecode filter rules as they exist today. These will be used to differentiate “source code” from other project local dependencies. Files that are not selected through these rules would not be copied to the run directory as a part of the source code initialization stage. They would need to be specified as dependencies using the requires operation attribute just as they are today.
See Project filtering below for a future enhancement not considered by this proposal.
Guild supports a solution today for languages that prefer (or require) code structure to remain consistent between the project directory and the run directory. The R plugin, for example, would implicitly set the source code destination to the run directory root for R-based operations.
Other language plugins (e.g. the amusing Erlang plugin example) could follow this pattern as needed.
Python operations would otherwise work as they do today.
The advantage of this approach is that it avoids impacting current Guild users. We may under-appreciate the value users ascribe to separating source code from the run directory root.
The disadvantage is that runs behave differently depending on the operation language. We also lose the benefits to Python developers of having a straightforward run directory that closely mirrors the project structure. The bifurcation between source code structure and other run files has long been a source of confusion for some users and it’s not clear there’s a practical payoff.
With this proposal, Guild treats a run as a full or limited copy of the project structure. This is a direct approach to initializing a run:
- Copy project files to the run directory as they appear in the project directory
- Resolve other dependencies, potentially replacing project files based on dependency configuration
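The two steps above can be sketched as follows. This is a simplified illustration under stated assumptions: dependencies are given as a mapping from run-relative target path to a source file, and they are copied rather than linked. The function name `init_run` is hypothetical, and `dirs_exist_ok` requires Python 3.8 or later.

```python
import shutil
from pathlib import Path


def init_run(project_dir, run_dir, dependencies):
    """Initialize a run directory from a project (illustrative only).

    Assumption: `dependencies` maps a run-relative target path to the
    source file that should appear at that path in the run directory.
    """
    project_dir, run_dir = Path(project_dir), Path(run_dir)

    # Step 1: copy project files so the run mirrors the project layout.
    shutil.copytree(project_dir, run_dir, dirs_exist_ok=True)

    # Step 2: resolve dependencies. A dependency whose target matches a
    # copied project file replaces that file.
    for target, source in dependencies.items():
        dest = run_dir / target
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, dest)
```

Running step 2 after step 1 is what lets dependency configuration take precedence over files that were copied from the project.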
This approach impacts two topics: source code specs and project local dependencies.
Guild currently provides a robust but complicated scheme for specifying “source code” to be copied to the run directory. This scheme ranks among the most difficult topics for Guild users.
Guild currently requires non-source files (i.e. files that aren’t already copied using Guild’s implied or arcane source code filtering rules) to be explicitly listed as dependencies using the requires operation attribute. This is easily the top-ranked point of confusion for Guild users.
This proposal addresses both of these topics and can additionally simplify configuration by treating run init as a “what to ignore from the project” problem. This is similar to Git’s “what to ignore” scheme, implemented by .gitignore files.
While this feature is decoupled from this proposal (which is limited to changing the default source code location and updating documentation), we should consider additional features:
- Adopt a “what to ignore” scheme, which can be used to prune a run init by ignoring specified files and directories
- Use .gitignore as a default heuristic for “what to ignore” regarding source code
- Use dvc.yaml or other mainstream data management specs as a default heuristic for dependencies (i.e. non-source code files required by an operation)
- Add a new dependency type dir as a semantic improvement over file in cases where a project local dependency is a directory
Guild would use .gitignore when generating a list of source code files to copy. The operation could provide alternative configuration:
    train:
      sourcecode:
        ignore:
          - '@gitignore'  # possible config to use .gitignore rules at this location
          # additional rules
          - '*.pyc'
          - /data
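To make the “what to ignore” idea concrete, here is a minimal Python sketch that prunes a file list using rules like the ones in the configuration above. It is deliberately much simpler than real .gitignore semantics: a pattern starting with `/` is treated as a literal path anchored at the project root, and any other pattern is matched against each path component with `fnmatch`. It does not handle negation, `**`, or the `@gitignore` entry, and the function names are hypothetical.

```python
from fnmatch import fnmatch


def is_ignored(path, patterns):
    """Return True if a project-relative POSIX path matches any rule.

    Simplifying assumptions (not full .gitignore behavior):
    - '/name' ignores 'name' at the project root and everything under it
    - any other pattern is matched against each component of the path
    """
    parts = path.split("/")
    for pattern in patterns:
        if pattern.startswith("/"):
            anchored = pattern.lstrip("/")
            if path == anchored or path.startswith(anchored + "/"):
                return True
        elif any(fnmatch(part, pattern) for part in parts):
            return True
    return False


def select_sourcecode(paths, patterns):
    """Keep only the paths that no ignore rule matches."""
    return [p for p in paths if not is_ignored(p, patterns)]
```

For example, with the rules `'*.pyc'` and `/data`, a root-level `data/raw.csv` and any compiled `.pyc` file are pruned, while `pkg/data/notes.txt` is kept because `/data` is anchored at the project root.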