This proposal seeks to simplify the process of specifying file patterns. It addresses the confusion and complexity associated with regular expressions in file patterns used for resource source
select specs. We suggest that Guild use glob patterns by default for
select with the option of using
select-regex as an alternative attribute for specifying regular expression patterns.
This proposal is under development.
The select specification for resource sources uses Python regular expressions. For example, the following configuration is used to select files ending with .txt:
    op:
      requires:
        - file: .
          select: .+\.txt
We find this syntax burdensome for such a simple goal. The syntax is also different from that used for
sourcecode (operation and model attribute) and
data-files (package attribute) to match files.
We propose a breaking change to re-interpret the
select patterns used for resource sources as glob patterns rather than as regular expressions. To support regular expressions, which provide considerably more flexibility, we propose a new attribute
select-regex, which may be specified as a mutually exclusive alternative to select.
Under this approach, the example above is changed to:
    op:
      requires:
        - file: .
          select: '*.txt'
Note that this value is single-quoted because YAML reserves an unquoted leading * for alias references.
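The proposed behavior can be sketched in Python as follows. This is a minimal illustration, not Guild's implementation: `compile_select` and `select_files` are hypothetical helpers, the source is assumed to be an already-parsed dict, and matching with `re.match` is an assumption about how patterns are applied.

```python
import fnmatch
import re


def compile_select(source):
    """Return a compiled pattern for a parsed resource source (hypothetical helper).

    `source` is assumed to look like {"file": ".", "select": "*.txt"}.
    """
    glob = source.get("select")
    regex = source.get("select-regex")
    if glob is not None and regex is not None:
        raise ValueError("'select' and 'select-regex' are mutually exclusive")
    if regex is not None:
        return re.compile(regex)
    if glob is not None:
        # fnmatch.translate converts a shell-style pattern to regex source.
        # Note that fnmatch's `*` also matches path separators.
        return re.compile(fnmatch.translate(glob))
    return None


def select_files(source, paths):
    """Return the paths selected by the source; all paths if no pattern is set."""
    pattern = compile_select(source)
    if pattern is None:
        return list(paths)
    return [p for p in paths if pattern.match(p)]


paths = ["a.txt", "b.csv", "notes.txt"]
by_glob = select_files({"file": ".", "select": "*.txt"}, paths)
by_regex = select_files({"file": ".", "select-regex": r".+\.txt"}, paths)
print(by_glob, by_regex)
```

Both calls select the same two `.txt` files here; the difference is only in how the pattern is written.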
As this is a breaking change, we need a migration strategy that gives users an easy path to update their configuration without disrupting their work.
We propose a deprecation period that supports current projects but warns users of an upcoming, breaking change.
During the deprecation period, Guild attempts to detect a regular expression and uses the value as such while logging a warning message. The warning message should instruct the user to rename the attribute to
select-regex to continue using the pattern without warning.
[WARNING]: resource source 'file' appears to be using a regular expression in 'select'. Support for regular expressions using 'select' is deprecated. Use 'select-regex' instead. In Guild 0.8, this value will be used as a glob pattern.
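One way the detection step might work is a simple heuristic over the pattern text. The sketch below is an assumption for illustration only: `looks_like_regex`, `resolve_select`, the logger name, and the set of hint characters are all hypothetical, and the heuristic can misclassify patterns (for example, the glob `foo.*` would be treated as a regular expression).

```python
import logging
import re

log = logging.getLogger("guild")  # logger name is an assumption

# Sequences that are common in regular expressions but unusual in globs:
# `.` followed by a quantifier, escaped classes such as `\.` or `\d`,
# and anchors, alternation, groups, or `+`.
_REGEX_HINTS = re.compile(r"\.[*+?]|\\[.dwsDWS]|[\^$|()+]")


def looks_like_regex(pattern):
    """Heuristically guess whether a select value is a regular expression."""
    return _REGEX_HINTS.search(pattern) is not None


def resolve_select(source_name, pattern):
    """Return ("regex" | "glob", pattern), warning on deprecated regex use."""
    if looks_like_regex(pattern):
        log.warning(
            "resource source '%s' appears to be using a regular expression "
            "in 'select'. Support for regular expressions using 'select' is "
            "deprecated. Use 'select-regex' instead.",
            source_name,
        )
        return "regex", pattern
    return "glob", pattern


legacy = resolve_select("file", r".+\.txt")
migrated = resolve_select("file", "*.txt")
print(legacy, migrated)
```

Because the guess is imperfect, the heuristic only makes sense as a transitional measure during the deprecation period.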
Specify a regex using new syntax
Both glob and regular expression syntax could be supported for a single select attribute.
For example, Python designates regular expression values using
Various notations are explored below.
This approach has the advantage of establishing a common syntax for file select expressions that can be used for other settings including those specified as command line options.
This is less-than-ideal for defining values with paths. For example,
/foo/ would be interpreted as a regular expression of the value
foo, which is quite different from what it looks like.
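The path ambiguity can be shown with a small sketch of how a slash-delimited notation might be parsed. `parse_select` is a hypothetical function written only to illustrate the problem; it is not part of Guild.

```python
def parse_select(value):
    """Hypothetical parser: treat /.../ as a regex, anything else as a glob."""
    if len(value) >= 2 and value.startswith("/") and value.endswith("/"):
        return "regex", value[1:-1]
    return "glob", value


as_path = parse_select("/foo/")          # looks like a directory path
as_regex = parse_select(r"/.+\.txt/")    # intended regular expression
as_glob = parse_select("foo/*.txt")
print(as_path, as_regex, as_glob)
```

The first call returns `("regex", "foo")` even though the value reads as an absolute directory path, which is the surprise described above.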
This is sleight of hand. This looks like a novel string-ish type, but in YAML it’s
This syntax falls over when used in a shell:
    $ echo r'hello'
    rhello
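The same quote removal can be reproduced with Python's `shlex` module, which follows POSIX shell splitting rules. This is only a demonstration of the shell behavior; the variable names are illustrative.

```python
import shlex

# The shell strips the single quotes and joins the pieces into one word.
unquoted = shlex.split("echo r'hello'")

# To pass the notation through intact, the whole value must be quoted again.
quoted = shlex.split("echo \"r'hello'\"")

print(unquoted)
print(quoted)
```

A user would therefore need to write `"r'hello'"` on the command line, which undercuts the goal of a single notation that works in both configuration files and command line options.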
While this syntax requires a lengthy prefix, it is clearly denoted.
Auto-detect glob vs regex
Guild could attempt to detect a glob expression and use the corresponding regular expression automatically.
This approach should be rejected because it introduces implicit behavior that is hard to debug. There are no tools that we are aware of that use this approach.
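A short sketch shows why implicit detection is risky: some patterns are valid under both syntaxes but select different files. The pattern and file names below are made up for illustration, and `re.fullmatch` is used so the comparison with glob matching is like for like.

```python
import fnmatch
import re

pattern = "[ab]*.txt"  # valid as a glob and as a regular expression
names = ["abba.txt", "a1.txt", "-txt"]

# Glob reading: one of "a" or "b", then anything, then a literal ".txt".
as_glob = [n for n in names if fnmatch.fnmatchcase(n, pattern)]

# Regex reading: zero or more of "a"/"b", then any one character, then "txt".
as_regex = [n for n in names if re.fullmatch(pattern, n)]

print(as_glob)
print(as_regex)
```

The two readings agree on `abba.txt` but disagree on the other two names, so an automatic guess would silently change which files are selected.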
Rather than introduce a breaking change, add a new attribute
select-glob, which is used with glob expressions. In this case, the example above is changed to:
    op:
      requires:
        - file: .
          select-glob: '*.txt'
This is a viable approach but it suffers from two problems:
We believe that in the majority of cases, glob patterns are sufficient for selection. The default should correspond to the majority case.
glob is systems jargon, like
regex. The more technical case should be the exception and not the default.
Until Guild reaches 1.0, we are not constrained to non-breaking changes.
The advantage of this approach is that it maintains compatibility with existing projects and avoids the need to support a deprecation period.