Summary
This proposal seeks to simplify the process of specifying file patterns. It addresses the confusion and complexity associated with regular expressions in the file patterns used for resource source select specs. We suggest that Guild use glob patterns by default for select, with the option of using select-regex as an alternative attribute for specifying regular expression patterns.
This proposal is under development.
Problem
Guild’s select specification for resource sources uses Python regular expressions. For example, the following configuration is used to select files ending with .txt:
op:
  requires:
    - file: .
      select: .+\.txt
We find this syntax burdensome for such a simple goal. It also differs from the syntax used to match files for sourcecode (an operation and model attribute) and data-files (a package attribute).
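To make the comparison concrete, here is a minimal sketch using only the Python standard library. It assumes that a select regex must match the whole file name (re.fullmatch); Guild's actual matching mode may differ. The glob side uses fnmatch, which is an illustrative choice, not necessarily what Guild would use.

```python
import fnmatch
import re

# Candidate file names, as might be found when resolving a "file: ." source.
names = ["train.txt", "notes.txt", "model.py", "txt", "data.txt.bak"]

# Current behavior: select is a Python regular expression.
# Assumption: the pattern must match the entire name.
regex = re.compile(r".+\.txt")
by_regex = [n for n in names if regex.fullmatch(n)]

# Proposed behavior: select is a glob pattern.
by_glob = fnmatch.filter(names, "*.txt")

print(by_regex)  # ['train.txt', 'notes.txt']
print(by_glob)   # ['train.txt', 'notes.txt']
```

Both select the same files here, but the glob form needs no escaping and reads the way users already type patterns in a shell.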
Proposed Approach
We propose a breaking change that reinterprets the select patterns used for resource sources as glob patterns rather than as regular expressions. To continue supporting regular expressions, which provide considerably more flexibility, we propose a new attribute, select-regex, which may be specified as a mutually exclusive alternative to select.
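One way to picture the proposed rule is the sketch below. The helper compile_select is hypothetical, and it assumes a resource source arrives as a parsed YAML mapping such as {"file": ".", "select": "*.txt"}. It converts a glob to a regular expression with fnmatch.translate so that both attributes yield the same kind of matcher, and it rejects sources that set both attributes.

```python
import fnmatch
import re


def compile_select(source):
    """Return a compiled pattern for one resource source mapping, or None.

    Hypothetical helper for illustration; not part of Guild's API.
    """
    has_glob = "select" in source
    has_regex = "select-regex" in source
    if has_glob and has_regex:
        # Under this proposal the two attributes are mutually exclusive.
        raise ValueError("specify either 'select' or 'select-regex', not both")
    if has_glob:
        # Translate the glob into an equivalent regular expression.
        return re.compile(fnmatch.translate(source["select"]))
    if has_regex:
        return re.compile(source["select-regex"])
    return None  # no selection filter


glob_pat = compile_select({"file": ".", "select": "*.txt"})
regex_pat = compile_select({"file": ".", "select-regex": r".+\.txt"})
print(bool(glob_pat.fullmatch("a.txt")), bool(regex_pat.fullmatch("a.txt")))  # True True
```

Normalizing both forms to a compiled regular expression keeps the downstream file-matching code unchanged regardless of which attribute the user chose.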
Under this approach, the example above is changed to:
op:
  requires:
    - file: .
      select: '*.txt'
Note that this value is single quoted because, in YAML, an unquoted value that starts with * is read as an alias.
Migrating users
Because this is a breaking change, we need a migration strategy that gives users an easy path to update their configuration without disrupting their work.
We propose a deprecation period that supports current projects but warns users of the upcoming breaking change.
During the deprecation period, Guild attempts to detect a regular expression and, if it finds one, uses the value as a regular expression while logging a warning message. The warning message should instruct the user to rename the attribute to select-regex to continue using the pattern without a warning.
[WARNING]: resource source 'file' appears to be using a regular
expression in 'select'. Support for regular expressions using 'select' is
deprecated. Use 'select-regex' instead. In Guild 0.8, this value will be
used as a glob pattern.
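The proposal does not specify how Guild detects a regular expression. The sketch below shows one possible heuristic, under the assumption that backslashes, anchors (^, $), grouping and alternation characters, braces, +, and sequences such as .* or .+ are strong hints of a regex, since plain glob patterns rarely contain them. Both the hint list and the resolve_select helper are hypothetical. The heuristic can misfire on file names that legitimately contain characters such as + or parentheses.

```python
import logging
import re

log = logging.getLogger("guild")

# Hypothetical hints: constructs common in regular expressions but unusual
# in glob patterns. Glob wildcards (*, ?, [...]) are deliberately excluded.
_REGEX_HINTS = re.compile(r"\\|\.[*+?]|[\^$()|{}+]")


def looks_like_regex(pattern):
    return _REGEX_HINTS.search(pattern) is not None


def resolve_select(source):
    """Return ("regex" | "glob", pattern) for a source during deprecation."""
    pattern = source["select"]
    if looks_like_regex(pattern):
        log.warning(
            "resource source 'file' appears to be using a regular expression "
            "in 'select'. Support for regular expressions using 'select' is "
            "deprecated. Use 'select-regex' instead. In Guild 0.8, this value "
            "will be used as a glob pattern."
        )
        return "regex", pattern
    return "glob", pattern


print(resolve_select({"file": ".", "select": r".+\.txt"}))  # ('regex', '.+\\.txt')
print(resolve_select({"file": ".", "select": "*.txt"}))     # ('glob', '*.txt')
```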
Alternative Approaches
Specify a regex using new syntax
Both glob and regular expression syntax could be supported for a single select attribute.
For example, Python code conventionally writes regular expressions as raw strings, r'...'. JavaScript has regular expression literals, /.../. The JavaScript notation is poorly suited to specifying paths, which also use /.
Various notations are explored below.
This approach has the advantage of establishing a common syntax for file select expressions that can be used for other settings including those specified as command line options.
JavaScript syntax
select: /^/foo/bar/.+\.txt$/
This is less than ideal for values that contain paths. For example, /foo/ would be interpreted as the regular expression foo, which is quite different from what it looks like.
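The ambiguity can be shown with a small sketch. The parse_select function is hypothetical: it assumes a slash-delimited value is treated as a regex and everything else as a glob, and it assumes regexes are applied with re.search.

```python
import re


def parse_select(value):
    """Hypothetical parser for a JavaScript-style notation: a value wrapped
    in slashes is a regular expression; anything else is a glob."""
    if len(value) >= 2 and value.startswith("/") and value.endswith("/"):
        return "regex", value[1:-1]
    return "glob", value


kind, pattern = parse_select("/foo/")
print(kind, pattern)  # regex foo
# The user probably meant the directory /foo/, but the regex "foo" matches
# any name that merely contains "foo".
print(bool(re.search(pattern, "my_food.txt")))  # True
```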
Python pseudo-syntax
select: r'^/foo/bar/.+\.txt$'
This is sleight of hand. It looks like a novel string-like type, but in YAML it is simply the string "r'^/foo/bar/.+\.txt$'".
This syntax also breaks down when used in a shell, which strips the quotes:
$ echo r'hello'
rhello
Explicit prefix
select: regex:^/foo/bar/.+\.txt$
While this syntax requires a lengthy prefix, it marks regular expressions clearly.
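Parsing an explicit prefix is straightforward, as the sketch below suggests. The function name is hypothetical; the "regex:" prefix follows the example above, and globs are converted with fnmatch.translate as one possible choice.

```python
import fnmatch
import re

REGEX_PREFIX = "regex:"


def compile_select(value):
    """Compile a select value under the explicit-prefix alternative.

    Hypothetical helper: values starting with "regex:" are regular
    expressions; all other values are glob patterns.
    """
    if value.startswith(REGEX_PREFIX):
        return re.compile(value[len(REGEX_PREFIX):])
    return re.compile(fnmatch.translate(value))


regex_sel = compile_select(r"regex:^/foo/bar/.+\.txt$")
glob_sel = compile_select("/foo/bar/*.txt")
print(bool(regex_sel.match("/foo/bar/a.txt")))  # True
print(bool(glob_sel.match("/foo/bar/a.txt")))   # True
```

One trade-off of this design: a glob that legitimately begins with the text regex: can no longer be written directly.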
Auto-detect glob vs regex
Guild could attempt to detect a glob expression and convert it to the corresponding regular expression automatically.
This approach should be rejected because it introduces implicit behavior that is hard to debug. We are not aware of any tools that use this approach.
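The debugging problem stems from the fact that many strings are valid under both syntaxes but select different files, while others are valid only as globs. The sketch below illustrates both cases. It assumes full-name matching for regexes (re.fullmatch) and uses fnmatch.fnmatchcase for globs.

```python
import fnmatch
import re


def regex_matches(pattern, name):
    """Return True/False for a regex full match, or None if the pattern
    is not a valid regular expression."""
    try:
        return re.fullmatch(pattern, name) is not None
    except re.error:
        return None


for pattern, name in [("a.txt", "abtxt"), ("*.txt", "notes.txt")]:
    print(pattern, name,
          "glob:", fnmatch.fnmatchcase(name, pattern),
          "regex:", regex_matches(pattern, name))
# a.txt abtxt glob: False regex: True
# *.txt notes.txt glob: True regex: None
```

A detector has no reliable way to know which meaning of a.txt the user intended, and a wrong guess silently changes the selected files.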
New select-glob attribute
Rather than introduce a breaking change, add a new attribute, select-glob, which is used with glob expressions. In this case, the example above is changed to:
op:
  requires:
    - file: .
      select-glob: '*.txt'
This is a viable approach, but we prefer the breaking change for the following reasons:
- We believe that in the majority of cases, glob patterns are sufficient for selection. The default should correspond to the majority case.
- The term glob is systems jargon, like regex. The more technical case should be the exception, not the default.
- Until Guild reaches 1.0, we are not constrained to non-breaking changes.
The advantage of this approach is that it maintains compatibility with existing projects and avoids the need to support a deprecation period.