Multiple config file flags-dest

Summary

This proposal outlines possible approaches to support multiple configuration files for Guild’s flags interface.

The driver for this proposal is "Multiple yml flag files".

This proposal is awaiting feedback.

Problem

A user may need to support multiple configuration files for an operation. Guild currently supports only a single file, via its config type for flags-dest.

For example, to provide an operation with a params.json file containing its user-specified flag values, a user may define a flags interface like this:

# guild.yml
op:
  flags-dest: config:params.json
  flags-import: all

Guild does not support operations that need more than one such configuration file.

Proposed Approach

Guild should be modified to support multiple parameter files as operation inputs. This should be done by modifying/extending the flags-dest spec to support multiple files.

This proposal outlines several possible spellings of “multiple configuration files”.

  1. Implicitly support a glob-style syntax in the current config spec
  2. Implicitly support regular expressions in the current config spec
  3. Introduce a new quoting scheme to the current config spec to support regular expressions
  4. Introduce a new flags-dest prefix/type that explicitly configures multiple configuration files

For the examples below, we assume that the user wants to specify one or more of the following parameter files for flags:

./flags-a.yml
./flags-b.json
./flags/a.json
./flags/b.json

Option 1 - Implicit glob syntax

This option would add glob support to the existing config processing. It is potentially backward-incompatible: any existing configuration whose file name contains glob-syntax tokens (such as *, ?, [ or {) would need to be modified.

Flags dest examples:

1.1 - config:flags-*.*
1.2 - config:flags/*.json
1.3 - config:flags/{a,b}.json
1.4 - config:{flags-a.yml,flags-b.json}
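As a quick check of how examples 1.1 to 1.3 behave with Python's standard glob module today, the sketch below recreates the four example files in a scratch directory and matches each pattern against them. The helper name match is purely illustrative.

```python
import glob
import os
import tempfile

# Recreate the four example files in a scratch directory.
root = tempfile.mkdtemp()
for rel in ["flags-a.yml", "flags-b.json", "flags/a.json", "flags/b.json"]:
    path = os.path.join(root, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

def match(pattern):
    """Return sorted paths, relative to root, that a glob pattern matches."""
    hits = glob.glob(os.path.join(root, pattern))
    return sorted(os.path.relpath(p, root).replace(os.sep, "/") for p in hits)

for pattern in ["flags-*.*", "flags/*.json", "flags/{a,b}.json"]:
    print(pattern, "->", match(pattern))
```

The first two patterns pick up the two top-level files and the two files under flags/ respectively. The brace pattern returns an empty list, because the stock glob module treats { and } as literal characters.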

Option 2 - Implicit regex syntax

This is not a viable option but is included for completeness. It is brazenly backward-incompatible and risks accidental pattern matching: users rely on dots (.) as extension delimiters, but in a regular expression a dot matches any character, so params.json should technically be specified as params\.json.

Flags dest examples:

2.1 - config:flags-.*\..*
2.2 - config:flags/.*\.json
2.3 - config:flags/(a|b)\.json
2.4 - config:(flags-a\.yml|flags-b\.json)
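The accidental-match risk described for Option 2 is easy to reproduce with Python's re module. This short sketch also shows a related pitfall: square brackets form a character class, not an alternation. The file names are made up for illustration.

```python
import re

def matches(pattern, name):
    """True if the regex pattern matches the whole file name."""
    return re.fullmatch(pattern, name) is not None

# An unescaped "." matches any character, not just a literal dot.
print(matches("params.json", "params-json"))    # True: accidental match
print(matches(r"params\.json", "params-json"))  # False: dot escaped

# "[a|b]" is a character class matching one "a", "|" or "b".
print(matches(r"flags/[a|b]\.json", "flags/|.json"))  # True
print(matches(r"flags/(a|b)\.json", "flags/|.json"))  # False: use (a|b)
```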

Option 3 - Quoted regex syntax

This is similar to Option 2 in that a regular expression syntax is supported, but makes the use of regex explicit through some quoting scheme.

Flags dest examples:

3.1 - config:/flags-(a|b)\.(json|yml)/
3.2 - config:!regex!flags/.*\.json
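To illustrate how Option 3 keeps today's behavior for unquoted values, here is a minimal sketch of how the text after config: might be classified. The function name parse_config_value and the exact quoting rules are hypothetical, not Guild code.

```python
import re

REGEX_TAG = "!regex!"

def parse_config_value(value):
    """Classify the text after "config:" as ("file", path) or ("regex", pattern).

    Hypothetical Option 3 rules: "!regex!..." or "/.../" marks a regular
    expression; anything else is a literal path, exactly as today.
    """
    if value.startswith(REGEX_TAG):
        pattern = value[len(REGEX_TAG):]
    elif len(value) > 2 and value.startswith("/") and value.endswith("/"):
        pattern = value[1:-1]
    else:
        # No quoting: keep the current single-file behavior unchanged.
        return ("file", value)
    re.compile(pattern)  # raises re.error early if the pattern is invalid
    return ("regex", pattern)
```

With these rules, parse_config_value("params.json") stays ("file", "params.json"), and an absolute path such as /etc/params.json is still a file because it does not end in a slash. The /.../ form is the more ambiguous of the two, which is one argument for the !regex! spelling.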

Option 4 - New flags-dest type

4.1 - config-regex:flags/.*\.json
4.2 - config-multi:(flags-a\.yml|flags-b\.json)
4.3 - multi-config:flags/[a|b].json
4.4 - configs:flags/.*
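Because Option 4 carries the signal in the prefix, existing config: specs can keep their current code path untouched. A minimal sketch of that dispatch, assuming the spec is split on its first colon; the function name and the set of candidate prefixes (taken from examples 4.1 to 4.4) are illustrative only.

```python
# Candidate prefix names from examples 4.1 to 4.4; only one would ship.
MULTI_FILE_TYPES = {"config-regex", "config-multi", "multi-config", "configs"}

def classify_flags_dest(spec):
    """Return (kind, dest_type, value) for a flags-dest spec string.

    kind is "multi" for a new multi-file prefix, "single" for the
    existing config: prefix, and "other" for any other dest type.
    """
    dest_type, _, value = spec.partition(":")
    if dest_type in MULTI_FILE_TYPES:
        return ("multi", dest_type, value)
    if dest_type == "config":
        return ("single", dest_type, value)
    return ("other", dest_type, value)
```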

Analysis of options

Option 2 is not viable and is rejected outright.

Option 1 introduces this feature seamlessly with minimal chance of disrupting existing users: odds are low that a user needs to specify a file whose name contains a standard glob-style wildcard character. To support option 1, we would need to extend Python’s glob support with the curly-brace alternation ({a,b}) that bash supports, since Python’s glob module does not expand braces.
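One way to add that curly-brace support on top of Python's glob is to expand each {x,y} group into separate patterns first and glob each one. A minimal sketch, with hypothetical helper names expand_braces and brace_glob:

```python
import glob
import re

# A "{...}" group with no nested braces and at least one comma (bash
# leaves "{a}" without a comma unexpanded, and so does this sketch).
BRACE_GROUP = re.compile(r"\{([^{}]*,[^{}]*)\}")

def expand_braces(pattern):
    """Expand bash-style {x,y} groups into a list of plain glob patterns.

    Minimal sketch: no escaping, no {1..3} ranges, and nested groups are
    not expanded exactly as bash would.
    """
    m = BRACE_GROUP.search(pattern)
    if m is None:
        return [pattern]
    head, tail = pattern[:m.start()], pattern[m.end():]
    expanded = []
    for alternative in m.group(1).split(","):
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded

def brace_glob(pattern):
    """Like glob.glob(), but with brace expansion; sorted, de-duplicated."""
    matches = set()
    for expanded in expand_braces(pattern):
        matches.update(glob.glob(expanded))
    return sorted(matches)
```

For example, expand_braces("flags/{a,b}.json") yields flags/a.json and flags/b.json, and multiple groups multiply out in order, so "{x,y}{1,2}" becomes x1, x2, y1, y2.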

The main drawback of this option is that it implicitly changes config from supporting a single file to supporting multiple files. Because of the glob-style syntax, however, this should have minimal-to-no impact on users, now or in the future.

This option provides a simple file-matching syntax (glob style) but cannot use more flexible regular expressions.

Perhaps the most damning critique of option 1 is that it fails to make explicit the underlying support for multiple configuration files. Options 3 and 4 make this configuration explicit.

Option 3 preserves existing behavior by introducing an explicit syntax for regular expressions. As there is no standard for this type of quoting in YAML, we propose two options: a JavaScript-style /.../ literal and a !TAG! style that hearkens to YAML’s tag syntax. Neither of these syntaxes is particularly aesthetic, but either would work.

Option 4 preserves existing behavior and makes explicit the intent to match multiple files. This is essentially equivalent to option 3 but the pattern indication is moved to the flags-dest prefix rather than the prefixed value.

Glob syntax vs regular expressions

Glob syntax is a more natural way to match filenames. Regular expressions provide more flexibility.

Assuming that multiple configuration files are more of an edge case than a mainstream one, it may be wiser to opt for the more flexible matching capability rather than the more natural glob style.

Regular expression support is not possible for option 1. Glob syntax is an option, however, for options 3 and 4.
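The asymmetry can be checked with the standard library: fnmatch.translate turns a glob pattern into a regular expression, so a regex-based spelling could accept glob patterns by translating them first, while regex features such as repetition (+) have no glob counterpart. A short sketch; glob_to_regex is an illustrative name.

```python
import fnmatch
import re

def glob_to_regex(pattern):
    """Compile a glob pattern into an equivalent regex via fnmatch."""
    return re.compile(fnmatch.translate(pattern))

rx = glob_to_regex("flags/*.json")
print(rx.pattern)                          # exact text varies by Python version
print(bool(rx.match("flags/a.json")))      # True
print(bool(rx.match("flags-a.yml")))       # False
print(bool(rx.match("flags/sub/c.json")))  # True: fnmatch's "*" also crosses "/"
```

Note the last case: unlike glob.glob, which matches one path segment at a time, fnmatch's * also matches slashes, so a translated pattern may select more files than the same pattern passed to glob.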

Final proposed approach

Pending feedback on options above.

I’m partial to option 4. Can we add both glob and regex with two different prefixes, like config-glob and config-regex?

Weighing in with my vote here.

I prefer introducing a new multi-config prefix for flags dest.

  • Option 1 hides the significant fact that multiple files are supported
  • Option 3 is still implicit in that using a pattern implies that multiple files are supported — I’d rather see this spelled at the prefix level

So the driving case would be spelled this way:

op:
  flags-dest: multi-config:guild/flags/(a|b)\.yml
  flags-import: all

But can we have both syntaxes supported? If we wanted to go that route, I’d be inclined to support glob-style by default:

op:
  flags-dest: multi-config:guild/flags/{a,b}.yml
  flags-import: all

and then uglify the regex support like this:

op:
  flags-dest: multi-config:!regex!guild/flags/(a|b)\.yml
  flags-import: all
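A sketch of how the multi-config spelling might resolve to files: glob by default, and a regex when the value starts with !regex!. The function name resolve_multi_config, and the choice to full-match the regex against forward-slash paths relative to a project root, are assumptions for illustration, not Guild behavior. Stock glob has no brace expansion, so the {a,b} form above would need extra expansion code.

```python
import glob
import os
import re
import tempfile

PREFIX = "multi-config:"
REGEX_TAG = "!regex!"

def _rel(path, root):
    """Path relative to root, using "/" as the separator."""
    return os.path.relpath(path, root).replace(os.sep, "/")

def resolve_multi_config(flags_dest, root="."):
    """Return sorted file paths under root selected by a multi-config spec."""
    if not flags_dest.startswith(PREFIX):
        raise ValueError("not a multi-config spec: %r" % flags_dest)
    value = flags_dest[len(PREFIX):]
    if value.startswith(REGEX_TAG):
        # Explicit regex: full-match against every file path under root.
        rx = re.compile(value[len(REGEX_TAG):])
        found = []
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                rel = _rel(os.path.join(dirpath, name), root)
                if rx.fullmatch(rel):
                    found.append(rel)
        return sorted(found)
    # Default: glob, relative to root (no {a,b} brace expansion here).
    return sorted(_rel(p, root) for p in glob.glob(os.path.join(root, value)))

# Demo against a scratch project with three flag files.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "guild", "flags"))
for name in ["a.yml", "b.yml", "c.yml"]:
    open(os.path.join(root, "guild", "flags", name), "w").close()

print(resolve_multi_config("multi-config:guild/flags/*.yml", root))
print(resolve_multi_config(r"multi-config:!regex!guild/flags/(a|b)\.yml", root))
```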

Are there any current workarounds for this? For several projects, I have 2 or 3 config files that I would like to have tracked. I need to be able to access these parameters from a Jupyter notebook to generate plots that aggregate the results from multiple runs.

I have tried looping through these additional YAML files and printing out the key: value pairs so that they would get logged as run outputs instead of inputs, but since Guild doesn’t log string outputs, that wasn’t a solution.

Currently, the only run-generated outputs that we show in our comparisons are numeric.

And Guild only supports a single config file for a flags interface.

Both topics, run-generated string values (like scalars but non-numeric) and multiple flag interfaces, are good features that we have captured and plan to support in upcoming releases. Unfortunately, neither is slated for the next release (0.9), so that’s not of much help to you.

You could, however, finagle the flags into appearing. You’d want to load .guild/attrs/flags from your run process. This is a YAML file that you can load using PyYAML this way:

import yaml

with open(".guild/attrs/flags") as f:
    flags = yaml.safe_load(f)

Update flags with any of the configuration values you want to appear as flags, then save the file back:

# vals_from_other_config_files: a dict of values loaded from your other config files
flags.update(vals_from_other_config_files)
with open(".guild/attrs/flags", "w") as f:
    yaml.dump(flags, f)

When Guild looks at this run, it will see the additional flags you added.

Note that this scheme only gets the values to show up in a comparison view. You will still only be able to import, and therefore modify, flag values for a run from one of your config files.
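A self-contained sketch of the same workaround, including loading the extra config files, assuming each extra file is a flat YAML mapping and that later files should win on key collisions. The function name merge_extra_flags and the file name model.yml are hypothetical, and PyYAML must be installed. The demo uses a scratch directory in place of a real run directory; the relative .guild/attrs/flags path in the snippet above assumes the process runs from the run directory.

```python
import os
import tempfile

import yaml  # PyYAML

def merge_extra_flags(run_dir, extra_files):
    """Merge key/value pairs from extra YAML files into a run's flags attr.

    Later files override earlier ones on key collisions. This only changes
    what Guild displays for the run, not which flags can be set.
    """
    flags_path = os.path.join(run_dir, ".guild", "attrs", "flags")
    with open(flags_path) as f:
        flags = yaml.safe_load(f) or {}
    for path in extra_files:
        with open(path) as f:
            flags.update(yaml.safe_load(f) or {})
    with open(flags_path, "w") as f:
        yaml.safe_dump(flags, f, default_flow_style=False)
    return flags

# Demo in a scratch directory standing in for a run directory.
run_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(run_dir, ".guild", "attrs"))
with open(os.path.join(run_dir, ".guild", "attrs", "flags"), "w") as f:
    f.write("lr: 0.1\n")
extra = os.path.join(run_dir, "model.yml")  # hypothetical extra config file
with open(extra, "w") as f:
    f.write("layers: 3\nactivation: relu\n")
print(merge_extra_flags(run_dir, [extra]))
```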
