Exposing many hyperparams to guild for image augmentation

We use imgaug for doing computer vision augmentation. One of our imgaug transformations looks something like this:

iaa.Sequential([
    iaa.Sometimes(
        0.5,
        iaa.GaussianBlur(sigma=(0, 0.5))
    ),
    iaa.AdditiveGaussianNoise(loc=0, scale=(0.0, 0.05 * 255), per_channel=0.5),
    iaa.MultiplyAndAddToBrightness(mul=(0.9, 1.), add=(-5, 5)),
    # CoarseDropout adds random black squares throughout the patch.
    iaa.CoarseDropout(0.02, size_percent=0.25)
], random_order=True)

There are a lot of hyperparams here that ideally could be exposed to guild. However, doing that through argparse and the CLI is too much work.

There is a feature request up on imgaug to specify such a pipeline through a yaml file, but I was wondering if anyone had another idea on how to go about this.

Guild supports flag-driven configuration files, which you could use in this case. However, its current implementation doesn’t support flag imports from config files, so you’d have to define the flags in the Guild file, which is just as painful as defining them in argparse.

I’m seeing around 15 possible hyper params in the snippet there. That’s arguably a lot for argparse/CLI but not crazy. 150 hyperparams — that’d be pushing my limits :slight_smile:

The problem, as I see it, is that you need to define the hyper params somewhere. Guild will need to know what values to drive.
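As a rough illustration of what defining them somewhere could look like for the pipeline in the question, here is one possible flat flag table. The flag names (`blur_prob`, `dropout_p`, and so on) and the `build_augmenter` helper are hypothetical; only the imgaug calls and the default values come from the original snippet. Each range is split into two scalar flags and rebuilt as a tuple, because imgaug reads a tuple as a continuous range and a list as a set of discrete choices, and values coming from YAML or a CLI would not arrive as tuples.

```python
# Hypothetical flat flag table for the pipeline in the question.
# Each (min, max) range is split into two scalar flags so that every
# value is a plain number or bool that argparse or Guild can drive.
AUG_DEFAULTS = {
    "blur_prob": 0.5,
    "blur_sigma_min": 0.0,
    "blur_sigma_max": 0.5,
    "noise_loc": 0,
    "noise_scale_min": 0.0,
    "noise_scale_max": 0.05 * 255,
    "noise_per_channel": 0.5,
    "brightness_mul_min": 0.9,
    "brightness_mul_max": 1.0,
    "brightness_add_min": -5,
    "brightness_add_max": 5,
    "dropout_p": 0.02,
    "dropout_size_percent": 0.25,
    "random_order": True,
}


def build_augmenter(flags):
    """Rebuild the original pipeline from a flat dict of flag values."""
    # Imported lazily so the flag table can be inspected without imgaug.
    import imgaug.augmenters as iaa

    return iaa.Sequential(
        [
            iaa.Sometimes(
                flags["blur_prob"],
                iaa.GaussianBlur(
                    sigma=(flags["blur_sigma_min"], flags["blur_sigma_max"])
                ),
            ),
            iaa.AdditiveGaussianNoise(
                loc=flags["noise_loc"],
                scale=(flags["noise_scale_min"], flags["noise_scale_max"]),
                per_channel=flags["noise_per_channel"],
            ),
            iaa.MultiplyAndAddToBrightness(
                mul=(flags["brightness_mul_min"], flags["brightness_mul_max"]),
                add=(flags["brightness_add_min"], flags["brightness_add_max"]),
            ),
            iaa.CoarseDropout(
                flags["dropout_p"], size_percent=flags["dropout_size_percent"]
            ),
        ],
        random_order=flags["random_order"],
    )


print("%d flags" % len(AUG_DEFAULTS))
```

Split this way the snippet comes to 14 flags, in line with the estimate of around 15.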

If 15 is your limit, I’d just use argparse. If you’re dealing with 50+ hyper params, that’s a bigger problem. Then again, 50+ hyper params presents a search space so large that I wonder how much luck you’ll have covering it in several lifetimes!

That said, I’ve seen projects that have well more than 50 flags. Not all of them need to be tuned. The question is, how to support them with some degree of elegance. This is a non-Guild specific problem as it exists even when running from Python directly.

Setting aside Guild for a moment, here’s how I’d frame the problem:

Any complex project has the option of moving its config outside Python code and into something else. Moving config outside Python makes it easier to try different flag values for operations, but it also makes it easier to expand architectures, as components can be quickly assembled just by config. When this happens, the complexity shifts from Python to the config format. Candidate file formats include YAML, JSON, and the plain text variant of protobuf, which is common in Google projects.

New config is presented to an operation as different config files. This is great in theory, but when the config files get complicated, it’s nearly as hard to drive experiments. The problem just shifts over to another file format. For example, how do you change a layer count when that count is buried inside a complicated JSON or YAML file?
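One common way to make a buried value reachable is to address it by a dotted path. The sketch below is only an illustration of that idea: `flatten`, `set_path`, and the sample config are made up for this example and are not part of Guild or any particular library.

```python
def flatten(config, prefix=""):
    """Flatten nested dicts into {"a.b.c": value} pairs."""
    flat = {}
    for key, value in config.items():
        path = "%s.%s" % (prefix, key) if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def set_path(config, path, value):
    """Set a nested value in place, given its dotted path."""
    *parents, leaf = path.split(".")
    node = config
    for key in parents:
        node = node[key]
    node[leaf] = value


# Made-up config standing in for a parsed JSON or YAML file.
config = {
    "model": {"encoder": {"layers": 3, "units": 128}, "dropout": 0.1},
    "train": {"lr": 0.001},
}

# Every leaf becomes a candidate flag name, e.g. "model.encoder.layers".
print(sorted(flatten(config)))

# Changing the layer count no longer requires editing the file by hand.
set_path(config, "model.encoder.layers", 6)
```

The flattened names could then serve as flag names, with `set_path` writing any overrides back into the nested structure before it is used.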

Many many projects face this problem and it’s independent of Guild.

Guild’s problem is to infer the set of flags from this soup of configuration. Currently Guild can infer config from argparse and Click based CLIs and Python globals. It currently cannot infer anything from a config file. This is a planned enhancement that should land before or around 0.8.

So where does that leave us today? I would be inclined to write a simple argparse generator. The generator would read from a config file and infer a list of flags that can be modified from a CLI. Something like this:

# train.py

import argparse
import os

import yaml

def _parse_bool(s):
    # argparse passes strings and bool("no") is True, so parse explicitly.
    return s.lower() in ("1", "true", "yes", "on")

if os.path.exists("config.yml"):
    # If config.yml is provided, always use that.
    with open("config.yml") as f:
        config = yaml.safe_load(f)
elif os.path.exists("config.yml.in"):
    # If config.yml.in is provided, use it as defaults with CLI
    # overrides.
    with open("config.yml.in") as f:
        config = yaml.safe_load(f)
    assert isinstance(config, dict), config
    p = argparse.ArgumentParser()
    for name, default in sorted(config.items()):
        arg_type = _parse_bool if isinstance(default, bool) else type(default)
        p.add_argument("--%s" % name, default=default, type=arg_type)
    args = p.parse_args()
    config.update(vars(args))
else:
    raise SystemExit("missing config: expected config.yml or config.yml.in")

print("Using config: %s" % config)

Here’s a sample config.yml.in:

# config.yml.in

s: hello
i: 123
f: 1.123
b: yes

And a Guild file you can use to test it with:

# guild.yml

train:
  flags-dest: args
  flags-import: yes
  requires:
    - file: config.yml.in

To test, run something like this:

guild run train f=2.345 s=hola

Guild will load train.py, which uses config.yml.in to dynamically generate an argparse parser. Guild uses these defaults and types for flags via its flags import support. With the CLI you can easily override flag values that are otherwise defaults in config.yml.in.

With this approach you can maintain your flags list in config.yml.in without messing with argparse. The parser is generated dynamically by the code snippet.

I’ll admit, it’d be nice if Guild did this all for you. I expect this feature will land at some point not too far off. But in the meantime, it’s not a terrible amount of code. It’s also 100% independent of Guild. I’m always a little nervous about wiring a dependency on Guild in any way. I feel that colleagues should be free to run experiments without a specific experiment tool dictating how they should run things.

In other words, this works just as well:

python train.py --f 2.345 --s hola
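To see what the generated parser does with that command line without touching any files, the same idea can be written as a function that takes the defaults and an argument list. This is a sketch: `apply_overrides` and `parse_bool` are names invented here, and the dict stands in for what PyYAML would load from the sample config.yml.in. Booleans get an explicit parser because `bool("no")` evaluates to `True`, so using `bool` itself as the argparse type would not behave as expected.

```python
import argparse


def parse_bool(text):
    # bool("no") is True, so booleans are parsed from text explicitly.
    return text.strip().lower() in ("1", "true", "yes", "on")


def apply_overrides(defaults, argv):
    """Return a copy of defaults updated with --name value pairs from argv."""
    parser = argparse.ArgumentParser()
    for name, default in sorted(defaults.items()):
        arg_type = parse_bool if isinstance(default, bool) else type(default)
        parser.add_argument("--%s" % name, default=default, type=arg_type)
    config = dict(defaults)
    config.update(vars(parser.parse_args(argv)))
    return config


# Values as PyYAML would load them from the sample config.yml.in.
defaults = {"s": "hello", "i": 123, "f": 1.123, "b": True}

config = apply_overrides(defaults, ["--f", "2.345", "--s", "hola"])
print(config)
```

Flags that are not passed keep their defaults and their original types, so `i` stays the integer 123 and `f` becomes the float 2.345.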

If this little scheme is something you like, you can put the config/argparse init code into a module and use it like this:

# train.py

import config_util

config = config_util.init()
print(config)

The support module:

# config_util.py

import argparse
import os

import yaml

def _parse_bool(s):
    # argparse passes strings and bool("no") is True, so parse explicitly.
    return s.lower() in ("1", "true", "yes", "on")

def _load(path):
    with open(path) as f:
        return yaml.safe_load(f)

def init():
    if os.path.exists("config.yml"):
        return _load("config.yml")
    elif os.path.exists("config.yml.in"):
        config = _load("config.yml.in")
        assert isinstance(config, dict), config
        p = argparse.ArgumentParser()
        for name, default in sorted(config.items()):
            arg_type = _parse_bool if isinstance(default, bool) else type(default)
            p.add_argument("--%s" % name, default=default, type=arg_type)
        args = p.parse_args()
        config.update(vars(args))
        return config
    else:
        raise SystemExit("missing config: expected config.yml or config.yml.in")

The script body can even be moved into a main function for a slightly cleaner interface:

# train.py

import config_util

def main():
    config = config_util.init()
    prepare(config)
    train(config)
    test(config)

if __name__ == "__main__":
    main()

I found this library, parse_it, which seems to do a good job of getting all params from a config file while allowing any of them to be overridden by command-line args (with a default syntax of --param_name value). Importantly, this means we don’t need to specify 100s of argparse lines for these params.
I wonder if guildai can integrate with parse_it somehow?

This is on the near term roadmap — not supporting this library specifically (it’s not too hard to parse various config files, we already do this) but to support flag imports from config files.

You can do this currently:

op:
  requires:
    - config: my-flags.yml

Guild will look for my-flags.yml and use those values, updated with any user provided flag values, and re-save that file to the run directory. Your script then can access the config file as it normally would but with the current flag values for the run.
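Conceptually, that resolution step amounts to load, overlay, and re-save. The sketch below illustrates only the described behavior and is not Guild’s implementation: `resolve_config` is a made-up name, the flag values are invented, and JSON stands in for YAML so the example needs nothing outside the standard library.

```python
import json
import os
import tempfile


def resolve_config(src_path, user_flags, run_dir):
    """Load src_path, overlay user_flags, and save a copy under run_dir."""
    with open(src_path) as f:
        values = json.load(f)
    values.update(user_flags)
    dest_path = os.path.join(run_dir, os.path.basename(src_path))
    with open(dest_path, "w") as f:
        json.dump(values, f, indent=2)
    return dest_path


with tempfile.TemporaryDirectory() as tmp:
    # Project-level config with default values (made up for this example).
    src = os.path.join(tmp, "my-flags.json")
    with open(src, "w") as f:
        json.dump({"lr": 0.01, "layers": 3}, f)

    # Stand-in for a run directory.
    run_dir = os.path.join(tmp, "run")
    os.mkdir(run_dir)

    # The user overrides one flag; the run gets its own resolved copy.
    resolved = resolve_config(src, {"layers": 6}, run_dir)
    with open(resolved) as f:
        run_values = json.load(f)

print(run_values)
```

The script itself only ever reads the copy in the run directory, so it does not need to know which values were defaults and which were overrides.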

This is all fine and good, but Guild does not import flags from these files the way it does with argparse/Click and globals. So in practice it’s a pain because you need to explicitly define the flags in the Guild file. Obviously, in your case of 100s of flags, this is not so helpful!

The upcoming feature would shift the spelling of the above to something like this:

op:
  flags-import: all
  flags-dest: config:my-flags.yml

This makes config files first class sources for flag values, which is what we want here I think.

The underlying implementation to parse, merge, and rewrite the files is already basically there so we’d probably just reuse that rather than introduce a dependency to another library. That said, there’s no rule against that so we could take a look.