Validating guild operations using captured metrics

Say I have a Guild operation classification:train (run with guild run classification:train) that logs an Accuracy metric via TensorBoard.

I would like to create a test that ensures this operation achieves a certain Accuracy threshold. I will use it in my CI as a regression test.

I envisioned something like this:

- model: _check
  operations:
    _test_classification:
      steps:
        - run: classification:train
          isolate-runs: no
          expect:
            - output: ${Accuracy} > 0.9

Is there any way to do this with Guild now?

This is not supported. I think there’s a sound argument that it should be. However, there’s a cost: it introduces a general expression-evaluation problem for Guild, starting with the syntax (presumably a Python expression, which is not too hard to support). A bigger downside is that it starts to move computational logic out of the script and into the Guild file. I’m inclined to think that’s not a good idea. You can imagine this section becoming quite complex, turning into a Python script in itself!

One tack you might consider is to perform this check in your training operation and log a message that you look for as a fixed string, like ACCURACY OK. You can parameterize the expected accuracy threshold as a flag to the training script, e.g. target-accuracy, with a default value of 0.9 that can be changed as needed when you train. This value is then memorialized in the run record. As a final stage of the training script, you check accuracy against this flag-defined threshold and print a message.
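
For example, the tail of the training script might look something like this (a minimal sketch: train_and_evaluate stands in for your existing training and evaluation code, and target-accuracy is just the flag name from above; Guild imports argparse-defined flags automatically):

import argparse

def main():
    parser = argparse.ArgumentParser()
    # Guild picks this up as a flag, so the threshold can be set per run,
    # e.g. `guild run classification:train target-accuracy=0.95`.
    parser.add_argument("--target-accuracy", type=float, default=0.9)
    args = parser.parse_args()

    accuracy = train_and_evaluate()  # stand-in for your training/eval code

    # Print a fixed marker string that a downstream check can look for.
    # (You could also exit non-zero here to fail the run directly.)
    if accuracy >= args.target_accuracy:
        print("ACCURACY OK (%.4f >= %.4f)" % (accuracy, args.target_accuracy))
    else:
        print("ACCURACY BELOW TARGET (%.4f < %.4f)"
              % (accuracy, args.target_accuracy))

if __name__ == "__main__":
    main()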

This approach lets you evolve the definition of “test” to whatever you want with as many value tests as you need. Guild then simply looks for a string from you that says the result is okay.

What do you think?


That would definitely solve it, but I do think it is a bit too invasive.

I was considering doing it with pytest, so you could do something like:

- model: classification
  operations:
    _validate_train:
      - main: python -m pytest tests/validate_classification_train.py
    train:
      - main: ...

- model: _check
  operations:
    _test_classification:
      steps:
        - run:
          - classification:train
          - classification:_validate_train
          isolate-runs: no

Here tests/validate_classification_train.py would use the guild.ipy module to inspect the classification:train run under test and manually assert the Accuracy.
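
For example, something along these lines (a rough sketch only; the guild.ipy calls and column names, e.g. runs(), scalars(), operation, tag, last_val, are from memory and should be checked against your Guild version, and 0.9 is just the threshold from the earlier example):

# tests/validate_classification_train.py
from guild import ipy

def test_train_accuracy():
    runs = ipy.runs()  # most recent runs first
    train_runs = runs[runs["operation"] == "classification:train"]
    assert len(train_runs) > 0, "no classification:train runs found"

    # Scalars logged by the matching run(s); filter to the Accuracy tag.
    # Ordering is assumed to follow the runs list (latest first).
    scalars = train_runs.scalars()
    acc = scalars[scalars["tag"] == "Accuracy"]
    assert len(acc) > 0, "no Accuracy scalar logged for classification:train"
    assert acc["last_val"].iloc[0] > 0.9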

That’s a sound approach I think. The tests in the validate module could then be quite a bit more robust than what you’d implement in a Guild file.

I do agree that Guild should support simple assertions like what you showed in your example. We’ll get that in.

A couple of observations about your snippet (you probably know these, but I thought I’d point them out; a corrected snippet follows the list).

  • The main attr for _validate_train should be pytest tests/validate_classification_train.py (you don’t want the python -m in there; if you do want to call Python explicitly, use the exec attr).

  • Source code will land in .guild/sourcecode, so the arg to the pytest module should either be .guild/sourcecode/tests/validate_classification_train.py, or you’ll need to ensure that your test scripts are defined as a file dependency.

  • The isolate-runs attr (last line) needs to be indented under the bullet.
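
Putting those together, a corrected version of your snippet might look roughly like this (I’ve split the two runs into separate steps, since as far as I know each step’s run names a single operation, and gone with the .guild/sourcecode path rather than a file dependency):

- model: classification
  operations:
    _validate_train:
      main: pytest .guild/sourcecode/tests/validate_classification_train.py
    train:
      main: ...

- model: _check
  operations:
    _test_classification:
      steps:
        - run: classification:train
          isolate-runs: no
        - run: classification:_validate_train
          isolate-runs: no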
