Validating guild operations using captured metrics

copah · May 12, 2021, 1:44pm

Say I have a Guild operation, started with guild run classification:train, that tracks an Accuracy metric via TensorBoard.

I would like to create a test that ensures that this operation achieves a certain Accuracy threshold. I will use this in my CI as a regression test.

I envisioned something like this:

- model: _check
  operations:
    _test_classification:
      steps:
        - run: classification:train
          isolate-runs: no
          expect:
            - output: ${Accuracy} > 0.9

Is there any way to do this with guild now?

garrett · May 12, 2021, 2:08pm

This is not supported. I think there’s a sound argument that it should be. However, there’s a cost. It introduces a generalized computational problem for Guild that includes syntax (presumably a Python expression, which is not too hard to support). A bigger downside is that it starts to move computational logic from the script to the Guild file. I’m inclined to think that’s not a good idea. You can imagine this section becoming quite complex, turning into a Python script in itself!

One tack you might consider is to perform this check in your training operation and log a message that you look for as a fixed string, like ACCURACY OK, etc. You can parameterize the expected accuracy threshold as a flag to the training script (e.g. target-accuracy) with a default value of 0.9 that can be changed as needed when you train. This value is then memorialized in the run record. As a final stage of the training script, you check accuracy against this flag-defined threshold and print a message.

This approach lets you evolve the definition of “test” to whatever you want with as many value tests as you need. Guild then simply looks for a string from you that says the result is okay.
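A minimal sketch of that final stage, assuming the training script takes its flags through argparse. The train() placeholder, the accuracy_message() helper and the ACCURACY FAILED string are hypothetical names used only for illustration; only the target-accuracy flag, its 0.9 default and the ACCURACY OK string come from the suggestion above.

```python
import argparse


def accuracy_message(accuracy, target):
    """Return the fixed string that a check can look for in the run output."""
    return "ACCURACY OK" if accuracy >= target else "ACCURACY FAILED"


def train():
    """Placeholder for the real training loop; returns the final accuracy."""
    return 0.93  # hypothetical value standing in for a real evaluation


def main(argv=None):
    parser = argparse.ArgumentParser()
    # The threshold is a flag, so its value is recorded with the run.
    parser.add_argument("--target-accuracy", type=float, default=0.9)
    # parse_known_args ignores any arguments this sketch does not define.
    args, _ = parser.parse_known_args(argv)

    accuracy = train()
    print(f"Accuracy: {accuracy}")
    # Final stage: compare against the flag-defined threshold and print a message.
    print(accuracy_message(accuracy, args.target_accuracy))


if __name__ == "__main__":
    main()
```

With this in place, the check on the Guild side only has to match the fixed string ACCURACY OK in the run output, and the comparison logic stays in the script.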

What do you think?

copah · May 14, 2021, 4:30pm

That would definitely solve it, but I do think it is a bit too invasive.

I was considering doing it with pytest, so you could do something like:

- model: classification
  operations:
    _validate_train:
      - main: python -m pytest tests/validate_classification_train.py
    train:
      - main: ...

- model: _check
  operations:
    _test_classification:
      steps:
        - run:
          - classification:train
          - classification:_validate_train
          isolate-runs: no

Here tests/validate_classification_train.py would use the guild.ipy module to inspect the classification:train run that is currently being tested and then manually assert on its Accuracy.

garrett · May 17, 2021, 3:52pm

That’s a sound approach I think. The tests in the validate module could then be quite a bit more robust than what you’d implement in a Guild file.

I do agree that Guild should support simple assertions like what you showed in your example. We’ll get that in.

A few observations about your snippet (you probably know these already, but I thought I’d point them out):

The main attr for _validate_train should be pytest tests/validate_classification_train.py. You don’t want the python -m in there; if you do want to call Python explicitly, use the exec attr.
Source code will land in .guild/sourcecode, so either pass .guild/sourcecode/tests/validate_classification_train.py as the arg to the pytest module or make sure your test scripts are defined as a file dependency.
The isolate-runs attr (last line) needs to be indented under the bullet.
