Get run ID before/during the corresponding run

Hey, my issue is the following:
During my run I train several models and save their checkpoints. At the end I need to load them all for the final testing procedure. However, since the models are saved in .guild/runs/ID/my_saved_model.ckpt and I don’t know the current ID, I can’t see a way to re-load the models in my training script. Is there a way to pass the current/future ID into my Python script?

When Guild runs your operation, it runs the process with the run directory as the current working directory. Any relative paths will resolve to the run directory, always. So you can open the file in question using open('my_saved_model.ckpt').
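As a minimal sketch of that idea, the snippet below writes several checkpoints using relative paths and then finds them again for the final test step. The helper names (`save_checkpoint`, `load_checkpoints`) and the file names are hypothetical, and plain bytes stand in for whatever your framework actually saves.

```python
from pathlib import Path


def save_checkpoint(name, payload):
    """Write a checkpoint to a relative path (hypothetical helper)."""
    path = Path(name)  # relative, so it resolves against the current directory
    path.write_bytes(payload)
    return path


def load_checkpoints(pattern="*.ckpt"):
    """Read every checkpoint matching the pattern in the current directory."""
    return {p.name: p.read_bytes() for p in sorted(Path(".").glob(pattern))}


if __name__ == "__main__":
    # Stand-in for training several models during one run.
    for i in range(3):
        save_checkpoint(f"model_{i}.ckpt", f"weights-{i}".encode())
    # Final testing step: no run ID needed, only relative paths.
    print(list(load_checkpoints()))
```

Because nothing here refers to the run ID or to .guild/runs, the same script should behave the same way whether it is started by Guild or directly from a shell.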

If you are talking about a separate operation for loading and testing (i.e. you have a train operation and a separate test operation), you’ll need to set up dependencies to ensure that your checkpoint files are available to your test operation.

In both cases — i.e. when you train and test in the same operation and when your train and test operations are separate — you use a relative path to get to the files.

As an addendum, you can get the run ID and the run dir using environment variables:

  • RUN_DIR
  • RUN_ID

But these are rarely needed. If you want to access run directory files, use relative paths. This also keeps your code independent from Guild. E.g. if you’re relying on Guild-specific env vars, you need to set them up yourself for each run whenever you want to run your code independently of Guild.

Any change to your code that prevents it from running correctly without Guild should be considered an anti-pattern.
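If you do end up reading these variables, one way to avoid that anti-pattern is to fall back to sensible values when they are unset. This is only a sketch: `resolve_run_info` is a hypothetical helper, and `RUN_DIR` is assumed here to be the name of the variable holding the run directory.

```python
import os


def resolve_run_info(environ=None):
    """Return (run_id, run_dir), falling back when the variables are unset."""
    env = os.environ if environ is None else environ
    run_id = env.get("RUN_ID")  # None when the variable is not set
    run_dir = env.get("RUN_DIR", os.getcwd())  # assumed name; falls back to the cwd
    return run_id, run_dir


if __name__ == "__main__":
    run_id, run_dir = resolve_run_info()
    print(f"run id: {run_id or '(not set)'}; run dir: {run_dir}")
```

With the fallback in place, the script still runs outside Guild and simply reports that no run ID is available.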

A pattern that I recommend is to use command line options to point to files-of-interest where the default is the cwd:

import argparse

p = argparse.ArgumentParser()
# Both locations default to the current working directory.
p.add_argument('--data-files', default='.')
p.add_argument('--checkpoint-files', default='.')
args = p.parse_args()

You can certainly use other defaults, but you’ll need to keep those paths in mind when you define dependencies (i.e. make sure the required files land in the right subdir — it’s easy to do this with target-path attr in the dependency).
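To illustrate a non-default location, here is a rough sketch in which checkpoints are expected in a subdirectory. The `checkpoints` directory name and the `build_parser` and `find_checkpoints` helpers are assumptions made for this example; under Guild, the dependency's target-path would need to point at the same subdirectory.

```python
import argparse
from pathlib import Path


def build_parser():
    p = argparse.ArgumentParser()
    p.add_argument("--data-files", default=".")
    # "checkpoints" is a hypothetical non-default subdirectory.
    p.add_argument("--checkpoint-files", default="checkpoints")
    return p


def find_checkpoints(checkpoint_dir):
    """List *.ckpt files directly under the given directory, sorted by name."""
    return sorted(Path(checkpoint_dir).glob("*.ckpt"))


if __name__ == "__main__":
    args = build_parser().parse_args()
    for path in find_checkpoints(args.checkpoint_files):
        print(path)
```

Outside Guild you can still point the script anywhere, for example by passing `--checkpoint-files` with the path of a directory that holds your saved models.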

You can also specify the data location as a part of the main spec:

  train:
    main: train --data-files . --checkpoint-files .

  test:
    main: test --checkpoint-files .
    requires:
      - operation: train

This is all maybe too much information (TMI), but it might be helpful background for understanding how Guild works here and what good design looks like.