Get run ID before/during the corresponding run

korth · March 24, 2022, 4:54pm

Hey, my issue is the following:
During my run I train several models and save their checkpoints. At the end I need to load them all for the final testing procedure. However, since the models are saved in .guild/runs/ID/my_saved_model.ckpt and I don’t know the current ID, I have no way to reload the models in my training script. Is there a way to pass the current/future ID into my Python script?

garrett · March 25, 2022, 3:04pm

When Guild runs your operation, it runs the process with the run directory as the current working directory. Any relative paths will resolve to the run directory, always. So you can open the file in question using open('my_saved_model.ckpt').
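
As a rough illustration of the single-operation case, here is a minimal sketch that saves and reloads a checkpoint through a relative path. The helpers save_checkpoint and load_checkpoint are hypothetical names used only for this example, and pickle stands in for whatever checkpoint format your framework actually uses:

```python
import pickle
from pathlib import Path


def save_checkpoint(state, name):
    """Write state to a relative path (hypothetical helper, not a Guild API)."""
    # A relative path resolves against the current working directory,
    # which is the run directory when the script is started by Guild.
    path = Path(name)
    with path.open("wb") as f:
        pickle.dump(state, f)
    return path


def load_checkpoint(name):
    """Read a checkpoint back from the same relative path."""
    with Path(name).open("rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Placeholder state; a real script would store model weights here.
    save_checkpoint({"epoch": 3, "weights": [0.1, 0.2]}, "my_saved_model.ckpt")
    print(load_checkpoint("my_saved_model.ckpt"))
```

Nothing here refers to the run ID, so the same script also works when started from any other directory without Guild.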

If you are talking about a separate operation for loading and testing, i.e. you have a train operation and a separate test operation, you’ll need to set up dependencies to ensure that your checkpoint files are available to your test operation.

In both cases, whether you train and test in the same operation or in separate operations, you use a relative path to get to the files.

garrett · March 25, 2022, 3:23pm

As an addendum, you can get the run ID and the run dir using environment variables:

RUN_ID
RUN_DIR

But these are rarely needed. If you want to access run directory files, use relative paths. This also keeps your code independent of Guild. E.g. if you rely on Guild-specific env vars, you need to set them yourself for each run whenever you want to run your code outside of Guild.

Any change to your code that prevents it from running correctly without Guild should be considered an anti-pattern.
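
If you do decide to read these variables, one way to keep the script runnable without Guild is to fall back to sensible defaults. The sketch below assumes only the two variable names mentioned above; current_run_info is a hypothetical helper, and the fallback behaviour (None for the ID, the cwd for the directory) is a choice made for this example:

```python
import os


def current_run_info():
    """Return (run_id, run_dir), tolerating a missing Guild environment."""
    # RUN_ID is only expected to be set when Guild starts the process,
    # so it is None when the script is run directly.
    run_id = os.environ.get("RUN_ID")
    # Fall back to the current working directory, which is where
    # relative paths would resolve anyway.
    run_dir = os.environ.get("RUN_DIR", os.getcwd())
    return run_id, run_dir


if __name__ == "__main__":
    run_id, run_dir = current_run_info()
    print("run id:", run_id)
    print("run dir:", run_dir)
```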

A pattern that I recommend is to use command line options to point to files-of-interest where the default is the cwd:

import argparse

# Both options default to the current working directory.
p = argparse.ArgumentParser()
p.add_argument('--data-files', default='.')
p.add_argument('--checkpoint-files', default='.')
do_something(p.parse_args())

You can certainly use other defaults, but you’ll need to keep those paths in mind when you define dependencies: make sure the required files land in the right subdirectory, which is easy to do with the target-path attribute of the dependency.
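
To make the pattern a bit more concrete, here is a hedged sketch of how a test script might use the checkpoint directory option to locate all saved models. The find_checkpoints helper and the *.ckpt naming convention are assumptions for illustration, not something Guild requires:

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    p = argparse.ArgumentParser()
    p.add_argument("--data-files", default=".")
    p.add_argument("--checkpoint-files", default=".")
    return p.parse_args(argv)


def find_checkpoints(args):
    """List checkpoint files in the configured directory (hypothetical helper)."""
    # Assumes checkpoints end in .ckpt; adjust the pattern to your own naming.
    return sorted(Path(args.checkpoint_files).glob("*.ckpt"))


if __name__ == "__main__":
    for ckpt in find_checkpoints(parse_args()):
        print("would load", ckpt)
```

With the default of '.', this looks in the cwd. If a dependency places the files in a subdirectory instead, pass that subdirectory as --checkpoint-files.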

You can also specify the data location as a part of the main spec:

train:
  main: train --data-files . --checkpoint-files .

test:
  main: test --checkpoint-files .
  requires:
    - operation: train

This may be too much information (TMI), but it might be helpful background for understanding how Guild works here and what good design looks like.
