Data Filepath Flag

miguel · October 15, 2020, 8:50pm

Hello, I’m converting a project to Guild and I have a question about setting a file path as a flag.

At the top of my training script I have a variable data_fp = "…/…/data/dtype2/processed_data.npy". Before this was a Guild project, I just cut and pasted different file paths whenever I wanted to train the model on new data, but now I want to be able to set the path as a flag.

When I run this with Guild, I get an error because the script can't find that path. Is there a way to set this up so that the script can see the data from the run directory?

Also, here is my current folder structure, including where my Guild file is:

proj/
  data/
    dtype1/
      raw_data.npy
    dtype2/
      processed_data.npy
  scripts/
    guild.yml
    data_processing/
      process_data.py
    model/
      train.py
      ...
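Relative paths are resolved against the process's current working directory, not against the location of the script. Because Guild runs the script from the run directory, a relative path that worked from the project tree can point somewhere else. The minimal sketch below shows this with plain string operations; both directories are hypothetical examples, and posixpath.normpath only approximates what the OS does (it collapses .. lexically and does not follow symlinks).

```python
import posixpath

def resolve(relative_path, cwd):
    """Lexically join a relative path onto a working directory (approximates open())."""
    return posixpath.normpath(posixpath.join(cwd, relative_path))

# Hypothetical locations, for illustration only
project_root = "/home/user/proj"
run_dir = "/home/user/.guild/runs/abc123"

data_fp = "data/dtype2/processed_data.npy"

print(resolve(data_fp, project_root))  # /home/user/proj/data/dtype2/processed_data.npy
print(resolve(data_fp, run_dir))       # /home/user/.guild/runs/abc123/data/dtype2/processed_data.npy
```

Inside a run, the second location only exists if something puts the data there.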

garrett · October 15, 2020, 10:26pm

The easiest way is to make the data directory available to the run using a dependency.

# guild.yml

test:
  requires:
    - file: data

Then define a flag that uses the path to the file relative to the run directory. For example:

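# test.py

# Path relative to the run directory; the "file: data" dependency makes data/ available there
data = "data/bar.txt"

with open(data) as f:
    print(f.read())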
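The dependency works because the relative path in test.py is read from the run directory, where the data directory is made available. The following self-contained simulation uses only the standard library and does not call Guild: it assumes, as described later in this thread, that Guild symlinks the directory into the run directory, then reads data/bar.txt from inside that directory. The temporary directories, the function name simulate_file_dependency, and the file content are all hypothetical.

```python
import os
import tempfile

def simulate_file_dependency(content="hello from bar.txt\n"):
    """Mimic a 'file: data' dependency by symlinking a project's data dir into a run dir."""
    with tempfile.TemporaryDirectory() as project, tempfile.TemporaryDirectory() as run_dir:
        # Hypothetical project layout: <project>/data/bar.txt
        os.makedirs(os.path.join(project, "data"))
        with open(os.path.join(project, "data", "bar.txt"), "w") as f:
            f.write(content)

        # Link <run_dir>/data -> <project>/data (no files are copied)
        os.symlink(os.path.join(project, "data"), os.path.join(run_dir, "data"))

        # The script runs with the run directory as its working directory
        old_cwd = os.getcwd()
        os.chdir(run_dir)
        try:
            data = "data/bar.txt"  # same relative path as in test.py
            with open(data) as f:
                return f.read()
        finally:
            os.chdir(old_cwd)  # restore cwd before the temp dirs are removed

print(simulate_file_dependency(), end="")
```

Removing the os.symlink call makes open() raise FileNotFoundError, which mirrors the error you get when the dependency is missing.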

A working example of this is here:

There are a couple of other approaches that occur to me, but let’s start with this one as it’s the most straightforward. If you run into issues or have questions, just ask here and we can work through them.

miguel · October 16, 2020, 12:26am

I changed the Guild file as shown below, but it isn’t working yet:

train:
  description: Train a flow model based on a data file and optionally save the parameters
  main: scripts/flow_model/optim_flow_model
  flags-dest: globals
  flags-import: all
  requires:
    - file: data/saved_npy

garrett · October 16, 2020, 12:34am

That will create saved_npy in the run directory. You can confirm this by running:

guild ls

That’s a good way to see what Guild creates in the run directory. Your script runs in that location, so if it can’t find something, it’s either because it’s not there or your script expects it in another location.
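Besides guild ls, the script can report its own view of the run directory. This is a minimal sketch with a hypothetical helper name, describe_path, built only on os; calling it at the top of a training script shows the working directory, the absolute path a relative flag value resolves to, and what the directory contains.

```python
import os

def describe_path(path):
    """Report the working directory and whether `path` resolves to something that exists."""
    cwd = os.getcwd()
    full = os.path.abspath(path)  # relative paths resolve against the cwd
    return {
        "cwd": cwd,
        "resolved": full,
        "exists": os.path.exists(full),
        "listing": sorted(os.listdir(cwd)),
    }

# Example: debug a missing data directory from inside the script
info = describe_path("data/saved_npy")
print(f"cwd: {info['cwd']}")
print(f"looking for: {info['resolved']} (exists: {info['exists']})")
print("working directory contains:", info["listing"])
```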

You have a few options:

1. Link to the whole data directory (omit saved_npy). This makes the entire data tree available in the run directory. Guild currently creates a symlink to the directory, so no files are copied.

train:
  requires:
    - file: data

2. Specify a target-path of data so that the saved_npy directory is available as data/saved_npy.

train:
  requires:
    - file: data/saved_npy
      target-path: data

3. Modify your script to look in saved_npy, which is where the dependency as currently written places it.
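The difference between these options comes down to where the dependency appears inside the run directory. The sketch below is a hypothetical helper, not part of Guild, that encodes the behavior described in this thread: a file dependency is linked under its base name, and target-path, when set, is prepended as a subdirectory.

```python
import posixpath

def run_dir_location(source, target_path=None):
    """Hypothetical model: where a 'file:' dependency appears, relative to the run directory."""
    name = posixpath.basename(posixpath.normpath(source))
    return posixpath.join(target_path, name) if target_path else name

# "file: data" -> run_dir/data (the whole tree)
print(run_dir_location("data"))                    # data
# "file: data/saved_npy" -> run_dir/saved_npy
print(run_dir_location("data/saved_npy"))          # saved_npy
# plus "target-path: data" -> run_dir/data/saved_npy
print(run_dir_location("data/saved_npy", "data"))  # data/saved_npy
```

With target-path set, a script that already expects files under data/saved_npy can keep its relative path unchanged.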

If you’re running into another issue, what is the error message you’re getting?

miguel · October 19, 2020, 12:54pm

I went with option two and it worked, thanks for your help!
