I have a repository structured as follows:
|--data/data-for-model-1/data.csv
|--data/data-for-model-2/data.csv
|--src/model-1-train.py
|--src/model-2-train.py
My guild file is as follows:
- model: model-1
  operations:
    train:
      main: 'src/model-1-train'
      requires:
        - detector-data
  resources:
    detector-data:
      - file: data/data-for-model-1
        target-type:
          - link
- model: model-2
  operations:
    train:
      main: 'src/model-2-train'
      requires:
        - detector-data
  resources:
    detector-data:
      - file: data/data-for-model-2
        target-type:
          - link
Within model-1-train.py, I am accessing its data by reading the path read('data/data-for-model-1/data.csv').
But guild doesn't like this, because specifying data/data-for-model-1 in the guild file creates a symlink named data-for-model-1 directly in the new cwd that guild creates. So from within the python file, you would need to do read('data-for-model-1/data.csv') when running with guild.
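To make the mismatch concrete, here is a minimal sketch that imitates the behaviour I am describing with plain os.symlink. It does not call guild at all: the temporary "project" and "run" directories and the simulate_link helper are stand-ins I made up for illustration, and the assumption that the link is named after the last path component is just what I observed.

```python
import os
import tempfile


def simulate_link(project_dir, run_dir, source):
    """Hypothetical stand-in: link `source` into run_dir under its base name."""
    target = os.path.join(run_dir, os.path.basename(source))
    os.symlink(os.path.join(project_dir, source), target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, "project")  # plays the role of the repository
    run = os.path.join(tmp, "run")          # plays the role of the run's cwd
    os.makedirs(os.path.join(project, "data", "data-for-model-1"))
    os.makedirs(run)
    with open(os.path.join(project, "data", "data-for-model-1", "data.csv"), "w") as f:
        f.write("x,y\n1,2\n")

    simulate_link(project, run, "data/data-for-model-1")

    # The path used when running from the repository root is missing in the run dir...
    project_style = os.path.exists(
        os.path.join(run, "data", "data-for-model-1", "data.csv")
    )
    # ...while the shortened path resolves through the symlink.
    run_style = os.path.exists(os.path.join(run, "data-for-model-1", "data.csv"))
    print(project_style, run_style)  # False True
```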
So it seems that I need to modify the code depending on whether or not I am using guild to run the experiment.
I can get around this by just not having the data directory, putting data-for-model-1 and data-for-model-2 at the same level as src, and always doing read('data-for-model-1/data.csv') from within the python file.
Or I can simply do
resources:
  detector-data:
    - file: data
for both the models. That is fine (and probably what I will do) so long as only a symlink is created and the data is not being copied over.
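As a sanity check of that idea, here is a small sketch, again using plain os.symlink rather than guild itself. The temporary "project" and "run" directories are made up for illustration, and the sketch assumes the whole data directory ends up as a single link named data in the run directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    project = os.path.join(tmp, "project")  # plays the role of the repository
    run = os.path.join(tmp, "run")          # plays the role of the run's cwd
    csv_rel = os.path.join("data", "data-for-model-1", "data.csv")
    os.makedirs(os.path.dirname(os.path.join(project, csv_rel)))
    os.makedirs(run)
    with open(os.path.join(project, csv_rel), "w") as f:
        f.write("x,y\n1,2\n")

    # Assumed equivalent of linking the whole `data` directory into the run dir.
    os.symlink(os.path.join(project, "data"), os.path.join(run, "data"))

    # The same relative path now works from the repository root and the run dir.
    same_path_works = os.path.exists(os.path.join(run, csv_rel))
    # The run dir holds a link, not a copy of the data.
    is_link = os.path.islink(os.path.join(run, "data"))
    same_file = os.path.samefile(
        os.path.join(project, csv_rel), os.path.join(run, csv_rel)
    )
    print(same_path_works, is_link, same_file)  # True True True
```

With this layout the python file can keep reading data/data-for-model-1/data.csv in both cases, at the cost of exposing model-2's data to model-1's run as well.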
Is this the only way to ensure that the code can remain unchanged irrespective of whether I am running with guild or not?
PS: Amazing project, love it!