I have a repository structured as follows -
    |--data/data-for-model-1/data.csv
    |--data/data-for-model-2/data.csv
    |--src/model-1-train.py
    |--src/model-2-train.py
My guild file is as follows -
    - model: model-1
      operations:
        train:
          main: 'src/model-1-train'
          requires:
            - detector-data
      resources:
        detector-data:
          - file: data/data-for-model-1
            target-type:
              - link
    - model: model-2
      operations:
        train:
          main: 'src/model-2-train'
          requires:
            - detector-data
      resources:
        detector-data:
          - file: data/data-for-model-2
            target-type:
              - link
In model-1-train.py, I am accessing its data by reading its path relative to the repository root. But guild doesn't like this, because specifying data/data-for-model-1 in the guild file creates a symlink named data-for-model-1 directly in the new cwd that guild creates. So from within the Python file, you would need to do read('data-for-model-1/data.csv') when running with guild.
So it seems that I need to modify the code depending on whether or not I am using guild to run the experiment.
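To make the problem concrete, here is a minimal sketch of the kind of workaround I would otherwise have to add to the training script. The helper name `find_data_file` and the list `MODEL_1_CANDIDATES` are hypothetical, made up for illustration; the two candidate paths are the ones described above.

```python
from pathlib import Path


def find_data_file(candidates, base="."):
    """Return the first candidate path that exists under base.

    Hypothetical helper: it only illustrates the branching I would
    like to avoid having in the training script.
    """
    base = Path(base)
    for candidate in candidates:
        path = base / candidate
        if path.exists():
            return path
    raise FileNotFoundError(f"none of {list(candidates)} exist under {base}")


# Candidate locations for model 1's data, tried in order.
MODEL_1_CANDIDATES = [
    "data/data-for-model-1/data.csv",  # running from the repository root
    "data-for-model-1/data.csv",       # running inside a guild run directory
]
```

The script would then call `find_data_file(MODEL_1_CANDIDATES)` instead of hard-coding one path, which works but is exactly the guild-specific code I am hoping not to need.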
I can get around this by just not having the data directory, putting data-for-model-1 (and data-for-model-2) at the same level as src, and always doing read('data-for-model-1/data.csv') from within the Python file.
Or I can simply do
    resources:
      detector-data:
        - file: data
for both the models. That is fine (and probably what I will do) so long as only a symlink is created and the data is not being copied over.
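One way I could check the "symlink, not a copy" condition myself is a small sketch like the one below, run against an entry in the run directory. The function name `link_status` is hypothetical; it relies only on standard `pathlib` calls and does not use any guild API.

```python
from pathlib import Path


def link_status(path):
    """Classify a path as 'link', 'copy', or 'missing'.

    'link' means the entry itself is a symlink; 'copy' means it exists
    as a real file or directory; 'missing' means nothing is there.
    """
    p = Path(path)
    if p.is_symlink():  # checked first, since exists() follows symlinks
        return "link"
    if p.exists():
        return "copy"
    return "missing"
```

For example, calling `link_status("data")` from inside a run directory would show whether the resolved `data` entry is a link or a real directory.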
Is this the only way to ensure that the code can remain unchanged irrespective of whether I am running with guild or not?
PS: Amazing project, love it!