How to avoid copying source code for each run

Dear Garrett and other developers,

I would like to know if there is a way to avoid copying the source code for each guild run I execute.

I need this because I am running tens of thousands of guild runs, each executing the same version of a Python module (located in the root directory) but with different command line arguments. As a result, every guild run has its own copy of all the Python files, and eventually there are too many files under my Guild Home directory (I am on a shared system with a limit on the number of files per user).

I understand the benefit of having a copy of the source code for each run, preventing the code from being changed after the runs are staged, and keeping a definite copy. But for my situation, I would prefer to have independent Guild Homes for different versions of code, and only keep one copy of code for all the guild runs in a Guild Home.

I am wondering if there is a way to do so. One way I can imagine is to package all my Python code and install it into the virtual environment rather than keeping it in my root directory, so that guild has nothing to copy. I would like to know if this is a good idea.

Thanks.

Great questions!

I think your idea of using an installed package to meet your Python requirements is a good one. Consider installing with the -e/--editable option to pip install. This installs a package into the Python env that refers to your project source code. Changes you make to your local project are reflected whenever your package is loaded from Python.

Something like this, from your project dir:

pip install -e .
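
One way to check that the editable install took effect is to ask Python where it would load the package from. This is only a minimal sketch and not part of Guild; the package name `mypackage` is a placeholder, so substitute your own import name.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # "mypackage" is a placeholder; use your package's import name.
    print(module_origin("mypackage"))
```

With an editable install, the printed path should point into your project directory rather than into a copy under site-packages.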

To disable source code copies for an operation, use the sourcecode operation attribute like this:

train:
  sourcecode: no

You can confirm that Guild doesn’t copy any source code using the --test-sourcecode option with the run command:

guild run train --test-sourcecode

You can also check a run using ls --sourcecode:

guild ls --sourcecode  # should be an empty list

All of that said, actual source code should not take up that much space (especially compared to data sets, generated models, images, etc.). You might double-check that you’re copying only what you need for an operation; again, use --test-sourcecode to see what’s being copied. If there are big files in that list that you don’t need, you can exclude them this way:

train:
  sourcecode:
    - exclude: <pattern>
    - exclude: <pattern>
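
Since your limit is on the number of files, it may also help to see which directories hold the most files. Here is a rough sketch that uses only the standard library and does not call Guild at all; the directory you pass in (for example, wherever your runs are stored) is up to you, and the function names are just illustrative.

```python
import os
import sys
from pathlib import Path


def count_files(root):
    """Count non-directory entries under root, recursively."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def files_per_subdir(root):
    """Map each immediate subdirectory of root to its recursive file count."""
    return {
        entry.name: count_files(entry)
        for entry in sorted(Path(root).iterdir())
        if entry.is_dir()
    }


if __name__ == "__main__":
    # Pass the directory to inspect; defaults to the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    counts = files_per_subdir(root)
    # Show the ten subdirectories with the most files.
    for name, n in sorted(counts.items(), key=lambda kv: -kv[1])[:10]:
        print(f"{n:8d}  {name}")
```

Running it before and after changing the sourcecode setting should show whether the per-run file count actually drops.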

Hi Garrett, I have tried your suggestions and they work great! Thank you!