Question about project structure

I am a little bit confused about my folder structure. I have a repo that has 2 submodules. Inside one submodule I created another folder for Guild AI experiments.

Should I initialize Guild here, inside my submodule? Why would I want to use another environment inside my module?

What is the recommended way of doing this in Guild?

In looking at your project, I recommend putting the Guild file at the project root, alongside setup.py. You need to change your main spec to specify the full module:

- model: random_forest
  operations:
    train:
      main: trademl.modeling.random_forest.train_rf

You’re free to put guild.yml anywhere, so where you have it is fine, but I think it’s better to have it in the project root, for a few reasons:

  • Guild copies source code from the project directory (the directory where guild.yml is located). You can override this by specifying an alternative root directory, but it’s less complicated to just move the Guild file to the project root.

  • It’s easy to add new operations for different/new packages as needed when the config is at the root level.

  • A Guild file is intended to signal that a project is “Guild enabled”. The idea is that you can just add this file to an otherwise unchanged project and get some nice functionality. By putting this file in the root, you make these features a bit more obvious, I think.

  • Guild commands look in the current directory for a Guild file, so I think your workflow is simpler when you just cd to the project root rather than changing to various directories to run commands. This is sort of like git, where the current directory determines (at least by default) what is being operated on.

OK, I added the Guild file to the root. Where should I save runs, also in the root? I have read in the docs that it is recommended to save runs in a separate folder inside the specific project. How should I run commands then?
Sorry for all these questions, but I would like to set everything up right.
I have never seen the author of the package provide such detailed answers, thanks!

Thank you for your excellent questions!

Guild saves runs in environments. By default there’s a user-level environment at ~/.guild - so unless you’re otherwise specifying a different location, runs are located in ~/.guild/runs.

You can change the directory where runs are stored a few ways:

  • Set the GUILD_HOME environment variable
  • Specify the -H option immediately after guild when running any command
  • Activate a virtual environment

By default, Guild saves runs inside a virtual environment directory whenever that virtual environment is activated. This provides isolation within the environment of both software libraries (i.e. installed Python packages) and also Guild runs. You can change this behavior using either --no-isolate or --guild-home options if you create the virtual environment using guild init. If you create the virtual env another way, just set GUILD_HOME to whatever you want in the activation script.

You can see where the current environment is located by running guild check. Look for the guild_home value in the output.

Now with all that background, to your question.

As a matter of practice, I recommend using a virtual environment for every project. I typically run guild init within the project root, which creates a local venv directory containing the virtual environment as well as the .guild subdirectory, which contains the project runs. From there I run source guild-env to activate the environment. Run guild check to confirm that guild_home is pointing to the expected location under venv.

You’re free however to use virtualenv or the venv module or even conda to create an environment. Just activate it and Guild will start using that location to store runs. If you prefer to keep your runs in a different location, again just make sure the GUILD_HOME environment variable is set to that path.

Finally, there’s no way to tell Guild to save runs to a specific directory. You can only specify the Guild home path. Runs are always stored in runs within that path — there’s no way to change that behavior.
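
To make the rule above concrete, here is a minimal Python sketch of how the runs location follows from Guild home. This is not Guild's own code: the function name runs_dir is hypothetical, and it only models the GUILD_HOME variable and the ~/.guild default described above. It ignores the -H option and activated virtual environments.

```python
import os
from pathlib import Path


def runs_dir(environ=None):
    """Return the directory where runs would be stored under the rules above.

    Hypothetical helper, not part of Guild: it only models GUILD_HOME and the
    ~/.guild default, and ignores -H and activated virtual environments.
    """
    environ = os.environ if environ is None else environ
    # Fall back to the user-level default when GUILD_HOME is unset or empty.
    guild_home = environ.get("GUILD_HOME") or str(Path.home() / ".guild")
    # Runs always live in the "runs" subdirectory of Guild home.
    return Path(guild_home).expanduser() / "runs"
```

Calling runs_dir() with no arguments reads the current process environment. For the authoritative answer on a real setup, guild check and its guild_home value remain the thing to trust.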

What if I initialize the model now, when the project is already defined and on GitHub? Will there be some side effects?

By “initialize the model” I’m assuming that you’re asking about initializing the environment using guild init. Please let me know if that’s incorrect.

When you run guild init the only side effect is the creation of a local subdirectory, which is named venv by default. You can change that location by specifying it as an argument — e.g. guild init myenv.

I would not include a directory like this in the git repo, so you ought to add venv or /venv to your project .gitignore. With the addition of that line to .gitignore there’s no other change to your project source code.
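
If you would rather script that .gitignore change than edit the file by hand, a minimal Python sketch could look like this. The helper name ignore_venv is hypothetical and the /venv entry assumes the default directory name used by guild init; adjust it if you passed a different name.

```python
from pathlib import Path


def ignore_venv(project_root, entry="/venv"):
    """Add `entry` to <project_root>/.gitignore unless it is already listed.

    Hypothetical helper for illustration. Returns True if the file changed.
    """
    gitignore = Path(project_root) / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False  # Already ignored, nothing to do.
    # Start on a fresh line if the existing file lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    gitignore.write_text(text + prefix + entry + "\n")
    return True
```

For example, ignore_venv(".") run from the project root adds the line once and is a no-op on later calls.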

If you want to include runs in your repository, I would not try to include .guild/runs. There are a couple other ways to do this that are far better.

  • Use guild export to export only the runs you are interested in publishing. You can export those to a different directory in your project that is included in the git repo — e.g. runs, etc.

  • Use guild publish to generate Markdown formatted files that you can view in GitHub.

The publish command generates formatted output for runs whereas export literally copies (or moves, there’s an option to control this) a run directory from the Guild environment to the export location.

If I don’t save all runs in the GitHub repo, then I should save them in the default .guild folder and export only the runs I want to publish, if I get it right? But why initialize Guild in the repo root in the first place, if I will save runs in the default folder?

You don’t need to use an environment. It’s fine to rely on the system level Python. In that case runs are stored in ~/.guild/runs. You can export them as needed into your project directory.

I personally like project-level environments because they isolate the project work. This isolation is both for software libraries and for runs. Any run-related commands are limited to runs in Guild home. This way you only see runs related to the project.

As far as putting the virtual environment inside the project, that’s totally subjective. You can create an environment anywhere you like. I do that because it’s easy for me to reason about as a project-specific env.

In the end I created an environment with guild init, but when I try to activate it with:
source guild-env /venv
I get error:

source : The term ‘source’ is not recognized as the name of a cmdlet, function, script file, or operable program. Check
the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ source guild-env .\venv\
    + CategoryInfo          : ObjectNotFound: (source:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

If you’re running Windows, you need to run the activation script directly. It should be in venv/bin – e.g. venv/bin/activate.cmd or venv/bin/activate.bat.

Hm, I don’t see a bin directory in venv, only guild, Lib, Scripts, and pyvenv:

I used guild init to create this virtual environment.

It’s in Scripts

I can’t see it in Scripts either:

It’s activate.bat. This is a standard virtual environment.
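
As a rough illustration of where that script lives, here is a small Python sketch. The helper name activate_script is hypothetical. It relies only on the standard virtual environment layout: a Scripts directory on Windows and a bin directory elsewhere.

```python
import os
from pathlib import Path


def activate_script(venv_dir, windows=None):
    """Return the expected path of the activation script in a standard venv.

    Hypothetical helper: standard virtual environments put their scripts in
    "Scripts" on Windows and in "bin" on other platforms.
    """
    if windows is None:
        windows = os.name == "nt"  # Detect the current platform by default.
    venv = Path(venv_dir)
    if windows:
        return venv / "Scripts" / "activate.bat"
    return venv / "bin" / "activate"
```

The windows argument is only there so the result can be checked on any platform; normally you would call activate_script("venv") and let it detect the operating system.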

Nothing happens when I execute it:

I think I will remove venv in the end and just use the default Guild location.

It looks like you’re activating the environment from within an activated base conda environment. Make sure you run the activate script from outside the conda environment.

Or if you want, create a new conda env for your project and use that. In that case you don’t use guild init. Just follow the conda instructions.
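
If it is unclear which environments are active in a given shell, one way to check is to look at the variables that activation normally sets. The following is a minimal sketch, assuming the usual CONDA_PREFIX and CONDA_DEFAULT_ENV variables for conda and VIRTUAL_ENV for virtualenv/venv. The function name active_environments is hypothetical.

```python
import os


def active_environments(environ=None):
    """Report which environment managers look active, based on env variables.

    Hypothetical helper: assumes conda sets CONDA_PREFIX / CONDA_DEFAULT_ENV
    and that virtualenv/venv activation sets VIRTUAL_ENV.
    """
    environ = os.environ if environ is None else environ
    active = {}
    if environ.get("CONDA_PREFIX"):
        # Prefer the environment name (e.g. "base"); fall back to its path.
        active["conda"] = environ.get("CONDA_DEFAULT_ENV") or environ["CONDA_PREFIX"]
    if environ.get("VIRTUAL_ENV"):
        active["virtualenv"] = environ["VIRTUAL_ENV"]
    return active
```

If the result contains both a conda entry and a virtualenv entry, the virtual environment was activated on top of conda, which is the situation described above.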
