I would like to create a prepare-data file that is shared by several models.
In the documentation on pipelines (Pipelines) you show how to construct a pipeline where the prepare-data step belongs to the same model. But I would like to use the same prepare steps for several models (for example, decision tree and random forest).
I would like to know what the recommended way to do this in Guild AI is.
The first thing that comes to mind is to have a prepare file and source it from the train script, but I am not sure whether I can still set flags for that prepare file in the normal way when running the train operation from Guild.
Do you want to run prepare-data once and use that one prepared data set for all models? Or do you want to run separate prepare operations, each one creating a separate data set for use by a different model type?
The example shows a single data set that’s used by both train and test for a single model type. But these operations could just as easily be train-decision-tree and train-random-forest. The interface would be the same.
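As a rough sketch of what that could look like in guild.yml, assuming hypothetical scripts prepare.py, train_decision_tree.py and train_random_forest.py (those module names are placeholders, not something from your project):

```yaml
# Operation-only Guild file: one prepare step, two training operations.
prepare-data:
  main: prepare                  # hypothetical module: prepare.py

train-decision-tree:
  main: train_decision_tree      # hypothetical module
  requires:
    - operation: prepare-data    # use files from a prepare-data run

train-random-forest:
  main: train_random_forest      # hypothetical module
  requires:
    - operation: prepare-data    # same dependency, same prepared data
```

With a layout like this you would run prepare-data once, and each train operation would resolve its dependency from a prepare-data run.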
I don’t quite follow your last paragraph. From that it sounds like each model has its own prepare logic.
This is the first time I have seen config.
Why do I need the config object in the first place? For example, what if I remove config (and extends in the models) and just define the operations and models objects?
In the yml above you define the prepare-data operation twice, once in config and then again in operations. I don’t see the reason for this.
In this case config is used to define shared resources. The prepare-data operation is defined only once in each of the two examples.
It’s subtle, but prepared-data (notice the difference in naming convention) is the name of a resource. This is spelled this way so that requires: prepared-data reads better. You can name it whatever you want.
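To make the naming distinction concrete, here is a minimal single-model sketch. It leaves out config and extends for brevity, and the model name and module names (prepare, train) are placeholders I am assuming for illustration:

```yaml
- model: decision-tree
  resources:
    prepared-data:                   # resource name: what train asks for
      sources:
        - operation: prepare-data    # operation whose run provides the files
  operations:
    prepare-data:                    # operation name: what you actually run
      main: prepare                  # hypothetical module: prepare.py
    train:
      main: train                    # hypothetical module: train.py
      requires: prepared-data        # refers to the resource, not the operation
```

Here prepare-data appears as an operation definition only once. The other mention is just a reference to it from the prepared-data resource.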
I get:
ERROR: error in C:\Users\Mislav\Documents\GitHub\trademl\guild.yml: invalid value for operation ‘None’ ‘prepare’: expected a string or a mapping