I have been using Guild AI for some time now. I have developed a few ML models that all accept the same preprocessed data: X_train, y_train, X_test, y_test. The random forest, LightGBM, and XGBoost operations then take this same data and fine-tune the model.
I suppose most people (90%) use the same pipeline, so I was thinking it might be good to set up some kind of template for these models. We all use the same flags for a random forest model; only the search space can differ. So I think it would be useful to incorporate something like blueprints or templates for the most popular models. For example, here is my random forest model:
```yaml
- model: random-forest
  extends: meta-model
  description: Random forest model
  operations:
    train:
      description: Trainer for random forest
      main: trademl.modeling.train_rf  # Python module to run for the operation
      requires: op-prepare
      sourcecode:
        - include: '*.py'
      needed: no
      flags-import: all
      flags:
        num_threads:
          arg_name: num_threads
          description: Number of threads to use in the mlfinlab multithread function
          min: 1
          max: 32
        sample_weights_type:
          description: Sample weights to use in training
          arg_name: sample_weights_type
          type: string
          default: 'returns'
          choices: [returns, time_decay, none]
        cv_type:
          description: Type of cross-validation
          arg_name: cv_type
          type: string
          default: 'purged_kfold'
          choices: ['purged_kfold']
        cv_number:
          description: Number of CV folds to use in CV
          arg_name: cv_number
          min: 1
          max: 20
        max_depth:
          description: Maximum depth for the tree in random forest algorithm
          arg_name: max_depth
          min: 1
          max: 10
        max_features:
          description: Maximum number of features in random forest
          arg_name: max_features
          min: 1
          max: 250
        n_estimators:
          description: Number of estimators (decision trees) in random forest
          arg_name: n_estimators
          min: 1
          max: 10000
        min_weight_fraction_leaf:
          description: TODO
          arg_name: min_weight_fraction_leaf
          min: 0
          max: 1
        class_weight:
          description: sklearn class_weight argument
          arg_name: class_weight
          type: string
          default: 'balanced_subsample'
          choices: ['balanced', 'balanced_subsample']
```
Having the flags already defined can save a lot of time. Even better, there could be reasonable default search spaces for each method. Eventually there could be a pipeline that combines all the models and does some kind of AutoML. Since the steps are almost the same, maybe it would be good to have one template Guild file with the flags defined.
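Guild already supports `extends` and reusable `config` objects, so a shipped template could be little more than a base config with the common flags and default search ranges baked in. A rough sketch of the idea (the `random-forest-base` name and the exact defaults are illustrative, not an existing Guild feature):

```yaml
# Hypothetical built-in template: defines the standard random forest flags
# with reasonable default search spaces.
- config: random-forest-base
  operations:
    train:
      flags-import: all
      flags:
        n_estimators:
          description: Number of decision trees
          min: 1
          max: 10000
        max_depth:
          description: Maximum tree depth
          min: 1
          max: 10
        class_weight:
          description: sklearn class_weight argument
          type: string
          default: 'balanced_subsample'
          choices: ['balanced', 'balanced_subsample']

# A user model would then only extend the template and set its entry point:
- model: my-random-forest
  extends: random-forest-base
  operations:
    train:
      main: trademl.modeling.train_rf
```

Everything except `main` (and any flag the user wants to override) would come from the template.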
I know this is not the main goal of Guild AI, but I think it would be great to have one big Guild file with many models and sensible default values. That would save developers a lot of time.
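For the AutoML-style pipeline, Guild's `steps` mechanism could already chain the per-model operations against the same prepared data. A minimal sketch, assuming each model defines a `train` operation and a shared `prepare` operation exists (all names below are illustrative):

```yaml
# Hypothetical umbrella model that runs every trainer in sequence,
# so results can be compared with `guild compare` afterwards.
- model: automl
  operations:
    all:
      description: Prepare data once, then train all candidate models
      steps:
        - run: prepare
        - run: random-forest:train
        - run: lightgbm:train
        - run: xgboost:train
```

With default search spaces from the templates above this would come close to a one-command AutoML run.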