I have been using Guild AI for some time now. I have developed a few ML models, all of which accept the same preprocessed data: X_train, y_train, X_test, y_test. The random forest, LightGBM and XGBoost operations then take this same data and fine-tune the model.
I suppose most people (90%) use the same pipeline, so I was thinking it might be good to set up some kind of template for these models. We all use the same flags for a random forest model; only the search space can differ. So it would be useful to incorporate something like blueprints or templates for the most popular models. For example, here is my random forest model:
```yaml
- model: random-forest
  extends: meta-model
  description: Random forest model
  operations:
    train:
      description: Trainer for random forest
      main: trademl.modeling.train_rf  # Python module to run for the operation
      requires: op-prepare
      sourcecode:
        - include: '*.py'
      needed: no
      flags-import: all
      flags:
        num_threads:
          arg_name: num_threads
          description: Number of threads to use in mlfinlab multithread function
          min: 1
          max: 32
        sample_weights_type:
          description: Sample weights to use in training
          arg_name: sample_weights_type
          type: string
          default: 'returns'
          choices: [returns, time_decay, none]
        cv_type:
          description: Type of CV
          arg_name: cv_type
          type: string
          default: 'purged_kfold'
          choices: ['purged_kfold']
        cv_number:
          description: Number of CV folds to use in CV
          arg_name: cv_number
          min: 1
          max: 20
        max_depth:
          description: Maximum depth for the trees in the random forest algorithm
          arg_name: max_depth
          min: 1
          max: 10
        max_features:
          description: Maximum number of features in random forest
          arg_name: max_features
          min: 1
          max: 250
        n_estimators:
          description: Number of estimators (decision trees) in random forest
          arg_name: n_estimators
          min: 1
          max: 10000
        min_weight_fraction_leaf:
          description: TODO
          arg_name: min_weight_fraction_leaf
          min: 0
          max: 1
        class_weight:
          description: sklearn class_weight argument
          arg_name: class_weight
          type: string
          default: 'balanced_subsample'
          choices: ['balanced', 'balanced_subsample']
```
Having the flags already defined can save a lot of time. Even better, the templates could ship with reasonable default search spaces for each method. Eventually there could be a pipeline that combines all the models and does a kind of AutoML. Since the steps are almost the same, it could make sense to have one template Guild file with the flags predefined.
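As a rough sketch of the idea, such a template could build on Guild's existing `config` objects and `extends` mechanism. The name `rf-template` and the specific search-space bounds below are hypothetical illustrations, not part of any current Guild release:

```yaml
# Hypothetical reusable template: a config object holding the common
# random forest flags with reasonable default search spaces.
- config: rf-template
  operations:
    train:
      flags:
        n_estimators:
          description: Number of trees in the forest
          min: 100
          max: 5000
        max_depth:
          description: Maximum tree depth
          min: 1
          max: 10
        max_features:
          description: Maximum number of features per split
          min: 1
          max: 250

# A project-specific model inherits the template and only supplies
# what actually differs: the training entry point and any overrides.
- model: random-forest
  extends: rf-template
  operations:
    train:
      main: trademl.modeling.train_rf  # project-specific module
      flags:
        max_depth:
          max: 20  # override the template's default search space
```

With something like this shipped for the popular models, a user would only write the last few lines instead of the full flag definitions.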
I know this is not the main goal of Guild AI, but I think it would be great to have one big Guild file with many models and sensible default values. That would save developers a lot of time.