Scalar not saved if pipeline is used

mislav · September 10, 2020, 11:11am

Here is my guild file:

MislavSag/trademl/blob/master/guild.yml

- config: model-base
  resources:
    prepared-data:
      - operation: prepare-data

- operations:
    prepare-data:
      main: trademl.modeling.prepare
      flags-import: all
      flags:
        input_path:
          description: Path to read data from. 
          arg_name: input_path
          type: string
          default: D:/market_data/usa/ohlcv_features
        output_path:
          description: Main path where to save output
          arg_name: output_path
          type: string
          default: D:/algo_trading_files

This file has been truncated.

If I run the prepare-data or random-forest operation by itself, the scalars are saved.

But if I run the pipeline pipeline-rf-opt, which includes prepare-data and random-forest as steps, the scalars are not saved. I call it like this:

guild run pipeline-rf-opt \
  data-include_ta=1 \
  data-label_tuning=0 \
  data-label=[day_5] \
  data-pca=0 \
  data-tb_volatility_lookback=[50] \
  data-tb_volatility_scaler=1.0 \
  data-correlation_threshold=0.95 \
  data-scaling='none' \
  random-input_data_path='D:/algo_trading_files' \
  random-forest-depth=4 \
  random-forest-maxf=10 \
  random-n_estimators=350 \
  random-min_weight_fraction_leaf=0.1

It is just one run.

What could be the reason it doesn’t save scalars? It does save the flags.

garrett · September 10, 2020, 3:33pm

You can confirm that the step runs for pipeline-rf-opt are linked as expected:

guild ls -o pipeline-rf-opt

You should see directories for prepare-data and random-forest:train. These are links to the step runs. Guild uses these links to traverse to the TF summary (event) files where the scalars are saved. The scalars should be rolled up so they appear when you run:

guild runs info -o pipeline-rf-opt

I’ve confirmed this is working as expected on a sample pipeline. If you’re seeing something different we can troubleshoot further.
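The traversal described above can be pictured as walking the pipeline run directory and following the step links down to the TF event files. A minimal Python sketch, assuming the step links are symbolic links and the event files use the usual TensorBoard events.out.tfevents.* naming; find_event_files is a hypothetical helper, not a Guild command:

```python
import os


def find_event_files(run_dir: str) -> list:
    """Return sorted paths of TF event files under run_dir.

    Assumption: step runs are linked into the pipeline run directory as
    symlinks, so the walk must follow links to reach them.
    """
    found = []
    # followlinks=True makes os.walk descend into linked step run directories
    for root, _dirs, files in os.walk(run_dir, followlinks=True):
        for name in files:
            if name.startswith("events.out.tfevents"):
                found.append(os.path.join(root, name))
    return sorted(found)
```

Called with the pipeline run directory shown by guild ls -o, it should list event files located under the step links (prepare-data, random-forest_train) rather than directly in the pipeline run directory.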

mislav · September 12, 2020, 8:02am

@garrett, here is the output of the guild command you posted above:

(base) PS C:\Users\Mislav\Documents\GitHub\trademl> guild ls -o pipeline-rf-opt
C:\ProgramData\Anaconda3\.guild\runs\a54248ea3eb449a7a4d34742cb554231:
  prepare-data
  random-forest_train
(base) PS C:\Users\Mislav\Documents\GitHub\trademl>

And for the second command:

(base) PS C:\Users\Mislav\Documents\GitHub\trademl> guild runs info -o pipeline-rf-opt
id: a54248ea3eb449a7a4d34742cb554231
operation: pipeline-rf-opt
from: C:\Users\Mislav\Documents\GitHub\trademl\guild.yml
status: completed
started: 2020-09-10 10:47:54
stopped: 2020-09-10 10:49:21
marked: no
label: data-correlation_threshold=0.95 data-include_ta=1 data-label=day_5 data-label_tuning=0 data-lookforward=240 data-pca=0 data-scaling=none data-tb_volatility_lookback=50 data-tb_volatility_scaler=1.0 random-class_weight=balanced_subsample random-forest-depth=4 random-forest-maxf=10 random-input_data_path=D:/algo_trading_files random-min_weight_fraction_leaf=0.1 random-n_estimators=350
sourcecode_digest: ab24b5d70397046e7839099d287466bf
vcs_commit: git:68b7932fa199927ab461df76757fe9c2f410bfef*
run_dir: C:\ProgramData\Anaconda3\.guild\runs\a54248ea3eb449a7a4d34742cb554231
command: c:\programdata\anaconda3\python.exe -um guild.steps_main
exit_status: 0
pid:
steps:

  isolate-runs: no
  needed: yes
  run: prepare-data include_ta=${data-include_ta} label_tuning=${data-label_tuning} label=${data-label} tb_volatility_lookback=${data-tb_volatility_lookback} tb_volatility_scaler=${data-tb_volatility_scaler} correlation_threshold=${data-correlation_threshold} pca=${data-pca} scaling=${data-scaling}


  isolate-runs: yes
  needed: yes
  run: random-forest:train input_data_path=${random-input_data_path} max_depth=${random-forest-depth} max_features=${random-forest-maxf} n_estimators=${random-forest-maxf}  n_estimators=${random-n_estimators} min_weight_fraction_leaf=${random-min_weight_fraction_leaf}

flags:
  data-correlation_threshold: 0.95
  data-include_ta: 1
  data-label: day_5
  data-label_tuning: 0
  data-lookforward: 240
  data-pca: 0
  data-scaling: none
  data-tb_volatility_lookback: 50
  data-tb_volatility_scaler: 1.0
  random-class_weight: balanced_subsample
  random-forest-depth: 4
  random-forest-maxf: 10
  random-input_data_path: D:/algo_trading_files
  random-min_weight_fraction_leaf: 0.1
  random-n_estimators: 350
scalars:
(base) PS C:\Users\Mislav\Documents\GitHub\trademl>
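In the steps: section above, each step's run string refers to pipeline flags with ${...}, which is how the data- and random- prefixed flags on the command line reach prepare-data and random-forest:train. A simplified Python model of that substitution, assuming plain textual replacement (resolve_step is a hypothetical helper, not Guild's actual code):

```python
import re

# Matches ${name} references; flag names may contain hyphens (e.g. data-pca)
REF = re.compile(r"\$\{([^}]+)\}")


def resolve_step(run_spec: str, pipeline_flags: dict) -> str:
    """Replace each ${flag} reference with the pipeline's flag value.

    Illustration only; raises KeyError if a referenced flag is not
    defined on the pipeline.
    """
    return REF.sub(lambda m: str(pipeline_flags[m.group(1)]), run_spec)


flags = {
    "data-include_ta": 1,
    "data-pca": 0,
    "random-forest-depth": 4,
    "random-n_estimators": 350,
}
spec = ("random-forest:train max_depth=${random-forest-depth} "
        "n_estimators=${random-n_estimators}")
print(resolve_step(spec, flags))
# random-forest:train max_depth=4 n_estimators=350
```

Read this way, the random-forest:train run string above sets n_estimators twice, once from ${random-forest-maxf} and once from ${random-n_estimators}, which may be unintended.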

So, there are no scalars, and I am not sure what the reason could be. Here are the two scripts I use in the steps:

MislavSag/trademl/blob/master/trademl/modeling/prepare.py

from pathlib import Path
import os
import numpy as np
import pandas as pd
from numba import njit
import matplotlib.pyplot as plt
import matplotlib
import sklearn
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
import mlfinlab as ml
from mlfinlab.feature_importance import get_orthogonal_features
import trademl as tml
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import mfiles
matplotlib.use("Agg")  # don't show graphs because that would stop the guildai script

This file has been truncated.

MislavSag/trademl/blob/master/trademl/modeling/train_lstm.py

from pathlib import Path
from datetime import datetime
import numpy as np
import pandas as pd
from numba import njit
import matplotlib.pyplot as plt
import matplotlib
import json
import sys
import os
import sklearn
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import kerastuner as kt
import mlfinlab as ml
import trademl as tml
from tensorboardX import SummaryWriter
matplotlib.use("Agg")

This file has been truncated.

mislav · September 15, 2020, 9:40am

@garrett, I have just discovered how the pipeline works. It saves scalars in separate folders (in my case prepare-data and random-forest_train). I thought it would save everything in the pipeline folder. I am not sure how I can tell which parameters I used in the prepare step when I inspect the results of the random forest operation.
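One way to answer the parameter question is to resolve each step link in the pipeline run directory to the step run it points at. The run ID is the base name of the run directory (as in the runs\a54248ea3eb449a7a4d34742cb554231 path above) and can be passed to guild runs info. A sketch assuming the links are symlinks that os.path.realpath can resolve; step_run_ids is a hypothetical helper:

```python
import os


def step_run_ids(pipeline_run_dir: str) -> dict:
    """Map each step link name to the ID of the run it points to.

    Non-link entries (such as the run's own .guild directory) are skipped.
    """
    steps = {}
    for name in sorted(os.listdir(pipeline_run_dir)):
        path = os.path.join(pipeline_run_dir, name)
        if os.path.islink(path):
            # The step run's ID is the base name of the linked run directory
            steps[name] = os.path.basename(os.path.realpath(path))
    return steps
```

Each ID can then be inspected on its own, e.g. guild runs info <step-run-id>, to see the flags the prepare-data step actually used.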

garrett · September 15, 2020, 2:39pm

I see guild runs info is not helpful in this case. It should at least show the step run IDs so you can inspect them further.

I think your best bet is to use guild compare with a range selector to show the pipeline and its step runs. This assumes the pipeline and its steps ran in isolation, i.e. there aren’t any other runs interleaved.

Something like this:

guild compare 1:4

Assuming your pipeline has three steps and was the last thing to run, this would show the flag values for each of the steps.
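The range follows from how runs are numbered: index 1 is the most recent run, and the pipeline run starts before its steps, so a pipeline with n steps that just finished (with nothing interleaved) occupies indexes 1 through n + 1. A tiny helper making that arithmetic explicit; pipeline_range is hypothetical, not part of Guild:

```python
def pipeline_range(num_steps: int) -> str:
    """Range selector covering a just-finished pipeline and its steps.

    Assumes no other runs are interleaved: the step runs are the newest
    num_steps runs, and the pipeline run itself, which started before its
    steps, sits at index num_steps + 1.
    """
    if num_steps < 1:
        raise ValueError("a pipeline needs at least one step")
    return f"1:{num_steps + 1}"


print(pipeline_range(3))  # 1:4, as in the example above
print(pipeline_range(2))  # 1:3, e.g. prepare-data + random-forest:train
```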

I think Guild compare could support a --show-steps option that implicitly selects the steps for a pipeline. That way you could run guild compare --show-steps <pipeline run>.

It’d also be good to show step info in guild runs info.

garrett · September 15, 2020, 2:46pm

@mislav would you mind opening an issue for this problem? It’s a general problem that I’d describe as “Hard to view pipeline results as a whole”. If that doesn’t capture what you think the issues are, feel free to use whatever title you think is best. With an issue we can track progress on the solution.

mislav · September 15, 2020, 5:55pm

I have opened the issue here: https://github.com/guildai/guildai/issues/238

Maybe you have a quick fix for item 3? That’s what I’m running into right now.

garrett · September 16, 2020, 1:00pm

I can’t reproduce the behavior where Guild mistakenly states “the following runs match this operation” for different flag values. The matching runs are listed, so it should be straightforward to verify the set of flag values. If Guild is stating that two runs with different flag values are the same, that’s a bug.

I assume you’re using --needed for some other reason. If not, just omit it and this problem goes away. If you must use --needed, then I think one approach is to use an additional flag to differentiate runs that are truly different, even though they have the same flag values. For example:

guild run op a=1 b=2 seq=1 --needed

and:

guild run op a=1 b=2 seq=2 --needed
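The effect of the extra seq flag can be seen in a toy model of --needed, in which a new run is skipped when an earlier run of the same operation has exactly the same flag values. This is a simplification for illustration (is_needed is hypothetical), not Guild's implementation, which may consider more than operation name and flags:

```python
def is_needed(op: str, flags: dict, history: list) -> bool:
    """Return True if no earlier run of op used exactly these flags.

    history is a list of (operation, flags) pairs for existing runs;
    tuples and dicts compare by value, so `in` checks for an exact match.
    """
    return (op, flags) not in history


history = [("op", {"a": 1, "b": 2, "seq": 1})]
print(is_needed("op", {"a": 1, "b": 2, "seq": 1}, history))  # False: same flags
print(is_needed("op", {"a": 1, "b": 2, "seq": 2}, history))  # True: seq differs
```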
