Scalars not saved if a pipeline is used

Here is my guild file:

If I run the prepare or random-forest operation on its own, it saves the scalars.

But if I run the pipeline pipeline-rf-opt, which includes prepare and random-forest as steps, it doesn’t save scalars. I call it like this:

guild run pipeline-rf-opt \
  data-include_ta=1 \
  data-label_tuning=0 \
  data-label=[day_5] \
  data-pca=0 \
  data-tb_volatility_lookback=[50] \
  data-tb_volatility_scaler=1.0 \
  data-correlation_threshold=0.95 \
  data-scaling='none' \
  random-input_data_path='D:/algo_trading_files' \
  random-forest-depth=4 \
  random-forest-maxf=10 \
  random-n_estimators=350 \
  random-min_weight_fraction_leaf=0.1

It is just one run.

What could be the reason it doesn’t save scalars?

But it saves flags.

You can check to confirm that the step runs for pipeline-rf-opt are linked as expected:

guild ls -o pipeline-rf-opt

You should see directories for prepare-data and random-forest:train. These are links to the step runs. Guild uses these to traverse to the TF summary (event) files where the scalars are saved. These should be rolled up so they appear when you run:

guild runs info -o pipeline-rf-opt

I’ve confirmed this is working as expected on a sample pipeline. If you’re seeing something different we can troubleshoot further.
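To check by hand whether the linked step runs contain any summary files at all, something like the following could be used. This is only a minimal sketch and not how Guild itself does the roll-up: the function name `find_event_files` is made up for illustration, and it assumes the step links are ordinary directory links and that event files have `tfevents` in their file name.

```python
import os


def find_event_files(run_dir):
    """Return sorted paths of TF event files under run_dir, following links.

    Hypothetical helper for illustration; not part of Guild.
    """
    found = []
    # followlinks=True so that linked step-run directories are traversed too.
    # Note: this would loop forever if a link pointed back to a parent directory.
    for root, _dirs, files in os.walk(run_dir, followlinks=True):
        for name in files:
            # Assumption: event files are named like events.out.tfevents.<...>
            if "tfevents" in name:
                found.append(os.path.join(root, name))
    return sorted(found)
```

Pass it the run_dir value reported by guild runs info for the pipeline run. An empty list would suggest that the step scripts never wrote summaries where the links can reach them.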

@garrett, here is the output of the guild command you posted above:

(base) PS C:\Users\Mislav\Documents\GitHub\trademl> guild ls -o pipeline-rf-opt
C:\ProgramData\Anaconda3\.guild\runs\a54248ea3eb449a7a4d34742cb554231:
  prepare-data
  random-forest_train
(base) PS C:\Users\Mislav\Documents\GitHub\trademl>

And for the second command:

(base) PS C:\Users\Mislav\Documents\GitHub\trademl> guild runs info -o pipeline-rf-opt
id: a54248ea3eb449a7a4d34742cb554231
operation: pipeline-rf-opt
from: C:\Users\Mislav\Documents\GitHub\trademl\guild.yml
status: completed
started: 2020-09-10 10:47:54
stopped: 2020-09-10 10:49:21
marked: no
label: data-correlation_threshold=0.95 data-include_ta=1 data-label=day_5 data-label_tuning=0 data-lookforward=240 data-pca=0 data-scaling=none data-tb_volatility_lookback=50 data-tb_volatility_scaler=1.0 random-class_weight=balanced_subsample random-forest-depth=4 random-forest-maxf=10 random-input_data_path=D:/algo_trading_files random-min_weight_fraction_leaf=0.1 random-n_estimators=350
sourcecode_digest: ab24b5d70397046e7839099d287466bf
vcs_commit: git:68b7932fa199927ab461df76757fe9c2f410bfef*
run_dir: C:\ProgramData\Anaconda3\.guild\runs\a54248ea3eb449a7a4d34742cb554231
command: c:\programdata\anaconda3\python.exe -um guild.steps_main
exit_status: 0
pid:
steps:

  isolate-runs: no
  needed: yes
  run: prepare-data include_ta=${data-include_ta} label_tuning=${data-label_tuning} label=${data-label} tb_volatility_lookback=${data-tb_volatility_lookback} tb_volatility_scaler=${data-tb_volatility_scaler} correlation_threshold=${data-correlation_threshold} pca=${data-pca} scaling=${data-scaling}


  isolate-runs: yes
  needed: yes
  run: random-forest:train input_data_path=${random-input_data_path} max_depth=${random-forest-depth} max_features=${random-forest-maxf} n_estimators=${random-forest-maxf}  n_estimators=${random-n_estimators} min_weight_fraction_leaf=${random-min_weight_fraction_leaf}

flags:
  data-correlation_threshold: 0.95
  data-include_ta: 1
  data-label: day_5
  data-label_tuning: 0
  data-lookforward: 240
  data-pca: 0
  data-scaling: none
  data-tb_volatility_lookback: 50
  data-tb_volatility_scaler: 1.0
  random-class_weight: balanced_subsample
  random-forest-depth: 4
  random-forest-maxf: 10
  random-input_data_path: D:/algo_trading_files
  random-min_weight_fraction_leaf: 0.1
  random-n_estimators: 350
scalars:
(base) PS C:\Users\Mislav\Documents\GitHub\trademl>

So, there are no scalars. I am not sure what the reason could be. Here are the two scripts I use in the steps:


@garrett, I have just discovered how the pipeline works. It saves scalars in separate folders (in my case prepare and random_forest_train). I thought it would save everything in the pipeline folder. I am not sure how I can find out which parameters I used in the prepare step when I inspect the results of the random forest operation.

I see guild runs info is not helpful in this case. It should at least show the step run IDs so you can inspect them further.

I think your best bet for this is to use guild compare with a range selector to show the pipeline and its step runs. This assumes the pipeline and steps ran in isolation, i.e. there aren’t any other runs interleaved.

Something like this:

guild compare 1:4

Assuming your pipeline has three steps and was the last thing to run, this would show the flag values for each of the steps.

I think Guild compare could support a --show-steps option that implicitly selects the steps for a pipeline. That way you could run guild compare --show-steps <pipeline run>.

It’d also be good to show step info in guild runs info.
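In the meantime, the step run IDs can be recovered from the links in the pipeline run directory. A minimal sketch, assuming each step link is a symbolic link whose target is the step's run directory and that the directory name is the run ID (the helper name `step_run_ids` is hypothetical, not a Guild API):

```python
import os


def step_run_ids(pipeline_run_dir):
    """Map each linked step name to the run ID its link points at.

    Hypothetical helper for illustration; not part of Guild.
    """
    steps = {}
    for name in sorted(os.listdir(pipeline_run_dir)):
        path = os.path.join(pipeline_run_dir, name)
        # Only links are treated as step runs; regular files and dirs are skipped.
        if os.path.islink(path):
            # Assumption: the resolved target is <runs dir>/<run id>.
            steps[name] = os.path.basename(os.path.realpath(path))
    return steps
```

Each ID it returns can then be passed to guild runs info to see the flags and scalars of that step.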

@mislav would you mind opening an issue for this problem? It’s a general problem that I’d describe as “Hard to view pipeline results as a whole”. If that doesn’t capture what you think the issues are, feel free to use whatever title you think is best. With an issue we can track progress on the solution.

I have opened the issue here: https://github.com/guildai/guildai/issues/238

Maybe you have a quick fix for 3. That’s what I’m encountering right now.

I can’t recreate the behavior where Guild mistakenly states “the following runs match this operation” for different flag values. The matching runs are listed so it should be straightforward to verify the set of flag values. If Guild is stating that two runs with different flag values are the same, that’s a bug.

I assume you’re using --needed for some other reason. If not, just omit it and this problem goes away. If you must use --needed then I think one approach is to use an additional flag to differentiate runs that are truly different, even though they have the same flag values. For example:

guild run op a=1 b=2 seq=1 --needed

and:

guild run op a=1 b=2 seq=2 --needed
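If you launch these from a script, the same idea can be generated rather than typed out. A rough sketch only: the helper name `needed_run_commands` is made up, and `seq` is just the extra differentiating flag from the commands above, so the operation has to accept it.

```python
def needed_run_commands(op, flags, count):
    """Build `guild run` argument lists that differ only in the seq flag.

    Hypothetical helper for illustration; not part of Guild.
    """
    commands = []
    for seq in range(1, count + 1):
        args = ["guild", "run", op]
        # Flag values are rendered as NAME=VALUE, as on the command line.
        args += [f"{name}={value}" for name, value in flags.items()]
        # seq makes otherwise identical runs distinct for --needed.
        args += [f"seq={seq}", "--needed"]
        commands.append(args)
    return commands
```

Each returned list can be handed to subprocess.run to start the run.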