Dynamically generated parameters in pipeline

Conceptually I have a two-stage pipeline, where the first stage generates a set of flags (“hyper-hyper parameters”). In the second stage I want to combine those with a set of hyperparameters to optimize. The challenge is that, since the first-stage flags are created dynamically, I can’t know ahead of time how many there are.

I can do it manually like this:

guild run train x='[1,2,3]' y='[1,2,3]' @bigbatch.csv
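Here bigbatch.csv is a regular Guild batch file: the first row names the flags and each following row defines one trial. Something like this (the flag names are purely illustrative):

alpha,beta
0.1,10
0.2,20
0.3,30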

What I would like to do is have the train step use a bigbatch.csv generated by the upstream pipeline stage.

I’ve attached what I think it should look like at the guild.yml level:

train:
  description: Sample training script
  flags-import: all
  requires:
    - operation: bigbatch
bigbatch:
  description: make file bigbatch.csv
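For context, bigbatch is assumed to run a script that writes bigbatch.csv into its own run directory, which is what lets the requires dependency in train resolve the file. A minimal version might just add a main attribute, e.g.:

bigbatch:
  description: make file bigbatch.csv
  main: bigbatch  # runs a hypothetical bigbatch.py that writes bigbatch.csv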

This gives me a symlink to the correct file, named bigbatch.csv, in the “train folder” after the train operation runs. However, when I use the “@” batch notation, bigbatch.csv is taken from my cwd rather than from the run directory. Is there any way to reference a batch file like this in the guild.yml?

This is a bit of a hack, but I think it could be an approach that works for you:
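Roughly, the idea is a wrapper operation that shells back out to Guild using an exec command spec. Because exec runs in the operation’s run directory, the @bigbatch.csv reference resolves against the file the bigbatch dependency links in, rather than against your cwd. Something like this (untested sketch, adjust names to your project; -y just skips the confirmation prompt):

train-batch:
  description: Batch run of train over the generated bigbatch.csv
  exec: guild run -y train @bigbatch.csv
  requires:
    - operation: bigbatch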

Take a look and let me know if you have any questions or run into issues.

This looks really promising and I can get guild train-batch to run. My challenge is that I would like to be able to do something like…

guild run train-batch x='[1,2,3]'

…(or any HPO-type syntax) to take the outer product of the bigbatch trials with the HPO sweep.

You can pass flag values through in a command spec using ${FLAG_NAME}, so you could parameterize the list of values this way:
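Extending the sketch above (the default for x is only illustrative; it’s a string that the inner run parses as a list):

train-batch:
  description: Batch run of train over bigbatch.csv with a pass-through x
  exec: guild run -y train @bigbatch.csv x=${x}
  flags:
    x: '[1,2,3]'
  requires:
    - operation: bigbatch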

Note though that when you run train-batch, you need to quote the string argument to tell Guild the value is a string and not a list. Like this:

guild run train-batch x="'[5,6,7,8]'"
You are about to run train-batch
  bigbatch: ef0d8d741ea94bc8a1202f0815812b6e
  x: '[5,6,7,8]'
Continue? (Y/n)
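Inside the run, the exec line then expands to something like

guild run -y train @bigbatch.csv x='[5,6,7,8]'

so the inner batch run generates the outer product you’re after: one trial per row of bigbatch.csv for each of the four x values. (That expansion is my reading of the ${FLAG_NAME} substitution described above, so double-check it against a real run.)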