Is there an effective way to export some of the data generated in a run to a designated location?

So I generate a bunch of results across multiple runs of the same model (I like to keep them in the run subdirectory because it helps me with output data versioning). However, when it comes time to analyse them, I always have to write an extra bit of code to copy files out of the selected runs. I am therefore looking for a more elegant way to do this.

I also tried guild export, but it either copies all resources or none of them, and I still need to manually collect the results from the exported run subdirectories into a single directory. Furthermore, guild export of a pipeline only copies the source code, leaving behind the results generated in its steps.
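
For the record, my export attempts looked roughly like this (the destination is the same placeholder path as in the loop below):

guild export /somewhere/else ID1 ID2

which still leaves me with complete run subdirectories under /somewhere/else to sift through.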

So I used to do something like:

for runid in ID1 ID2 ...
do
  mkdir -p "/somewhere/else/${runid}"
  # `guild open --cmd='echo'` prints the resolved path instead of opening it
  cp -rv "$(guild open "${runid}" --cmd='echo' --path=relative/output/dir)"/* "/somewhere/else/${runid}"
done

I wonder if there is a more elegant way to do this, perhaps something like guild cp RUNID --path=relative/paths LOCATION. Using guild open also feels a little odd in this scenario.
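
In the meantime, a small wrapper makes the workaround a bit more ergonomic. This is only a sketch; guild_cp is a hypothetical name, and it relies on the same guild open --cmd='echo' behaviour as the loop above:

guild_cp() {
  # guild_cp RUNID RELPATH DEST: copy RELPATH from RUNID's run dir into DEST/RUNID
  local runid=$1 relpath=$2 dest=$3
  local src
  # resolve the path inside the run dir without actually opening it
  src=$(guild open "${runid}" --cmd='echo' --path="${relpath}")
  mkdir -p "${dest}/${runid}"
  cp -rv "${src}"/* "${dest}/${runid}"
}

guild_cp ID1 relative/output/dir /somewhere/else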

Yes, open is a nice hack there!

Guild supports select, which could be extended with a --cmd option, similar to open but with access to environment variables like $RUN_DIR, to support something like this:

guild select --cmd 'cp -a $RUN_DIR/relative/output/dir /somewhere/else/$RUN_ID'

(Single quotes here keep the shell from expanding $RUN_DIR and $RUN_ID before Guild sees them.)

This follows the pattern that find uses with -exec. In fact, I could see renaming guild select to guild find.
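
For comparison, here is the shape of find's -exec pattern (the '*.ckpt' filter is just an illustration):

find . -name '*.ckpt' -exec cp {} /somewhere/else \;

{} expands to each matching path and \; terminates the command; a select --cmd would play the same role for runs.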

Currently select returns only the first matching run, but this could be changed (there is already a feature request for that).

This is not quite the same as a cp command, but it is more versatile. See man cp, for example, for some of the options Guild would otherwise potentially have to implement, and across platforms at that.


Thanks! That's THE feature I am looking for. I would definitely like to see it happen.
