Hi Garrett and fellow users of Guild AI,
I’m a prospective user with a question about exporting Guild runs from a remote server.
My current workflow looks something like this:
- Prep data on machine A
- [Map step] Download the prepped data onto each of the virtual machines 1, 2, …, n, and do embarrassingly parallel things with it. Since the VMs here are Google Cloud TPU VMs, this is done by calling the gcloud ssh client and passing the appropriate start and end indices of the data to a Python processing script on each VM.
- [Reduce step] Concatenate the processed data and maybe do some further processing on machine B, where machine B need not even be on the same network as machine A.
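For concreteness, the map step above looks roughly like the sketch below. The VM names, script path, and exact `gcloud` invocation are placeholders for my actual setup:

```python
import subprocess

def shard_ranges(n_rows, n_vms):
    """Split [0, n_rows) into n_vms contiguous (start, end) index ranges."""
    base, extra = divmod(n_rows, n_vms)
    ranges, start = [], 0
    for i in range(n_vms):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

def build_map_commands(n_rows, vm_names, script="process.py"):
    """One gcloud ssh command per VM, each with its own index range.
    The command shape is a placeholder -- adapt to the real gcloud call."""
    cmds = []
    for vm, (start, end) in zip(vm_names, shard_ranges(n_rows, len(vm_names))):
        cmds.append([
            "gcloud", "compute", "tpus", "tpu-vm", "ssh", vm,
            "--command", f"python {script} --start {start} --end {end}",
        ])
    return cmds

# To actually launch the map step in parallel:
# procs = [subprocess.Popen(c) for c in build_map_commands(10_000, vms)]
# for p in procs:
#     p.wait()
```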
I’m wondering what the best way is to incorporate Guild AI into this presumably not-uncommon workflow. In particular, I’d like to minimize the bookkeeping and automate things as much as possible, since I’ll want to repeat all of this several times with different rows of the source data. (And note, again, that the machine on which the reduction step runs may not be the same as any of the machines in the previous steps; indeed, it may not even be on the same network.)
From the docs, and from looking at the forums, it looks like one way would be to:

1. `guild push` the processed output to external storage such as a GCS bucket, and then
2. `guild pull` from the GCS bucket (or maybe `guild copy`?) on the reduction machine.

To automate things further, the reduction step could be triggered automatically when the mapping stage finishes successfully.
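If that's the right approach, I imagine the glue code would look something like this. The remote name `gcs-staging` is made up; it would need to be defined under `remotes:` in `~/.guild/config.yml` on both the map VMs and the reduce machine, and I haven't yet checked whether Guild's remotes support GCS-backed storage directly:

```python
import subprocess

# "gcs-staging" is a hypothetical remote name -- it would have to be
# configured under `remotes:` in ~/.guild/config.yml on each machine.
REMOTE = "gcs-staging"

def push_runs(remote=REMOTE, dry_run=True):
    """On each map VM after processing: push finished runs to the remote."""
    cmd = ["guild", "push", remote]
    return cmd if dry_run else subprocess.run(cmd, check=True)

def pull_runs(remote=REMOTE, dry_run=True):
    """On the reduce machine: pull the pushed runs before concatenating."""
    cmd = ["guild", "pull", remote]
    return cmd if dry_run else subprocess.run(cmd, check=True)
```

With `dry_run=False` these would actually shell out to the `guild` CLI; the triggering of `pull_runs` after the map stage completes is the part I'm not sure how to automate cleanly.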
And I guess another way might be to use something like Ray to orchestrate the whole thing from a single VM, in which case the Guild part of the picture would be a lot simpler.
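Under that approach the whole map/reduce would live in one script. Here's the shape of it, using `concurrent.futures` from the standard library as a stand-in (with Ray, `process_shard` would become an `@ray.remote` task instead):

```python
from concurrent.futures import ThreadPoolExecutor

def process_shard(rows):
    # Placeholder for the real per-shard work
    # (in my case, launched on a TPU VM via gcloud ssh).
    return [r * 2 for r in rows]

def map_reduce(data, n_workers=4):
    # Map: process contiguous shards in parallel.
    size = -(-len(data) // n_workers)  # ceiling division
    shards = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        results = list(ex.map(process_shard, shards))
    # Reduce: concatenate the processed shards.
    return [x for shard in results for x in shard]
```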
Does this sound right to you? Apologies in advance for the long post; I’d be very happy to share my code / write up a mini-tutorial if I manage to get this working.