Storing the guild artifacts on s3 using `guild run -r s3-dev`

karthick · November 12, 2020, 5:08pm

I want to store all my runs directly in an S3 bucket instead of the local Guild home directory. I configured my ~/.guild/config.yml as follows, expecting this to set up my S3 remote so that the source code and artifacts I normally save locally would automatically be saved to the bucket.

remotes:
  s3-dev:
    type: s3
    description: Production runs
    bucket: cortex-model-data
    region: eu-central-1

and I have my guild.yml as follows:

- model: AlwaysPredictMean
  description: A dummy model which always predicts mean
  operations:
    train:
      description: Training Pipeline Sample Code
      main: training/train
      flags-import: all
      output-scalars: '(\key): (\value)'

When I run the script using guild run --remote s3-dev, I get the following error message:

± |feature/guildai U:1 ✗| → guild run -r s3-dev
You are about to run AlwaysPredictMean:train on s3-dev
  comment: Description for a given training run
  config: training/config/example.yml
  data: tests/test_df.csv
  epochs: 10
  model_class: training.example_model::AlwaysPredictMean
  use_case: example_use_case
Continue? (Y/n) y
guild: remote 's3-dev' does not support this operation

Can someone tell me what exactly the problem is and why I can’t use Guild to store runs in the specified S3 bucket? Thanks.

garrett · November 12, 2020, 5:57pm

S3 remotes are only for file storage. You can’t run anything in S3. If you want to run on a remote server, which is what you’re asking for with the --remote option, you need either an ssh or ec2 remote type.

If you want to run your operation locally, omit the --remote option — you’ll get runs in your current Guild environment. Then use guild push s3-dev to copy those runs to S3.

karthick · November 13, 2020, 8:52am

Thanks a lot, that’s clearer now. I had wrongly assumed that running a script with an S3 remote would automatically save my runs (including source code and artifacts) to S3. I will do it manually using guild push s3-dev.

Thanks also for building this nice tool.

garrett · November 13, 2020, 3:50pm

I could see the value of automatically syncing with a remote env during and especially after a run. Guild does not currently support this. I could see an enhancement to Guild along the lines of --push-to-remote or --push-on-success that does this. That’s a good idea. Though that poor run command already has quite a few options, and it’s going to evolve into its own language grammar. Still, I like the thinking!

In defense of Guild’s “separation of concerns” Tao, you can accomplish what you’re looking for this way:

guild run <options> && guild push s3-dev

This works on shells that support &&, which will execute the second command only when the first command succeeds (exit code of 0).

If you wanted to push regardless of the result, use:

guild run <options>; guild push s3-dev

Note this will push all runs, not just the latest. As Guild uses rsync (or similar) protocols, this is efficient. But you may want to push just the latest. In that case, add 1 as an argument to the push command (guild push s3-dev 1). This tells Guild to push only the latest run (i.e. the run with index 1).
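The two variants above can be folded into one small helper. This is only a sketch: `run_and_push` and its `on-success`/`always` mode argument are made-up names for illustration, not part of Guild. It assumes `-y` skips the confirmation prompt for both `guild run` and `guild push`, and it pushes only the latest run (index 1).

```shell
#!/bin/sh
# Hypothetical helper (not part of Guild): run an operation locally, then
# push only the latest run to a remote.
#   $1   remote name, e.g. s3-dev
#   $2   "on-success" (behaves like &&) or "always" (behaves like ;)
#   $3.. arguments passed through to `guild run`
run_and_push() {
    remote="$1"
    mode="$2"
    shift 2

    # Run locally (no --remote). -y is assumed to skip the prompt.
    status=0
    guild run -y "$@" || status=$?

    if [ "$status" -ne 0 ] && [ "$mode" != "always" ]; then
        echo "run failed (exit $status); skipping push" >&2
        return "$status"
    fi

    # The trailing 1 selects the latest run rather than all runs.
    guild push -y "$remote" 1 || return $?

    # Report the run's exit status even when the push went through.
    return "$status"
}
```

For example, `run_and_push s3-dev on-success train epochs=10` pushes only if training exits with 0, while `run_and_push s3-dev always train epochs=10` pushes either way. Unlike the plain `;` form, the helper returns the run's exit code in `always` mode, so a failed run is still visible to whatever called it.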

Wait, there’s more!

If you would like to always sync runs to your S3 bucket, you could schedule guild push to run repeatedly, say every 10 or 30 minutes. Most POSIX systems offer cron for this, but there are myriad ways to run scheduled commands. Now this is moving you into “sys op” territory, which you might not want to enter. Beware, there be dragons! But such is life.

If this logic were moved into the run command, you’d need to worry about command failure (e.g. SIGKILLs) or system failure (e.g. battery/power loss). Using something like cron is a nice separation of concerns because cron can back-fill after various failures and complete your backups even when Guild or the system unexpectedly crashes.

E.g. your system loses power during a long training run where you have various interim checkpoints. You restart, and cron runs automatically to back up your partial runs. You can then restart the run using guild run --restart <run ID> and be on your merry way, letting cron run every so often to refresh the backup. I don’t think this scheme is terribly complicated, yet it’s quite robust.
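A rough sketch of what such a scheduled push could look like. The script name `guild-sync.sh`, the function `sync_runs`, and the lock directory location are all assumptions made for illustration. The lock guards against one case the scheme above doesn't mention: a push that takes longer than the cron interval, so that two pushes overlap. It also assumes `guild push -y` skips the confirmation prompt.

```shell
#!/bin/sh
# Hypothetical cron wrapper (not part of Guild): push all local runs to a
# remote, skipping this cycle if an earlier push is still in progress.
#
# Example crontab entry (every 10 minutes), assuming this file is saved as
# an executable script at a path of your choosing:
#   */10 * * * * /path/to/guild-sync.sh s3-dev

sync_runs() {
    remote="$1"
    lock="${TMPDIR:-/tmp}/guild-push-$remote.lock"

    # mkdir is atomic, so only one invocation can hold the lock at a time.
    if ! mkdir "$lock" 2>/dev/null; then
        echo "previous push to $remote still running; skipping" >&2
        return 0
    fi

    # Push all runs; -y is assumed to skip the confirmation prompt.
    status=0
    guild push -y "$remote" || status=$?

    # Release the lock whether or not the push succeeded.
    rmdir "$lock"
    return "$status"
}

# Only act when called with a remote name, e.g. from cron.
if [ "$#" -gt 0 ]; then
    sync_runs "$@"
fi
```

Two caveats. cron runs jobs with a minimal environment, so `guild` and whatever credentials the S3 remote needs have to be reachable from there. And if the script is killed mid-push, the lock directory is left behind and has to be removed by hand, unless the temp directory is cleared on reboot.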

karthick · November 16, 2020, 6:18am

Thanks garrett, I understand the concern. Coming from MLflow, where you can use set_tracking_uri with an S3 remote, I thought Guild also had this option. Having the artifacts stored in a remote S3 bucket makes it easier to collaborate in a small team and track progress. But after looking at the number of options available for guild run and guild push, it makes sense to keep these two functions separate.

garrett · November 16, 2020, 1:26pm

I think it’s a great idea and one more option to run isn’t a problem. I opened an issue on GitHub to track progress on this.
