Runs vs storing models

lambdaofgod · November 25, 2020, 11:37am

What is the best practice for storing models resulting from different runs?

I built a pipeline that, after training, stores the model in a file in the models directory. In my guild.yml file, I added models to the requires section.

That seems to result in runs overwriting files in the models directory.

I’ve checked the models and packages options for the Guild file, but they still don’t answer my question: I don’t see what to use to get different artifacts from different runs.

garrett · November 25, 2020, 5:55pm

Excellent question — so much so that I created a detailed example to help answer.

TL;DR: you need to either use target-type: copy to avoid linking to the upstream directories or, IMO better, explicitly control the inputs and outputs of your operations to avoid accidentally overwriting upstream run files. The example link above shows this in detail.
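As a rough illustration of the second option (this is not the linked example), a training script could read an upstream model from one directory and write its own model to a separate directory, refusing to overwrite anything that already exists. The function names, the `output` directory name, and the use of `pickle` are assumptions made for this sketch; none of them are part of Guild.

```python
import os
import pickle


def load_model(input_dir, name="model.pkl"):
    """Read a model from an input directory without modifying it."""
    with open(os.path.join(input_dir, name), "rb") as f:
        return pickle.load(f)


def save_model(model, output_dir, name="model.pkl"):
    """Write a model into output_dir, refusing to overwrite an existing file."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, name)
    # Mode "xb" creates the file and raises FileExistsError if it already exists,
    # so a file that is already there cannot be silently replaced.
    with open(path, "xb") as f:
        pickle.dump(model, f)
    return path
```

Keeping the input and output locations distinct means that even if the input directory is a link to another run's files, the script never writes through it.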

This case underscores a flaw in Guild, which is that it doesn’t do anything to prevent this sort of accident. Guild needs to set read-only flags on run-generated files as a minimal measure of protection. This is on the roadmap, but I’ll make sure it is bumped in priority, as this really bothers me, as it should everyone. I’m sorry you ran into this.

I’d answer here, but I think the example shows all the ins and outs in detail and provides working, testable code, so I’ll point you there.
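A script can apply the same kind of protection to its own outputs. The sketch below clears the write bits on every regular file under a directory; `make_read_only` is a hypothetical helper, not a Guild feature, and it assumes POSIX-style permissions.

```python
import os
import stat


def make_read_only(root):
    """Clear the write bits on every regular file under root."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip links so a link's target is left unchanged
            mode = os.stat(path).st_mode
            os.chmod(path, mode & ~write_bits)
```

Calling `make_read_only` on a directory of finished outputs makes a later accidental write fail with a permission error for ordinary users. It does not stop a process running as root.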
