Summary
This proposal seeks to address Guild’s requirement that an upstream run be available for an operation dependency. It proposes new operation attributes that control how Guild responds when it can’t find a suitable run for an operation dependency. fail-if-unresolved may be set to false to prevent run failure, and warn-if-unresolved can be set to false to silence warnings when a dependency can’t be resolved.
This proposal is awaiting feedback.
Problem
A run fails when Guild cannot resolve its dependencies. In most situations, this is desirable — the run should not proceed if a required resource is not available.
However, there are cases where a user wants required resources to be linked when they exist and wants the run to proceed without them when they don’t. Such runs are designed to work whether or not the resources are available.
Consider a training run that makes use of a previous training run (e.g. to learn from the previous result or to resume training with saved models). In this case, the operation might be defined like this:
train:
  requires:
    - operation: train
      select:
        - saved_model.*
If a previous training run doesn’t exist, the operation will fail. This prevents users from using this sort of self-referencing operation.
Proposed Approach
Guild will support two new operation attributes for a resource source: fail-if-unresolved and warn-if-unresolved. If fail-if-unresolved is true (the default), Guild generates an error when the source cannot be resolved. If false, Guild proceeds with the run and optionally logs a warning, depending on warn-if-unresolved. If warn-if-unresolved is true (the default), Guild logs a warning when the resource source cannot be resolved. If false, Guild proceeds with the run without a warning.
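The interaction between the two attributes can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not Guild’s implementation: SourceConfig, UnresolvedDependency, resolve_source, and the find_run callback are all hypothetical names introduced for this example.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

log = logging.getLogger("guild-sketch")


class UnresolvedDependency(Exception):
    """Raised when a required source cannot be resolved (hypothetical name)."""


@dataclass
class SourceConfig:
    # Hypothetical container for the two proposed attributes.
    # Both default to True, matching the defaults in the proposal.
    name: str
    fail_if_unresolved: bool = True
    warn_if_unresolved: bool = True


def resolve_source(
    source: SourceConfig,
    find_run: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return a run ID, or None when the run may proceed without one."""
    run_id = find_run(source.name)
    if run_id is not None:
        return run_id
    if source.fail_if_unresolved:
        # Default behavior: an unresolved source is an error.
        raise UnresolvedDependency(
            f"cannot resolve required operation '{source.name}'"
        )
    if source.warn_if_unresolved:
        # The run continues, but the user is told the source was skipped.
        log.warning("could not resolve '%s', continuing without it", source.name)
    return None
```

Note that in this sketch warn-if-unresolved only matters when fail-if-unresolved is false, since a failing run already reports the unresolved source as an error.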
To implement a training run that continues when an upstream required operation cannot be found (see the example above), an operation can be configured as:
train:
  requires:
    - operation: train
      fail-if-unresolved: false
      select:
        - saved_model.*
In this case, Guild attempts to resolve the dependency by finding a non-error run for train. If it cannot find such a run, it logs a message indicating that it can’t resolve the dependency but continues nonetheless.
To suppress the warning message, the user could specify warn-if-unresolved: false for the resource source.
Alternative Approaches
Do nothing
One Guild user proposed a work-around for this limitation:
While this is an ingenious workaround, it’s unintuitive and complicated compared to the optional attribute. Doing nothing here is not an option.
Use a single optional flag
The attribute fail-if-unresolved is arguably a bit pedantic/wonky. A simpler optional flag would be sufficient to address the target problem.
The example above would look like this:
train:
  requires:
    - operation: train
      optional: true
      select:
        - saved_model.*
Drawbacks:
- fail-if-xxx is already a convention for options that specify whether or not Guild should continue with the run (e.g. fail-if-empty).
- optional does not control whether or not Guild logs a warning when the source can’t be resolved.
- optional is paradoxical for a requirement.