Hi,
Any plans on adding support for (\key): (\value) output variables which are not necessarily numeric?
For scalars, Guild doesn’t support non-numeric values because scalars are numeric by definition. You can log a variety of other data types using TF summaries.
What’s the use case you have in mind? I assume that you’d like to have a non-numeric value that your script generates (I’m guessing text) to show up in the comparison views?
Speculating a bit here as to your case, but a start might be to log text summaries. I added a text example to the TensorBoard example that shows how you can log text and it shows up in TensorBoard. Guild does not do anything with these summaries though (the way it does with scalar values).
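For reference, a minimal sketch of logging a text summary with the TF 2.x summary API (not the exact example referenced above; the log directory, tag, and value are arbitrary):

import tensorflow as tf

# Create a summary writer for a local log directory
writer = tf.summary.create_file_writer("logs")

with writer.as_default():
    # Log a free-form text value under the tag "notes" at step 0
    tf.summary.text("notes", "baseline run, lstm model", step=0)

writer.flush()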
Thanks for the answer, and for the added TensorBoard example. As you can see, I avoided using the term “scalars” in my question exactly because a scalar is, by definition, a single numeric value.
Regarding the scenario I had in mind, I did wish a non-numeric value will show up in the comparison views (and generated tables). Specifically, I can think of two relevant examples besides free form text.
- An Enum value (I know I can use a numeric value here to represent that enum, but it would be much more readable and easier to analyze further (using a Python pandas DataFrame, for example) if the value were already presented by name).
- An array of values
I don’t see much point in supporting dictionaries, as those can easily be spread across multiple keys.
What do you think?
I would also love to see this feature!
Just wanted to add that by logging arbitrary strings you can pretty much log anything, since users can parse the string however they want when reviewing the results. So supporting strings would make it possible to log categorical variables, arrays of any type, dictionaries, and even more complex structures.
Thanks @emgong and @wheatdog — let’s figure out an approach here and get it into Guild!
Could you provide a specific example of something that you’re generating/calculating that you want to log as non-numeric? Are these values ever associated with training steps?
Or are these attributes of the run that you want to log?
An attribute, for example, might be a network architecture “lstm”. You might want to log this to show that whenever you view the run in a comparison.
A calculated value might be the term “dog” in a classification run. Rather than log an array index corresponding to dog — e.g. 2 — you prefer the human readable (English) word.
I think both examples you gave are very relevant. I have a more complicated scenario where each run is applied to a different class which is not necessarily reflected in the program parameters (flags), and I wanted to log the class name.
I would list the following specific scenarios as most interesting (all could be solved by adding support for logging arbitrary strings and letting the end user do the job of parsing and making sense of the logged strings):
- Logging the code version (either git commit or just a local variable)
- Logging a categorical attribute (e.g. name of a model used)
- Logging a string attribute (e.g. URL from which a model is downloaded and used)
- Logging a categorical classification result using a general name rather than an integer (e.g. “dog” or “cat”)
- Logging an array of values (numerical or non-numerical)
Some of the above could be logged by mapping numbers to categorical values or constants. For the ones you cannot map to integers (like arrays or free-form strings), you could always create file artifacts that hold this data.
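To illustrate the file-artifact workaround (a sketch; the file name and contents are made up — Guild runs scripts in the run directory, so a file written there stays with the run):

import json

results = {
    "class_name": "dog",                  # categorical result
    "per_class_scores": [0.1, 0.7, 0.2],  # array of values
}

# Save non-numeric results as a run artifact
with open("results.json", "w") as f:
    json.dump(results, f, indent=2)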
However, it would be so much easier to analyze the compare results if those values were already in the run compare table in their human-friendly string representation.
This is very helpful!
I think we’re talking about run attributes here. Guild makes extensive use of attributes and already supports showing them as columns in Compare and View. They’re also available via the ipy interface, which is currently the official Python Guild API.
It’s currently possible to log these yourself, but that requires a bit of hacking. We can do better.
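For context, reading runs (with their flag and attribute columns) via the ipy interface looks roughly like this (a sketch; the exact columns depend on the runs):

from guild import ipy

# Returns the runs as a pandas DataFrame-like object
runs = ipy.runs()
print(runs)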
My first inclination is to add support for a new construct: output attributes. This follows the existing output scalars pattern. Output attributes would be configured under an output-attributes operation attribute and follow the same scheme as output-scalars.
Note that run attributes are not associated with a step.
Run attributes are saved as YAML encoded values. I think we’d want to support both logging as string values (default) and optionally support YAML encoded objects.
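To illustrate the distinction (a sketch using PyYAML; the values are made up):

import yaml

# A plain string value parses to a Python str
print(yaml.safe_load("lstm"))             # -> lstm

# A YAML encoded object parses to a structured value
print(yaml.safe_load("[0.1, 0.7, 0.2]"))  # -> [0.1, 0.7, 0.2]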
I hesitate to expose run attributes to casual logging a la (\key): (\value). If we supported a default pattern I feel like it ought to use an explicit pattern to signal to Guild that it’s a run attribute. For example:
# In train.py
print(".model_class: lstm")
print(".lstm_units: 5")
The use of the dot . there hearkens to the use of the dot when specifying attribute columns:
guild compare -cc .model_class,.lstm_units
This dot-name pattern would be the default. You could always use output-attributes to define other patterns. In that case it’s explicit so there’s no issue with accidental logging.
(We might need to require a naming prefix here to differentiate user-defined attributes from Guild defined attributes (e.g. @model_class or something like that), in which case that prefix could safely be used in the default pattern.)
That all sounds good and makes sense to me if you are talking about run attributes - and indeed most of the examples I gave could be regarded as run attributes.
The last two examples I gave (classification category and an array of values) are much more like output scalars than output attributes (except they are not scalars), and are (or at least can be) associated with a step. So it may make sense to support output strings after all. However, I would also agree that the last two examples are probably the least usable use cases.
What do you think?
I think both cases make sense. A run attribute is something you know ahead of time about a run. They’re like flag values but they’re not user-configurable.
The step-related text I’d call a “summary”. Guild relies on the TensorFlow/TensorBoard scheme of summaries and uses the TF event files as its interface for so-called “scalars”. What we’re talking about here is supporting text summaries as well. That’s closer to the spirit of your original idea.
One can imagine any summary type landing in a column — e.g. images, PR curves, etc. The information is available in the summary logs (TF event files). It’s a matter of showing it and making it useful for filtering, reporting, etc.
For Guild to add support for showing text summaries in Compare and View is quite a bit more involved than showing attributes (which it already does). But I think both cases are extremely valid and Guild should support both.
Followup note on this topic…
Another common problem/request is not knowing anything about upstream operations when viewing a downstream run. E.g. in the topic “Guild steps and pipeline - reuse same run” there’s a need to see flag values used in upstream runs.
Upstream flags are not downstream flags. But it should be possible to show them as downstream attributes or text summaries.
Just to revive this thread, I believe Guild should support the display of text summaries from TF event logs in the same way it does scalars and flags. While not all text in a TF event log would be suitable for this, it should be possible for a user to specify the tags to display in a compare view/runs table.
Guild could also support logging text values via script output (e.g. an analog to output scalars).
E.g. a script:
loss = train()
print(f"loss: {loss}")
print("loss-function: mse") # Guild logs as a text summary
The obvious challenge is how to differentiate between tag-like text and text blobs, the latter being the more likely use of a TF event text summary (e.g. generated text samples, predictions, etc.).
One approach might be to use a prefix in the tag (e.g. attr:loss-function) when writing the summary. This would make attribute-like text summaries explicit and avoid accidental use of potentially very large text entries.
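To make the prefix idea concrete (a sketch; the attr: convention is only a proposal here, and the tag/value are made up):

import tensorflow as tf

writer = tf.summary.create_file_writer("logs")
with writer.as_default():
    # The "attr:" prefix would mark this text summary as attribute-like
    tf.summary.text("attr:loss-function", "mse", step=0)
writer.flush()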
Another heuristic might be a maximum length.
The best heuristic would be an explicit list of tags. This could be saved as a run attribute (dynamically if output attributes are being written or as an operation definition).
One other use case for logging non-numeric outputs is when tracking a metric, e.g. loss or MSE, and the error becomes inf or nan. Right now I am printing some metrics as scalar outputs, and in cases where the output is inf, the metric doesn’t get logged at all since the printed statement is “MSE: inf”. It’s hard to tell when the output is inf and didn’t get logged, versus when I forgot to print the metric.
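A possible interim workaround is to guard the printed value in the script so non-finite results are at least visible in the output (a sketch; compute_mse is a hypothetical helper):

import math

mse = compute_mse()  # hypothetical metric computation

if math.isfinite(mse):
    print(f"MSE: {mse}")  # matches the output-scalar pattern
else:
    # Not captured as a scalar, but the non-finite result is
    # explicit in the run output instead of silently missing
    print(f"MSE (not logged, non-finite): {mse}")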