Tensorboard logging twice + is slow

Just to add a bit to the excellent info @teracamo provides…

TensorBoard doesn’t really know about “runs” in the Scalars plugin. It enumerates unique directories that contain TF event files under its log directory. It calls them runs in the UI, but it has no idea what a run is. In fact, it’s quite common to log events under separate subdirectories for a run to help organize the layout in TB. E.g. you’ll see “train” and “validate” or “eval” subdirs used to separate scalars.

The reason you’re seeing two separate “runs” there is that TF event files are landing in separate subdirectories. That’s confusing. It’d be better if TB used a term other than “runs”. Alas, that’s the way it presents the info.

The somewhat odd appearance of <run dir>/.guild in this list is because Guild writes its TF event logs in a subdirectory .guild. This is to avoid possible collisions with any files that your script writes. As @teracamo says, it’s sometimes helpful to poke around this directory to see what Guild saves with your run. You don’t need to worry too much about it, but it’s there in plain view if you ever need to understand something in more detail.
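To make that concrete, here’s a minimal sketch (pure Python, made-up paths) of the directory-enumeration behavior described above: any directory that directly contains a TF event file is listed as a “run”, so one run directory with event files both at its root and under .guild shows up as two entries:

```python
import os, tempfile

def discover_runs(logdir):
    """List subdirs that directly contain a TF event file, the way
    TB's Scalars plugin treats them as separate 'runs'."""
    runs = []
    for dirpath, _dirnames, filenames in os.walk(logdir):
        if any(f.startswith("events.out.tfevents") for f in filenames):
            runs.append(os.path.relpath(dirpath, logdir))
    return sorted(runs)

# One Guild run directory: the script's event file at the root, and
# Guild's own event file under .guild (filenames here are made up).
logdir = tempfile.mkdtemp()
run_dir = os.path.join(logdir, "run-1")
os.makedirs(os.path.join(run_dir, ".guild"))
open(os.path.join(run_dir, "events.out.tfevents.0.host"), "w").close()
open(os.path.join(run_dir, ".guild", "events.out.tfevents.1.host"), "w").close()

print(discover_runs(logdir))  # two "runs" for a single actual run
```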

Okay, to the problem at hand! As I see it there are three options to address the point of confusion:

  1. Don’t worry about it. It’s okay to have multiple subdirs in TB associated with a run. Look at the two runs and in your head say, “one run, one run” until the problem resolves itself :wink:

  2. If you’re already logging scalars, you don’t really need Guild’s output scalar support. You can disable it for a single operation this way:

```yaml
op:
  output-scalars: no
```

Alternatively, use the operation-defaults model attribute in a “full format” Guild file:

```yaml
- operation-defaults:
    output-scalars: no
  operations:
    op: ...
```

This eliminates the .guild entry in the runs list in TB. That’s simple enough, but you’ll be responsible for logging scalars yourself. Since you’re doing that already, I think this is a pretty good option.
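For reference, logging scalars yourself with the TF 2.x summary API looks something like this (a minimal sketch; the logdir location and the “loss” scalar name are arbitrary placeholders):

```python
import glob, os, tempfile
import tensorflow as tf  # assumes TensorFlow 2.x

# Write scalar summaries directly to the run directory (here a temp
# dir stands in for the real run dir).
logdir = tempfile.mkdtemp()
writer = tf.summary.create_file_writer(logdir)
with writer.as_default():
    for step in range(3):
        tf.summary.scalar("loss", 1.0 / (step + 1), step=step)
writer.flush()

# A TF event file now sits at the root of logdir, where TB will pick
# it up as a single run.
print(glob.glob(os.path.join(logdir, "events.out.tfevents.*")))
```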

  3. Write your summaries to <run dir>/.guild. This will consolidate the summaries you write with the summaries that Guild writes. I personally don’t like this option and would discourage it. I think your TF event files should land wherever you want them: the root of the run dir or a subdirectory. That’s a pretty standard convention in TensorFlow land, and writing to .guild is a bit unconventional.

I was hoping for an output-scalars attribute that lets you write to a different directory, but 0.7.0 doesn’t support this. I think that’d be a good option 4. Something like this:

```yaml
op:
  output-scalars:
    summary-path: .  # This is hypothetical - Guild 0.7.0 does not support this
```

Long-winded response, but hopefully it gives you some useful background.