TensorBoard taking a long time to start up

I’m getting this warning:

WARNING: Guild took 9.56 seconds to prepare runs. To reduce startup time, try running with '--skip-images' or '--skip-hparams' options or reduce the number of runs with filters. Try 'guild tensorboard --help' for filter options.

Adding --skip-images or --skip-hparams does not help.

I suspect this happens because I have a resource (symbolic link) to a directory with a lot of files. Is it possible to configure guild tensorboard to ignore this directory?

I found the -O option in the docs (Command: tensorboard), but adding -O logdir=tb did not work (I had to abort because it never finished).

Any ideas?

Hi @samedii welcome to the new site!

There’s no easy workaround that I can think of. To me this behavior is arguably a bug. Guild is following symlinks to find run files that might be used for various TensorBoard plugins (e.g. projections, etc.) and I don’t understand why.

I’ll spend some time investigating and look at changing this. There may be a good reason for the current behavior.

Either way, I’ll address this in master. 0.7 is frozen - rc11 is the last release candidate, barring some world-ending bug being discovered.

Are you able to compile and run from source?

Hello :)
I see, thanks for checking! I don’t think I had any trouble running from source when I did so previously.

I confirmed that the issue was the symbolic link. When I removed it, guild started tb very quickly.
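For anyone who wants to check how much a symlinked directory adds to a scan like this, here is a rough sketch. It is not Guild's code; it only uses Python's os.walk with and without followlinks, and the names count_files and demo are made up for illustration.

```python
import os
import tempfile


def count_files(root, follow_symlinks):
    """Count files under root, optionally descending into symlinked dirs."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root, followlinks=follow_symlinks):
        total += len(filenames)
    return total


def demo():
    """Build a fake run dir that links to a large data dir and scan it twice."""
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "data")  # stands in for the big resource dir
        run = os.path.join(tmp, "run")    # stands in for a run directory
        os.makedirs(data)
        os.makedirs(run)
        for i in range(100):
            open(os.path.join(data, f"f{i}.bin"), "w").close()
        open(os.path.join(run, "events.out.tfevents.0"), "w").close()
        # The resource shows up in the run dir as a symlink to the data dir.
        os.symlink(data, os.path.join(run, "data"), target_is_directory=True)
        return count_files(run, False), count_files(run, True)


if __name__ == "__main__":
    without_links, with_links = demo()
    print("files seen without following symlinks:", without_links)
    print("files seen when following symlinks:", with_links)
```

Pointing count_files at a real run directory gives a quick idea of whether a linked resource is what's inflating the scan.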

I think this issue would be solved for my specific case if I could specify where to look for TensorBoard logs in the guild.yml. I could maybe use -O logdir=dir/to/guild/runs/id/tb already, but that’s quite a hassle.

-O is meant to pass options along to TensorBoard without any knowledge of what’s being passed. I think logdir needs to be explicitly ignored (with warning message) as Guild takes over the function of setting up and specifying the logdir that TensorBoard sees.
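To make the idea concrete, here is a minimal sketch of what "ignore logdir with a warning" could look like. This is an assumption about the shape of the fix, not Guild's implementation; filter_tensorboard_options is a hypothetical helper and the options are assumed to arrive as name=value strings.

```python
import warnings


def filter_tensorboard_options(options):
    """Drop any 'logdir' entry from pass-through options (hypothetical helper).

    options is assumed to be a list of 'name=value' strings as given via -O.
    """
    kept = []
    for opt in options:
        name = opt.split("=", 1)[0].strip().lstrip("-")
        if name == "logdir":
            # The logdir is set up by the launcher, so a user value is skipped.
            warnings.warn("ignoring 'logdir' option: the log directory is managed for you")
            continue
        kept.append(opt)
    return kept


if __name__ == "__main__":
    print(filter_tensorboard_options(["logdir=tb", "reload_interval=5"]))
```

Everything other than logdir is passed through untouched, which keeps -O agnostic about what it forwards.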

I see what you’re getting at. I agree this should be in the Guild file. I hate to complicate things, but given the flexibility of the tool, where the script can put files anywhere it wants, I think the spec would have to follow the same lines as sourcecode and use Guild’s file select spec.

I think maybe this interface?

op:
  tensorboard:
    logdir: relpath-to-tb-files

In this case logdir is a fully supported file select spec like that used for sourcecode.


Yes that would work well in my case and I think most people can easily adapt their code to it if they have these kinds of issues


I just applied this commit:

If you grab the latest from master, that issue should be resolved for you. Just make sure you’re using Guild from source. You can run guild check and look for guild_install_location to confirm it’s running from the repo and not from an installed package. (You may need to run hash -r to clear your shell’s cached location of the guild executable.)
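As a cross-check alongside guild check, something like the following shows where Python would import a package from. It's only a sketch: module_location is a made-up helper, it assumes the package is importable as guild, and it has to be run with the same Python environment your guild command uses.

```python
import importlib.util


def module_location(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from this environment
    return spec.origin


if __name__ == "__main__":
    # Assumes the package name is 'guild'; prints None if it isn't importable.
    print(module_location("guild"))
```

If the printed path points into your checkout of the repo rather than site-packages, you're running from source.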
