After defining a Guild operation, I get the following error when I try to run it: ERROR: [guild] trial <RUNHASH> exited with an error (see log for details).
Presumably something went wrong in the run, but I have no idea what, and I am not sure where to look for the log. If I open guild view and look at the log there, it is empty. That might be expected, because that log is supposed to show what the program prints to the terminal. But I am still not sure where to find the actual error log that the message refers to.
The operation itself may not be logging any error information. Guild doesn’t actually know whether there’s error content in the log. The message points you there, but if nothing is logged, there’s nothing to see.
You can see the exit code for the trial by running guild runs info TRIAL_RUN_ID and looking for exit_status. It should be non-zero, which indicates an error.
You can see the output generated by the trial using guild cat --output TRIAL_RUN_ID. I’m guessing that output does not contain any information about the error.
Assuming that the script is failing but not showing any details, you can re-run the trial to try and recreate the problem. Use guild run --proto TRIAL_RUN_ID. This will generate a new run using the same source code and flags.
Start with those steps and please update here if you’re unable to resolve the issue. We can dig in further once we have more info.
Hi,
I ran into this issue as well. In my case the failure occurs even before the main() function of my training code starts executing, and the message ERROR: [guild] trial TRIAL_RUN_ID exited with an error (see log for details) is shown.
I tried all of the above, and only guild run --proto TRIAL_RUN_ID showed an error message:
Resolving config:resources/config/config.toml dependency
guild: run failed because a dependency was not met: could not resolve 'config:resources/config/config.toml' in config:resources/config/config.toml resource: error loading config from /media/data/resources/config/config.toml: unsupported file type for '/media/data/resources/config/config.toml'
Assuming this had something to do with the ‘config’ keyword in the ‘requires’ field of the guild.yml file, I changed it to the ‘file’ keyword, and that worked. Maybe the ‘config’ keyword has some special usage that I missed or am not aware of.
That’s a good point. The output generated by Guild about dependency resolution errors will not appear in run logs but is shown on the console. In this case you need to scan console output for these messages. Again, Guild only knows when a trial fails by its exit code. It doesn’t otherwise know why. That information should appear in the output. If the script doesn’t show an error message, there’s no information to go off of.
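For failures that happen inside the script itself (as opposed to dependency resolution, which happens before the script starts), one way to make sure something shows up in the output is to print the traceback and exit with a non-zero code. The following is only a minimal sketch: run_with_error_report is a hypothetical helper name, and main() here is a stand-in for the real training code. Python already prints a traceback for uncaught exceptions, so this mostly helps when a script would otherwise catch errors and exit silently.

```python
import traceback


def run_with_error_report(main):
    """Call main() and return an exit code, printing any traceback to stderr."""
    try:
        main()
    except Exception:
        # Print the full traceback so the failure is visible in the run output.
        traceback.print_exc()
        return 1
    return 0


def main():
    # Hypothetical stand-in for the real training code.
    raise RuntimeError("could not load training data")


# In the real script, the return value would be passed to sys.exit() so the
# process ends with a non-zero exit status on failure.
exit_code = run_with_error_report(main)
```

With this in place, a failing trial should leave both a non-zero exit status and a traceback to read.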
config is quite different from file. When you use config, you specify a JSON or YAML file that Guild uses to generate a new file containing flag values. Guild does not support the TOML format for config, so the error message is accurate.
This is covered in more detail here: Dependencies.