__main__ has no attribute __spec__ pytorch-lightning multiGPU

When using Guild to run experiments on multiple GPUs with pytorch-lightning, I get the following error:

10/21/2022 5:54:52 PM	File "/home/davina/mambaforge/envs/ap/.guild/runs/4d62c69c236641b8b2d384bed79b64de/.guild/sourcecode/autopopulus/main.py", line 219, in <module>
10/21/2022 5:54:52 PM	main()
10/21/2022 5:54:52 PM	File "/home/davina/mambaforge/envs/ap/.guild/runs/4d62c69c236641b8b2d384bed79b64de/.guild/sourcecode/autopopulus/main.py", line 95, in main
10/21/2022 5:54:52 PM	imputed_data = get_imputation_logic(args)(args, data)
10/21/2022 5:54:52 PM	File "/home/davina/mambaforge/envs/ap/.guild/runs/4d62c69c236641b8b2d384bed79b64de/.guild/sourcecode/autopopulus/task_logic/ae_imputation.py", line 119, in ae_imputation_logic
10/21/2022 5:54:52 PM	ae_imputer = create_autoencoder(args, data, settings)
10/21/2022 5:54:52 PM	File "/home/davina/mambaforge/envs/ap/.guild/runs/4d62c69c236641b8b2d384bed79b64de/.guild/sourcecode/autopopulus/task_logic/tuner.py", line 69, in create_autoencoder
10/21/2022 5:54:52 PM	ae_imputer.fit(data)
10/21/2022 5:54:52 PM	File "/home/davina/mambaforge/envs/ap/.guild/runs/4d62c69c236641b8b2d384bed79b64de/.guild/sourcecode/autopopulus/models/ap.py", line 148, in fit
10/21/2022 5:54:52 PM	self._fit(data)
10/21/2022 5:54:52 PM	File "/home/davina/mambaforge/envs/ap/.guild/runs/4d62c69c236641b8b2d384bed79b64de/.guild/sourcecode/autopopulus/models/ap.py", line 166, in _fit
10/21/2022 5:54:52 PM	self.trainer.fit(self.ae, datamodule=data)
10/21/2022 5:54:52 PM	File "/home/davina/mambaforge/envs/ap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
10/21/2022 5:54:52 PM	self._call_and_handle_interrupt(
10/21/2022 5:54:52 PM	File "/home/davina/mambaforge/envs/ap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
10/21/2022 5:54:52 PM	return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
10/21/2022 5:54:52 PM	File "/home/davina/mambaforge/envs/ap/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 92, in launch
10/21/2022 5:54:52 PM	self._call_children_scripts()
10/21/2022 5:54:52 PM	File "/home/davina/mambaforge/envs/ap/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 109, in _call_children_scripts
10/21/2022 5:54:52 PM	if __main__.__spec__ is None: # pragma: no-cover
10/21/2022 5:54:52 PM	AttributeError: 'dict' object has no attribute '__spec__'

So I went into /home/davina/mambaforge/envs/ap/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py (line 109, in _call_children_scripts) and changed "if __main__.__spec__ is None" to "if True" as a stopgap, and it seemed to work. For some reason, __main__ is a dictionary rather than a module.
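A less invasive form of the same stopgap might be to read the attribute defensively instead of hard-coding the condition. The snippet below is only a minimal sketch of that idea, not Lightning's or Guild's actual code: main_spec and dict_main are hypothetical names, and the dict merely stands in for whatever object ended up registered as __main__ in the traceback above.

```python
import types


def main_spec(main_obj):
    """Return main_obj.__spec__, or None when the attribute is missing.

    Hypothetical helper: a plain module always has __spec__ (possibly None),
    but a dict does not, so direct attribute access raises AttributeError.
    """
    return getattr(main_obj, "__spec__", None)


# A freshly created module object carries __spec__ = None.
real_main = types.ModuleType("__main__")

# Stand-in for the situation in the traceback: __main__ is a dict.
dict_main = {"__name__": "__main__"}

# Direct access reproduces the kind of error shown in the traceback.
try:
    dict_main.__spec__
except AttributeError as err:
    message = str(err)

# The defensive lookup treats both cases the same way.
spec_from_module = main_spec(real_main)
spec_from_dict = main_spec(dict_main)
```

With a check like this, the "__spec__ is None" branch would be taken for a dict as well, which matches the effect of replacing the condition with "if True" in this particular case while leaving normal module behaviour untouched.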

This isn’t the exact same problem, but I happened to find a somewhat relevant problem here.

Grab the latest version of Guild, 0.8.2. This should be fixed in that version. There was a regression in 0.8.1.

I am having problems with 0.8.2. Refreshing flags actually runs the program, guild.plugins.import_argparse_flags_main takes up a lot of CPU, and the flags are not parsed properly. I downgraded back to 0.8.1 and the problems went away.

Shoot - sorry about that! Do you have a short example or some config that reproduces this?

Apologies for the late reply. Caught a nasty cold. I tried to create a MWE to trigger this problem but I got swamped with other work and wasn’t able to get one working for you. I will try working on this over the next few days and get back to you.