When I use Guild to run experiments on multiple GPUs with pytorch-lightning, I get the following error:
10/21/2022 5:54:52 PM File "/home/davina/mambaforge/envs/ap/.guild/runs/4d62c69c236641b8b2d384bed79b64de/.guild/sourcecode/autopopulus/main.py", line 219, in <module>
10/21/2022 5:54:52 PM main()
10/21/2022 5:54:52 PM File "/home/davina/mambaforge/envs/ap/.guild/runs/4d62c69c236641b8b2d384bed79b64de/.guild/sourcecode/autopopulus/main.py", line 95, in main
10/21/2022 5:54:52 PM imputed_data = get_imputation_logic(args)(args, data)
10/21/2022 5:54:52 PM File "/home/davina/mambaforge/envs/ap/.guild/runs/4d62c69c236641b8b2d384bed79b64de/.guild/sourcecode/autopopulus/task_logic/ae_imputation.py", line 119, in ae_imputation_logic
10/21/2022 5:54:52 PM ae_imputer = create_autoencoder(args, data, settings)
10/21/2022 5:54:52 PM File "/home/davina/mambaforge/envs/ap/.guild/runs/4d62c69c236641b8b2d384bed79b64de/.guild/sourcecode/autopopulus/task_logic/tuner.py", line 69, in create_autoencoder
10/21/2022 5:54:52 PM ae_imputer.fit(data)
10/21/2022 5:54:52 PM File "/home/davina/mambaforge/envs/ap/.guild/runs/4d62c69c236641b8b2d384bed79b64de/.guild/sourcecode/autopopulus/models/ap.py", line 148, in fit
10/21/2022 5:54:52 PM self._fit(data)
10/21/2022 5:54:52 PM File "/home/davina/mambaforge/envs/ap/.guild/runs/4d62c69c236641b8b2d384bed79b64de/.guild/sourcecode/autopopulus/models/ap.py", line 166, in _fit
10/21/2022 5:54:52 PM self.trainer.fit(self.ae, datamodule=data)
10/21/2022 5:54:52 PM File "/home/davina/mambaforge/envs/ap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
10/21/2022 5:54:52 PM self._call_and_handle_interrupt(
10/21/2022 5:54:52 PM File "/home/davina/mambaforge/envs/ap/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
10/21/2022 5:54:52 PM return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
10/21/2022 5:54:52 PM File "/home/davina/mambaforge/envs/ap/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 92, in launch
10/21/2022 5:54:52 PM self._call_children_scripts()
10/21/2022 5:54:52 PM File "/home/davina/mambaforge/envs/ap/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py", line 109, in _call_children_scripts
10/21/2022 5:54:52 PM if __main__.__spec__ is None: # pragma: no-cover
10/21/2022 5:54:52 PM AttributeError: 'dict' object has no attribute '__spec__'
As a stopgap, I edited line 109 (in _call_children_scripts) of /home/davina/mambaforge/envs/ap/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/subprocess_script.py
and changed if __main__.__spec__ is None:
to if True:
and it seemed to work. For some reason, __main__ is a dictionary here rather than a module, which is why the attribute lookup fails.
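A less blunt stopgap than hard-coding True might be to tolerate a __main__ object that has no __spec__ attribute at all. The sketch below is only an illustration of that idea, not pytorch-lightning's actual code: main_spec_is_missing is a hypothetical helper name, and the dict and module objects are stand-ins I constructed to mimic the cases involved.

```python
import importlib.machinery
import types


def main_spec_is_missing(main_obj):
    """Return True when main_obj has no usable __spec__.

    getattr with a default avoids the AttributeError raised when main_obj
    is a plain dict instead of a module. (Hypothetical helper.)
    """
    return getattr(main_obj, "__spec__", None) is None


# A dict standing in for the unexpected __main__ object from the traceback.
dict_main = {"__name__": "__main__"}

# A module without a spec: a fresh module object has __spec__ set to None.
script_main = types.ModuleType("__main__")

# A module with a spec attached (assumed name "pkg.mod", for illustration).
module_main = types.ModuleType("__main__")
module_main.__spec__ = importlib.machinery.ModuleSpec("pkg.mod", None)

print(main_spec_is_missing(dict_main))    # True
print(main_spec_is_missing(script_main))  # True
print(main_spec_is_missing(module_main))  # False
```

With a check like this, a dict __main__ would take the same branch as the if True edit, while a module that does carry a spec would still evaluate to False.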
This isn't exactly the same problem, but I happened to find a somewhat related one here.