Torch.multiprocessing.spawn fails

chris · October 16, 2022, 8:08pm

I have some code that works standalone, but fails when run from guild. The offending line is:

torch.multiprocessing.spawn(main_worker, nprocs=n_gpus, args=(n_gpus, args))

and the complaint is:

  [...] 
  File "/usr/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 183, in get_preparation_data
    main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: 'dict' object has no attribute '__spec__'

Does anyone have any tips? I’m not sure I really understand what’s failing in the spawn call… Thanks!

garrett · October 17, 2022, 5:38pm

I’m sorry you’re running into this! I created an issue resolution doc that easily reproduces this. I’ll spend some time looking into it.

garrett · October 17, 2022, 5:58pm

Still investigating, but there is a workaround: run your script with Python directly using Guild’s exec spec, this way:

github.com/guildai/issue-resolution/blob/93de41a7741148e838e8c65e9b74b6854d11ad49/my.guild.ai-929-torch-multiprocessing-spawn-fails/guild.yml#L1-L2

  test-exec:
    exec: python .guild/sourcecode/test.py

Note that this form doesn’t support flags based on Python global variables. You’d need to either pass command line arguments along or use config files.
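
For example, a script run through exec could read its flags from the command line with argparse instead of from globals. This is only a rough sketch: the flag names (lr, epochs) and the parse_flags helper are made up for illustration, and how the values get forwarded to the script from guild.yml depends on your setup.

```python
import argparse


def parse_flags(argv=None):
    """Parse hypothetical training flags from the command line.

    The flag names here are placeholders, not anything Guild requires.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=10)
    # argv=None makes argparse fall back to sys.argv[1:]
    return parser.parse_args(argv)


if __name__ == "__main__":
    flags = parse_flags()
    print(f"lr={flags.lr} epochs={flags.epochs}")
```

The script then takes its values as ordinary arguments, e.g. python test.py --lr 0.1 --epochs 5, rather than relying on module-level variables being rewritten.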

Still looking into the underlying issue, but I wanted to get you a workaround sooner rather than later.
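
For what it’s worth, the traceback itself suggests that sys.modules['__main__'] was a dict rather than a module when spawn was called, since the failing line reads __spec__ off of it. The sketch below reproduces the same kind of AttributeError in plain Python by temporarily swapping a dict in. It only illustrates the symptom and is not a claim about what Guild does internally; spawn_prep_error is a hypothetical helper name.

```python
import sys
from multiprocessing import spawn


def spawn_prep_error(fake_main):
    """Return the AttributeError raised while preparing spawn data with
    fake_main installed as sys.modules['__main__'], or None if none occurs."""
    real_main = sys.modules["__main__"]
    sys.modules["__main__"] = fake_main
    try:
        # Same function that appears at the bottom of the traceback
        spawn.get_preparation_data("demo")
    except AttributeError as err:
        return err
    finally:
        sys.modules["__main__"] = real_main  # always restore the real module
    return None


if __name__ == "__main__":
    print(spawn_prep_error({}))
```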
