torch.multiprocessing.spawn fails

I have some code that works standalone, but fails when run from Guild. The offending line is:

torch.multiprocessing.spawn(main_worker, nprocs=n_gpus, args=(n_gpus, args))

and the complaint is:

  [...] 
  File "/usr/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/usr/lib/python3.10/multiprocessing/spawn.py", line 183, in get_preparation_data
    main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: 'dict' object has no attribute '__spec__'

Does anyone have any tips? I’m not sure I really understand what’s failing in the spawn call… Thanks!

I’m sorry you’re running into this! I created an issue resolution doc that easily reproduces this. I’ll spend some time looking into it.
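For anyone who wants to see the failure in isolation, here is a minimal sketch that triggers the same AttributeError without Guild or PyTorch. It assumes the cause is that `sys.modules['__main__']` holds a plain dict instead of a module object when the child process is being prepared. That is a reading of the last two traceback frames, not a confirmed diagnosis of what Guild does. The helper name `reproduce` is made up for illustration.

```python
import sys
from multiprocessing import spawn


def reproduce():
    """Return the error message raised when __main__ is a dict, or None."""
    original_main = sys.modules["__main__"]
    # Assumption: something has replaced the __main__ module with a dict.
    sys.modules["__main__"] = {}
    try:
        # The spawn start method calls this before launching a child process;
        # it reads sys.modules['__main__'].__spec__, which a dict does not have.
        spawn.get_preparation_data("worker")
    except AttributeError as exc:
        return str(exc)
    finally:
        # Always restore the real __main__ module.
        sys.modules["__main__"] = original_main
    return None


message = reproduce()
print(message)
```

If the assumption holds, this explains why the script works standalone: there `__main__` is a real module with a `__spec__` attribute, so the lookup succeeds.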

Still investigating, but there is a workaround if you run your script with Python using Guild’s exec spec this way:

Note that this form doesn’t support flags based on Python global variables. You’d need to either pass command line arguments along or use config files.
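If you go the command line route, the script has to parse its own flags. A minimal sketch using argparse follows. The flag names `--lr` and `--epochs` and the helper `parse_flags` are hypothetical, chosen only for illustration and not taken from the original script.

```python
import argparse


def parse_flags(argv=None):
    """Parse training flags from argv (defaults to sys.argv[1:] when None)."""
    parser = argparse.ArgumentParser(description="Training flags (illustrative)")
    # Hypothetical flags; replace with whatever globals the script used before.
    parser.add_argument("--lr", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=10)
    return parser.parse_args(argv)


# Explicit list here for demonstration; a real script would call parse_flags()
# with no arguments so that the values come from the command line.
flags = parse_flags(["--lr", "0.1", "--epochs", "3"])
print(flags.lr, flags.epochs)
```

The parsed values can then be passed on to the worker function through the `args` tuple of the spawn call instead of being read from module-level globals.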

I’m still looking into the underlying issue, but I wanted to get you a workaround sooner rather than later.