Error with async

lucas · February 23, 2023, 3:25pm

Hi, I’m new to Guild, so I’m sorry if I’m asking something trivial…

I’m trying to run a training operation, which I think is quite simple: I have a training script (train-mrcnn.py) and a couple of dependencies for that script (a couple of Python classes and functions).
For now, all the configuration hyperparameters are managed by detectron2 (v0.6) and my script. So far, I have this guild.yaml:

train:
  description: Sample training script
  main: train-mrcnn
  output-scalars: off
  requires:
    - file: 220928-01-200SynthBacksTM-blur-CUSTOM.yaml
    - file: ../../custom/rpnt.py
    - file: ../../custom/custom_coco_evaluation.py

But it seems it doesn’t even start, because of an error in the asyncio library (needed by detectron2 → torch → tqdm). Here’s the stack trace:

guild run train
Refreshing flags...
WARNING: cannot import flags from train-mrcnn.py: ModuleNotFoundError: No module named 'custom' (run with guild --debug for details)
You are about to run train
Continue? (Y/n)     
WARNING: Skipping potential source code file /home/lucas/Killme/guild-pipeline-t4-real/model/src/d2/experiments/220928-01-200SynthBacksTM-blur-CUSTOM/exp-01/inference/coco_intances_results_0000499.json because it's too big. To control which files are copied, define 'sourcecode' for the operation in a Guild file.
WARNING: Skipping potential source code file /home/lucas/Killme/guild-pipeline-t4-real/model/src/d2/experiments/220928-01-200SynthBacksTM-blur-CUSTOM/exp-01/inference/coco_intances_results_0001000.json because it's too big. To control which files are copied, define 'sourcecode' for the operation in a Guild file.
Resolving file:220928-01-200SynthBacksTM-blur-CUSTOM.yaml
Resolving file:../../custom/rpnt.py
Resolving file:../../custom/custom_coco_evaluation.py
Traceback (most recent call last):
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/.guild/runs/02d9f1bb04ae477cb19a954b49bd2456/.guild/sourcecode/train-mrcnn.py", line 12, in <module>
    from fvcore.nn.precise_bn import get_bn_modules
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/fvcore/nn/__init__.py", line 2, in <module>
    from .activation_count import ActivationCountAnalysis, activation_count
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/fvcore/nn/activation_count.py", line 7, in <module>
    import torch.nn as nn
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/torch/__init__.py", line 711, in <module>
    from torch import hub as hub
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/torch/hub.py", line 18, in <module>
    from tqdm.auto import tqdm  # automatically select proper tqdm submodule if available
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/tqdm/auto.py", line 29, in <module>
    from .asyncio import tqdm as asyncio_tqdm
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/tqdm/asyncio.py", line 10, in <module>
    import asyncio
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/asyncio/__init__.py", line 21, in <module>
    from .base_events import *
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/asyncio/base_events.py", line 296
    future = tasks.async(future, loop=self)

The curious thing is that I don’t get this error when I run the training script manually…
I’m using Python 3.9.16.
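
One thing that may be worth checking: the traceback loads asyncio from site-packages rather than from the standard library, and `tasks.async(...)` cannot parse on Python 3.7+, where `async` is a reserved keyword. Below is a minimal diagnostic sketch, assuming the cause is a separately installed `asyncio` package that gets picked up ahead of the built-in one. The helper name `locate_module` is hypothetical and not part of Guild. It reports where a module would be imported from without actually importing it:

```python
import importlib.util
import sys
from pathlib import Path


def locate_module(name):
    """Return (origin, from_site_packages) for a top-level module.

    Uses find_spec, so the module itself is not executed. That matters
    here because importing the broken asyncio would raise a SyntaxError.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None, False
    parts = Path(spec.origin).parts
    # Assumption: third-party installs live under site-packages or dist-packages.
    from_site = "site-packages" in parts or "dist-packages" in parts
    return spec.origin, from_site


if __name__ == "__main__":
    origin, from_site = locate_module("asyncio")
    print("python:", sys.version.split()[0])
    print("asyncio found at:", origin)
    print("from site-packages:", from_site)
    print("sys.path:")
    for entry in sys.path:
        print("  ", entry)
```

Running this once directly and once as a Guild operation should show whether the two environments resolve asyncio differently, and whether the order of `sys.path` differs between them.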

Any help will be appreciated! Thanks in advance!
