Hi, I’m new to Guild, so I’m sorry if I’m asking something trivial…
I’m trying to run a training operation, which I guess is quite simple: I have a training script (train-mrcnn.py) and a couple of dependencies for that script (a couple of Python classes and functions).
For now, all the configuration hyperparameters are managed by detectron2 (v0.6) and my script. So far, I have this guild.yml:
    train:
      description: Sample training script
      main: train-mrcnn
      output-scalars: off
      requires:
        - file: 220928-01-200SynthBacksTM-blur-CUSTOM.yaml
        - file: ../../custom/rpnt.py
        - file: ../../custom/custom_coco_evaluation.py
But it seems it doesn’t even start, because of an error in the asyncio library (needed by detectron2 → torch). Here’s the stack trace:
    guild run train
    Refreshing flags...
    WARNING: cannot import flags from train-mrcnn.py: ModuleNotFoundError: No module named 'custom' (run with guild --debug for details)
    You are about to run train
    Continue? (Y/n)
    WARNING: Skipping potential source code file /home/lucas/Killme/guild-pipeline-t4-real/model/src/d2/experiments/220928-01-200SynthBacksTM-blur-CUSTOM/exp-01/inference/coco_intances_results_0000499.json because it's too big. To control which files are copied, define 'sourcecode' for the operation in a Guild file.
    WARNING: Skipping potential source code file /home/lucas/Killme/guild-pipeline-t4-real/model/src/d2/experiments/220928-01-200SynthBacksTM-blur-CUSTOM/exp-01/inference/coco_intances_results_0001000.json because it's too big. To control which files are copied, define 'sourcecode' for the operation in a Guild file.
    Resolving file:220928-01-200SynthBacksTM-blur-CUSTOM.yaml
    Resolving file:../../custom/rpnt.py
    Resolving file:../../custom/custom_coco_evaluation.py
    Traceback (most recent call last):
      File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/.guild/runs/02d9f1bb04ae477cb19a954b49bd2456/.guild/sourcecode/train-mrcnn.py", line 12, in <module>
        from fvcore.nn.precise_bn import get_bn_modules
      File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/fvcore/nn/__init__.py", line 2, in <module>
        from .activation_count import ActivationCountAnalysis, activation_count
      File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/fvcore/nn/activation_count.py", line 7, in <module>
        import torch.nn as nn
      File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/torch/__init__.py", line 711, in <module>
        from torch import hub as hub
      File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/torch/hub.py", line 18, in <module>
        from tqdm.auto import tqdm  # automatically select proper tqdm submodule if available
      File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/tqdm/auto.py", line 29, in <module>
        from .asyncio import tqdm as asyncio_tqdm
      File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/tqdm/asyncio.py", line 10, in <module>
        import asyncio
      File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/asyncio/__init__.py", line 21, in <module>
        from .base_events import *
      File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/asyncio/base_events.py", line 296
        future = tasks.async(future, loop=self)
The curious thing is that I don’t get this error when I run the training script manually …
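In the trace, asyncio is loaded from site-packages rather than from the standard library, and `async` has been a reserved keyword since Python 3.7, so that copy may be what fails to parse. To compare the two situations, something like the following could be run both directly and as a Guild operation. This is only a rough sketch: the helper name `describe_module` is made up for illustration, and it assumes a regular CPython install where the standard library lives in a plain directory.

```python
import importlib.util
import os
import sys
import sysconfig


def _norm(path):
    """Normalize a path so two spellings of the same location compare equal."""
    return os.path.normcase(os.path.realpath(path))


def describe_module(name):
    """Report where `name` would be imported from, without importing it.

    Hypothetical helper for diagnosis; not part of Guild or detectron2.
    """
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.origin:
        return {"origin": None, "from_stdlib": False}
    origin = _norm(spec.origin)
    # For a package, origin is <dir>/<name>/__init__.py, so step up past
    # the package folder to get the directory that contains it.
    container = os.path.dirname(origin)
    if spec.submodule_search_locations is not None:
        container = os.path.dirname(container)
    stdlib_dir = _norm(sysconfig.get_paths()["stdlib"])
    return {"origin": origin, "from_stdlib": container == stdlib_dir}


if __name__ == "__main__":
    info = describe_module("asyncio")
    print("asyncio origin:", info["origin"])
    print("from stdlib:", info["from_stdlib"])
    print("sys.path:")
    for entry in sys.path:
        print("  ", entry)
```

If the two runs report different origins or a different `sys.path` order, that difference would be the first thing to look at.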
I’m using Python 3.9.16.
Any help will be appreciated! Thanks in advance!