Hi, I’m new to guild, so I’m sorry if I’m asking something trivial…
I’m trying to run a training operation, which is quite simple, I guess: I have a training script (train-mrcnn.py) and a couple of dependencies for that script (a couple of Python classes and functions).
All the configuration hyperparameters are managed by detectron2 (v0.6) and my script for now. So far, I have this guild.yaml:
train:
  description: Sample training script
  main: train-mrcnn
  output-scalars: off
  requires:
    - file: 220928-01-200SynthBacksTM-blur-CUSTOM.yaml
    - file: ../../custom/rpnt.py
    - file: ../../custom/custom_coco_evaluation.py
But it seems it doesn’t even start, because of an error with the asyncio library (needed by detectron2 → torch → tqdm). Here’s the stack trace:
guild run train
Refreshing flags...
WARNING: cannot import flags from train-mrcnn.py: ModuleNotFoundError: No module named 'custom' (run with guild --debug for details)
You are about to run train
Continue? (Y/n)
WARNING: Skipping potential source code file /home/lucas/Killme/guild-pipeline-t4-real/model/src/d2/experiments/220928-01-200SynthBacksTM-blur-CUSTOM/exp-01/inference/coco_intances_results_0000499.json because it's too big. To control which files are copied, define 'sourcecode' for the operation in a Guild file.
WARNING: Skipping potential source code file /home/lucas/Killme/guild-pipeline-t4-real/model/src/d2/experiments/220928-01-200SynthBacksTM-blur-CUSTOM/exp-01/inference/coco_intances_results_0001000.json because it's too big. To control which files are copied, define 'sourcecode' for the operation in a Guild file.
Resolving file:220928-01-200SynthBacksTM-blur-CUSTOM.yaml
Resolving file:../../custom/rpnt.py
Resolving file:../../custom/custom_coco_evaluation.py
Traceback (most recent call last):
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/.guild/runs/02d9f1bb04ae477cb19a954b49bd2456/.guild/sourcecode/train-mrcnn.py", line 12, in <module>
    from fvcore.nn.precise_bn import get_bn_modules
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/fvcore/nn/__init__.py", line 2, in <module>
    from .activation_count import ActivationCountAnalysis, activation_count
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/fvcore/nn/activation_count.py", line 7, in <module>
    import torch.nn as nn
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/torch/__init__.py", line 711, in <module>
    from torch import hub as hub
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/torch/hub.py", line 18, in <module>
    from tqdm.auto import tqdm # automatically select proper tqdm submodule if available
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/tqdm/auto.py", line 29, in <module>
    from .asyncio import tqdm as asyncio_tqdm
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/tqdm/asyncio.py", line 10, in <module>
    import asyncio
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/asyncio/__init__.py", line 21, in <module>
    from .base_events import *
  File "/media/userFiles/00.bin/pyEnv/p39d06-GPU/lib/python3.9/site-packages/asyncio/base_events.py", line 296
    future = tasks.async(future, loop=self)
The curious thing is that I don’t get this error when I run the training script manually…
I’m using Python 3.9.16.
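In case it helps with diagnosing, here is a small check that could be run both manually and from inside a Guild operation to compare which asyncio module gets picked up. It is only a sketch: `module_origin` is a helper name I made up, and it relies on nothing beyond the standard library. I have not confirmed that this is the cause.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a top-level module would be loaded from, without importing it."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Compare this output between a manual run and a Guild run.
    print("python :", sys.version.split()[0])
    print("asyncio:", module_origin("asyncio"))
    print("first sys.path entries:", sys.path[:5])
```

If the asyncio path points into site-packages (as in the traceback above) rather than the standard library directory, the two runs are resolving the module differently.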
Any help will be appreciated! Thanks in advance!