YOLOv5 killed at the first epoch

Question

Pet*_*ter 2 python docker pytorch yolov5 wandb

I'm using a virtual machine on Windows 10 with the following configuration:

Memory 7.8 GiB
Processor Intel® Core™ i5-6600K CPU @ 3.50GHz × 3
Graphics llvmpipe (LLVM 11.0.0, 256 bits)
Disk Capacity 80.5 GB
OS Ubuntu 20.10 64 Bit
Virtualization Oracle

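I installed Docker for Ubuntu as described in the official documentation.
I pulled the Docker image as described in the Docker section of the YOLOv5 GitHub repository.
Since I have no NVIDIA GPU, I could not install a driver or CUDA.
I downloaded the aquarium dataset from Roboflow and placed it in a folder named aquarium.
I ran this command to start the image with my aquarium folder mounted: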

sudo docker run --ipc=host -it -v "$(pwd)"/Desktop/yolo/aquarium:/usr/src/app/aquarium ultralytics/yolov5:latest

and was greeted with this banner:

=============
== PyTorch ==

NVIDIA Release 21.03 (build 21060478) PyTorch Version 1.9.0a0+df837d0

Container image Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.

Copyright (c) 2014-2021 Facebook Inc.
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
Copyright (c) 2015 Google Inc.
Copyright (c) 2015 Yangqing Jia
Copyright (c) 2013-2016 The Caffe contributors
All rights reserved.

NVIDIA Deep Learning Profiler (dlprof) Copyright (c) 2021, NVIDIA CORPORATION. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available. Use 'nvidia-docker run' to start this container; see https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker.

NOTE: MOFED driver for multi-node communication was not detected. Multi-node communication performance may be reduced.

So there was no error at that point.
I installed pip and used it to add wandb. I then ran wandb login and set my API key.

I ran the following command:

# python train.py --img 640 --batch 16 --epochs 10 --data ./aquarium/data.yaml --weights yolov5s.pt --project ip5 --name aquarium5 --nosave --cache

and received the following output:

github: skipping check (Docker image)
YOLOv5  v5.0-14-g238583b torch 1.9.0a0+df837d0 CPU

Namespace(adam=False, artifact_alias='latest', batch_size=16, bbox_interval=-1, bucket='', cache_images=True, cfg='', data='./aquarium/data.yaml', device='', entity=None, epochs=10, evolve=False, exist_ok=False, global_rank=-1, hyp='data/hyp.scratch.yaml', image_weights=False, img_size=[640, 640], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='aquarium5', noautoanchor=False, nosave=True, notest=False, project='ip5', quad=False, rect=False, resume=False, save_dir='ip5/aquarium5', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=16, upload_dataset=False, weights='yolov5s.pt', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir ip5', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0
wandb: Currently logged in as: pebs (use `wandb login --relogin` to force relogin)
wandb: Tracking run with wandb version 0.10.26
wandb: Syncing run aquarium5
wandb: ⭐️ View project at https://wandb.ai/pebs/ip5
wandb:  View run at https://wandb.ai/pebs/ip5/runs/1c2j80ii
wandb: Run data is saved locally in /usr/src/app/wandb/run-20210419_102642-1c2j80ii
wandb: Run `wandb offline` to turn off syncing.

Overriding model.yaml nc=80 with nc=7

                 from  n    params  module                                  arguments
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]
  2                -1  1     18816  models.common.C3                        [64, 64, 1]
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  4                -1  1    156928  models.common.C3                        [128, 128, 3]
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  6                -1  1    625152  models.common.C3                        [256, 256, 3]
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]
  9                -1  1   1182720  models.common.C3                        [512, 512, 1, False]
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 12           [-1, 6]  1         0  models.common.Concat                    [1]
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 16           [-1, 4]  1         0  models.common.Concat                    [1]
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]
 19          [-1, 14]  1         0  models.common.Concat                    [1]
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]
 22          [-1, 10]  1         0  models.common.Concat                    [1]
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]
 24      [17, 20, 23]  1     32364  models.yolo.Detect                      [7, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
[W NNPACK.cpp:80] Could not initialize NNPACK! Reason: Unsupported hardware.
Model Summary: 283 layers, 7079724 parameters, 7079724 gradients, 16.4 GFLOPS

Transferred 356/362 items from yolov5s.pt
Scaled weight_decay = 0.0005
Optimizer groups: 62 .bias, 62 conv.weight, 59 other
train: Scanning '/usr/src/app/aquarium/train/labels.cache' images and labels... 448 found, 0 missing, 1 empty, 0 corrupted: 100%|█| 448/448 [00:00<?, ?
train: Caching images (0.4GB): 100%|████████████████████████████████████████████████████████████████████████████████| 448/448 [00:01<00:00, 313.77it/s]
val: Scanning '/usr/src/app/aquarium/valid/labels.cache' images and labels... 127 found, 0 missing, 0 empty, 0 corrupted: 100%|█| 127/127 [00:00<?, ?it
val: Caching images (0.1GB): 100%|██████████████████████████████████████████████████████████████████████████████████| 127/127 [00:00<00:00, 141.31it/s]
Plotting labels...

autoanchor: Analyzing anchors... anchors/target = 5.17, Best Possible Recall (BPR) = 0.9997
Image sizes 640 train, 640 test
Using 3 dataloader workers
Logging results to ip5/aquarium5
Starting training for 10 epochs...

     Epoch   gpu_mem       box       obj       cls     total    labels  img_size
  0%|                                                                                                                           | 0/28 [00:00<?, ?it/s]Killed
root@cf40a6498016:~# /opt/conda/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 6 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

From this output I assume that 0 epochs were completed.
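As a rough check on that reading, the `0/28` counter in the progress bar matches the number of batches in one epoch for the 448 training images and `--batch 16` reported in the log. The sketch below reproduces that arithmetic; the helper name `batches_per_epoch` is hypothetical, and it assumes the final partial batch (if any) is kept rather than dropped.

```python
import math


def batches_per_epoch(num_images: int, batch_size: int) -> int:
    """Batches in one epoch, assuming a trailing partial batch is kept."""
    return math.ceil(num_images / batch_size)


# Values taken from the training log: 448 images found, --batch 16.
total_batches = batches_per_epoch(448, 16)
print(total_batches)  # 28, the denominator of the "0/28" progress counter
```

Since the counter was still at 0 when `Killed` appeared, not even the first of those 28 batches had finished.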
My data.yaml contains the following:

train: /usr/src/app/aquarium/train/images
val: /usr/src/app/aquarium/valid/images

nc: 7
names: ['fish', 'jellyfish', 'penguin', 'puffin', 'shark', 'starfish', 'stingray']

wandb.ai does not show any metrics, but I do have the files config.yaml, requirements.txt, wandb-metadata.json and wandb-summary.json.

Why am I not getting any output?
Did any training actually take place?
If training did take place, how can I use my model?

Answer 1

Pet*_*ter 6

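The problem was that the virtual machine ran out of memory. The solution was to create 16 GB of swap space so that the machine can use the virtual hard disk as additional RAM.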
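A bare `Killed` with no Python traceback is typically what the Linux out-of-memory killer leaves behind, which fits this diagnosis. One way to see how much headroom the VM has before starting a run is to read `/proc/meminfo`. The following is a minimal sketch rather than part of the original fix: the helper names `parse_meminfo`, `read_meminfo` and `headroom_gib` are hypothetical, and the sample values are made up for illustration.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: value in KiB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def read_meminfo(path: str = "/proc/meminfo") -> dict:
    """Read the live values (Linux only); not called in this example."""
    with open(path) as f:
        return parse_meminfo(f.read())


def headroom_gib(info: dict) -> float:
    """Available RAM plus free swap, converted from KiB to GiB."""
    return (info.get("MemAvailable", 0) + info.get("SwapFree", 0)) / 1024**2


# Illustrative sample, NOT measured on the VM from the question.
SAMPLE = (
    "MemTotal:        8178884 kB\n"
    "MemAvailable:    1048576 kB\n"
    "SwapTotal:             0 kB\n"
    "SwapFree:              0 kB\n"
)

sample_info = parse_meminfo(SAMPLE)
print(f"RAM + swap headroom: {headroom_gib(sample_info):.1f} GiB")
```

Inside the container, `headroom_gib(read_meminfo())` would report the live figure. A `SwapTotal` of 0 means there is no swap to fall back on once RAM is exhausted, and adding swap, as described above, raises that figure.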

Archived:	4 years, 8 months ago
Views:	4403
Last recorded:	3 years, 4 months ago