Reproducing Detectron2 Pretrained Models: Data Preparation, Training Commands, Log Analysis, and the Output Directory



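Object detection is a central task in many deep learning projects. This article walks through reproducing the training of an object detection model with Detectron2. It covers training data preparation, the training commands, training log analysis, training metrics, and the files in the training output directory and what each one is for. In particular, we show how to continue training with the resume feature after an interruption, and we compare our reproduced model with the corresponding model in the Model Zoo.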

I. Training Data Preparation

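COCO (Common Objects in Context) is a widely used dataset for image recognition, object detection, and segmentation. We use it for both training and evaluation. The COCO directory layout is as follows: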

/mnt/coco
├── annotations
├── annotations_trainval2014.zip
├── annotations_trainval2017.zip
├── test2014
├── test2014.zip
├── test2017
├── test2017.zip
├── train2014
├── train2014.zip
├── train2017
├── train2017.zip
├── val2014
├── val2014.zip
├── val2017
└── val2017.zip

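Directories and files

  1. annotations/: the COCO annotation files in JSON format. They hold each image's category labels, bounding boxes, segmentation masks, and related metadata. The instances_train2017.json file loaded in the training log lives here.
  2. annotations_trainval2014.zip, annotations_trainval2017.zip: archives of the COCO 2014 and 2017 train/val annotations.
  3. test2014/, test2017/: the COCO 2014 and 2017 test-set images, used for model testing. Ground-truth annotations for the test splits are not publicly released.
  4. test2014.zip, test2017.zip: archives of the 2014 and 2017 test-set images.
  5. train2014/, train2017/: the COCO 2014 and 2017 training images, used for model training.
  6. train2014.zip, train2017.zip: archives of the 2014 and 2017 training images.
  7. val2014/, val2017/: the COCO 2014 and 2017 validation images, used for model evaluation.
  8. val2014.zip, val2017.zip: archives of the 2014 and 2017 validation images.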
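Before training, it can help to check what the annotation files actually contain. The minimal sketch below counts images and instances in a COCO-format instances_*.json. It uses only the standard library, and summarize_coco is a hypothetical helper, not part of Detectron2.

It skips crowd annotations (iscrowd=1) and counts images that have no non-crowd annotation. We assume this approximates the filtering behind the "Removed 1021 images with no usable annotations" line and the per-category table in the training log. Exact agreement with Detectron2's numbers is not guaranteed.

```python
import json
from collections import Counter


def summarize_coco(annotation_file):
    """Count images and instances in a COCO-format instances_*.json file."""
    with open(annotation_file, encoding="utf-8") as f:
        data = json.load(f)

    id_to_name = {c["id"]: c["name"] for c in data["categories"]}
    # Keep only non-crowd annotations (assumed to match what the log's histogram counts).
    usable = [a for a in data["annotations"] if not a.get("iscrowd", 0)]
    per_category = Counter(id_to_name[a["category_id"]] for a in usable)

    # An image counts as usable if it has at least one non-crowd annotation.
    usable_ids = {a["image_id"] for a in usable}
    no_usable = sum(1 for img in data["images"] if img["id"] not in usable_ids)

    return {
        "num_images": len(data["images"]),
        "num_annotations": len(data["annotations"]),
        "images_without_usable_annotations": no_usable,
        "instances_per_category": per_category,
        "total_instances": sum(per_category.values()),
    }
```

For example, summarize_coco("/mnt/coco/annotations/instances_train2017.json") reports the image count that the log prints as "Loaded 118287 images".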

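II. Training Commands

Before training, set an environment variable that points to the dataset root. Detectron2 resolves its built-in datasets relative to this directory. With the value below, it reads COCO from /mnt/coco/; for example, the log in Section III loads /mnt/coco/annotations/instances_train2017.json.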

export DETECTRON2_DATASETS=/mnt/
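A quick way to confirm that DETECTRON2_DATASETS points at the right place is to check for the paths of the two splits this config uses: coco_2017_train and coco_2017_val. The sketch below assumes the standard relative layout, which matches the path in the training log:

- coco/train2017
- coco/val2017
- coco/annotations/instances_{train,val}2017.json

missing_coco_paths is a hypothetical helper. The "datasets" fallback reflects our understanding of Detectron2's default when the variable is unset.

```python
import os

# Expected layout of the two COCO 2017 splits, relative to DETECTRON2_DATASETS
# (assumed standard layout; matches /mnt/coco/annotations/... in the log).
COCO_2017_SPLITS = {
    "coco_2017_train": ("coco/train2017", "coco/annotations/instances_train2017.json"),
    "coco_2017_val": ("coco/val2017", "coco/annotations/instances_val2017.json"),
}


def missing_coco_paths(root=None):
    """Return the expected COCO image dirs / JSON files that do not exist under root."""
    if root is None:
        # Assumed fallback: ./datasets when DETECTRON2_DATASETS is not set.
        root = os.path.expanduser(os.getenv("DETECTRON2_DATASETS", "datasets"))
    missing = []
    for image_dir, json_file in COCO_2017_SPLITS.values():
        for rel in (image_dir, json_file):
            path = os.path.join(root, rel)
            if not os.path.exists(path):
                missing.append(path)
    return missing
```

With DETECTRON2_DATASETS=/mnt/ and the layout from Section I, missing_coco_paths() should return an empty list.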

First training command

nohup ./train_net.py --config-file ../configs/COCO-Detection/faster_rcnn_R_50_FPN_1x.yaml --num-gpus 8 OUTPUT_DIR /mnt/output/ > train.log 2>&1 &
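  1. nohup: ignores the hangup signal, so training keeps running after the terminal is closed.
  2. ./train_net.py: the script that launches training. This is tools/train_net.py in the Detectron2 repository; the command is run from tools/, hence the ../configs/ path.
  3. --config-file: path to the configuration file.
  4. --num-gpus 8: train on 8 GPUs, with one training process per GPU.
  5. OUTPUT_DIR /mnt/output/: a trailing KEY VALUE pair that overrides the OUTPUT_DIR config option. Checkpoints, the full config (config.yaml), and logs are written to this directory.
  6. > train.log 2>&1: redirects standard output to train.log and sends standard error to the same file.
  7. &: runs the command in the background.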
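These flags determine how the work is split across GPUs. SOLVER.IMS_PER_BATCH (16 in the config dump in Section III) is the total batch across all GPUs. With --num-gpus 8, each process therefore loads 2 images, which is what the log's "Making batched data loader with batch_size=2" line reports.

The sketch below does that arithmetic. It also estimates how many passes over the 117266 filtered training images the 3x schedule (MAX_ITER 270000) represents. The helper names are hypothetical, and we assume Detectron2 enforces the same divisibility check.

```python
# Numbers taken from the config dump and log of this run (faster_rcnn_R_50_FPN_3x.yaml).
IMS_PER_BATCH = 16      # SOLVER.IMS_PER_BATCH: total images per iteration, all GPUs
NUM_GPUS = 8            # --num-gpus
MAX_ITER = 270_000      # SOLVER.MAX_ITER
TRAIN_IMAGES = 117_266  # images left after filtering ("117266 images left")


def images_per_gpu(ims_per_batch, num_gpus):
    """Per-process batch size; the total batch must split evenly across GPUs."""
    if ims_per_batch % num_gpus != 0:
        raise ValueError(f"IMS_PER_BATCH={ims_per_batch} is not divisible by {num_gpus} GPUs")
    return ims_per_batch // num_gpus


def approx_epochs(max_iter, ims_per_batch, num_images):
    """How many passes over the training set the schedule corresponds to."""
    return max_iter * ims_per_batch / num_images


print(images_per_gpu(IMS_PER_BATCH, NUM_GPUS))                        # 2
print(round(approx_epochs(MAX_ITER, IMS_PER_BATCH, TRAIN_IMAGES), 1))  # 36.8
```

Running the same config with --num-gpus 4 would put 4 images on each GPU rather than shrinking the total batch. Our understanding is that SOLVER.REFERENCE_WORLD_SIZE: 0 in this config disables automatic rescaling.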

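Second training command (using --resume)

If training is interrupted, it can be continued with the resume feature. This command, like the log in Section III, uses faster_rcnn_R_50_FPN_3x.yaml, whereas the first command above uses the 1x config. A resumed run should use the same config file as the run it continues.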

nohup ./train_net.py --config-file ../configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml --num-gpus 8 --resume OUTPUT_DIR /mnt/output/ MODEL.WEIGHTS /mnt/output/model_0029999.pth > train.log 2>&1 &
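  1. --config-file: path to the configuration file.
  2. --resume: continues from where the previous run stopped. Detectron2 looks for the last_checkpoint file in OUTPUT_DIR and restores three things from the checkpoint it names: the model, the trainer state (optimizer and iteration counter), and the LR scheduler. In the log in Section III this checkpoint is model_0029999.pth, and training restarts at iteration 30000.
  3. MODEL.WEIGHTS: path to a weights file. With --resume it is only a fallback, used when OUTPUT_DIR has no last_checkpoint file. Without --resume, only the model weights are loaded from this path and training starts again from iteration 0.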
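The resume behaviour fits in a few lines of code. The sketch below is a simplified model, written for illustration, of what we understand fvcore's Checkpointer.resume_or_load (called by Detectron2's DefaultTrainer) to do:

- With --resume, it reads the file name stored in OUTPUT_DIR/last_checkpoint and restores the full training state.
- Otherwise, it loads only the model weights from MODEL.WEIGHTS.

It also models the checkpoint naming. With CHECKPOINT_PERIOD 5000, a checkpoint is written after every 5000th iteration and named after its 0-based iteration. That is why model_0029999.pth leads to "Starting training from iteration 30000". All function names here are hypothetical.

```python
import os


def checkpoint_name(iteration):
    """Name of the periodic checkpoint saved after 0-based `iteration` (assumed scheme)."""
    return f"model_{iteration:07d}.pth"


def periodic_checkpoint_iters(period, max_iter):
    """Iterations after which a periodic checkpoint is written."""
    return [it for it in range(max_iter) if (it + 1) % period == 0]


def resolve_resume(output_dir, model_weights, resume):
    """Simplified resume_or_load: return (checkpoint_path, restores_full_state)."""
    marker = os.path.join(output_dir, "last_checkpoint")
    if resume and os.path.isfile(marker):
        with open(marker, encoding="utf-8") as f:
            last_saved = f.read().strip()
        # Model, optimizer, scheduler and iteration counter are restored;
        # training continues at saved_iteration + 1.
        return os.path.join(output_dir, last_saved), True
    # No --resume (or no marker file): only model weights are loaded,
    # and training starts again from iteration 0.
    return model_weights, False
```

In the log, the last periodic checkpoint is model_0269999.pth, written just before model_final.pth.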

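III. Training Log Analysis

The following is train.log from the resumed run (the second command). The log lines between iteration 30119 and iteration 269599 are omitted.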

nohup: ignoring input
Command Line Args: Namespace(config_file='../configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=8, num_machines=1, opts=['OUTPUT_DIR', '/mnt/output/', 'MODEL.WEIGHTS', '/mnt/output/model_0029999.pth'], resume=True)
[09/06 02:16:26 detectron2]: Rank of current process: 0. World size: 8
[09/06 02:16:30 detectron2]: Environment info:
-------------------------------  --------------------------------------------------------------
sys.platform                     linux
Python                           3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
numpy                            1.22.2
detectron2                       0.6 @/root/detectron2/detectron2
Compiler                         GCC 9.4
CUDA compiler                    CUDA 12.0
detectron2 arch flags            5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6, 9.0
DETECTRON2_ENV_MODULE            <not set>
PyTorch                          1.14.0a0+44dac51 @/usr/local/lib/python3.8/dist-packages/torch
PyTorch debug build              False
torch._C._GLIBCXX_USE_CXX11_ABI  True
GPU available                    Yes
GPU 0,1,2,3,4,5,6,7              Tesla V100-SXM2-16GB (arch=7.0)
Driver version                   535.161.08
CUDA_HOME                        /usr/local/cuda
Pillow                           9.2.0
torchvision                      0.15.0a0 @/usr/local/lib/python3.8/dist-packages/torchvision
torchvision arch flags           5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6, 9.0
fvcore                           0.1.5.post20221221
iopath                           0.1.9
cv2                              4.6.0
-------------------------------  --------------------------------------------------------------
PyTorch built with:
  - GCC 9.4
  - C++ Version: 201402
  - Intel(R) oneAPI Math Kernel Library Version 2021.1-Product Build 20201104 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.0 (Git Hash N/A)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - CUDA Runtime 12.0
  - NVCC architecture flags: -gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_90,code=compute_90
  - CuDNN 8.7  (built against CUDA 11.8)
  - Magma 2.6.2
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.0, CUDNN_VERSION=8.7.0, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS=-fno-gnu-unique -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=1.14.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=ON, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

[09/06 02:16:30 detectron2]: Command line arguments: Namespace(config_file='../configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=8, num_machines=1, opts=['OUTPUT_DIR', '/mnt/output/', 'MODEL.WEIGHTS', '/mnt/output/model_0029999.pth'], resume=True)
[09/06 02:16:30 detectron2]: Contents of args.config_file=../configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml:
_BASE_: "../Base-RCNN-FPN.yaml"
MODEL:
  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
  MASK_ON: False
  RESNETS:
    DEPTH: 50
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 270000

[09/06 02:16:30 detectron2]: Running with full config:
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: true
  NUM_WORKERS: 4
  REPEAT_SQRT: true
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
  - coco_2017_val
  TRAIN:
  - coco_2017_train
GLOBAL:
  HACK: 1.0
INPUT:
  CROP:
    ENABLED: false
    SIZE:
    - 0.9
    - 0.9
    TYPE: relative_range
  FORMAT: BGR
  MASK_FORMAT: polygon
  MAX_SIZE_TEST: 1333
  MAX_SIZE_TRAIN: 1333
  MIN_SIZE_TEST: 800
  MIN_SIZE_TRAIN:
  - 640
  - 672
  - 704
  - 736
  - 768
  - 800
  MIN_SIZE_TRAIN_SAMPLING: choice
  RANDOM_FLIP: horizontal
MODEL:
  ANCHOR_GENERATOR:
    ANGLES:
    - - -90
      - 0
      - 90
    ASPECT_RATIOS:
    - - 0.5
      - 1.0
      - 2.0
    NAME: DefaultAnchorGenerator
    OFFSET: 0.0
    SIZES:
    - - 32
    - - 64
    - - 128
    - - 256
    - - 512
  BACKBONE:
    FREEZE_AT: 2
    NAME: build_resnet_fpn_backbone
  DEVICE: cuda
  FPN:
    FUSE_TYPE: sum
    IN_FEATURES:
    - res2
    - res3
    - res4
    - res5
    NORM: ''
    OUT_CHANNELS: 256
  KEYPOINT_ON: false
  LOAD_PROPOSALS: false
  MASK_ON: false
  META_ARCHITECTURE: GeneralizedRCNN
  PANOPTIC_FPN:
    COMBINE:
      ENABLED: true
      INSTANCES_CONFIDENCE_THRESH: 0.5
      OVERLAP_THRESH: 0.5
      STUFF_AREA_LIMIT: 4096
    INSTANCE_LOSS_WEIGHT: 1.0
  PIXEL_MEAN:
  - 103.53
  - 116.28
  - 123.675
  PIXEL_STD:
  - 1.0
  - 1.0
  - 1.0
  PROPOSAL_GENERATOR:
    MIN_SIZE: 0
    NAME: RPN
  RESNETS:
    DEFORM_MODULATED: false
    DEFORM_NUM_GROUPS: 1
    DEFORM_ON_PER_STAGE:
    - false
    - false
    - false
    - false
    DEPTH: 50
    NORM: FrozenBN
    NUM_GROUPS: 1
    OUT_FEATURES:
    - res2
    - res3
    - res4
    - res5
    RES2_OUT_CHANNELS: 256
    RES5_DILATION: 1
    STEM_OUT_CHANNELS: 64
    STRIDE_IN_1X1: true
    WIDTH_PER_GROUP: 64
  RETINANET:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_WEIGHTS: &id002
    - 1.0
    - 1.0
    - 1.0
    - 1.0
    FOCAL_LOSS_ALPHA: 0.25
    FOCAL_LOSS_GAMMA: 2.0
    IN_FEATURES:
    - p3
    - p4
    - p5
    - p6
    - p7
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.4
    - 0.5
    NMS_THRESH_TEST: 0.5
    NORM: ''
    NUM_CLASSES: 80
    NUM_CONVS: 4
    PRIOR_PROB: 0.01
    SCORE_THRESH_TEST: 0.05
    SMOOTH_L1_LOSS_BETA: 0.1
    TOPK_CANDIDATES_TEST: 1000
  ROI_BOX_CASCADE_HEAD:
    BBOX_REG_WEIGHTS:
    - &id001
      - 10.0
      - 10.0
      - 5.0
      - 5.0
    - - 20.0
      - 20.0
      - 10.0
      - 10.0
    - - 30.0
      - 30.0
      - 15.0
      - 15.0
    IOUS:
    - 0.5
    - 0.6
    - 0.7
  ROI_BOX_HEAD:
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS: *id001
    CLS_AGNOSTIC_BBOX_REG: false
    CONV_DIM: 256
    FC_DIM: 1024
    FED_LOSS_FREQ_WEIGHT_POWER: 0.5
    FED_LOSS_NUM_CLASSES: 50
    NAME: FastRCNNConvFCHead
    NORM: ''
    NUM_CONV: 0
    NUM_FC: 2
    POOLER_RESOLUTION: 7
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
    SMOOTH_L1_BETA: 0.0
    TRAIN_ON_PRED_BOXES: false
    USE_FED_LOSS: false
    USE_SIGMOID_CE: false
  ROI_HEADS:
    BATCH_SIZE_PER_IMAGE: 512
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    IOU_LABELS:
    - 0
    - 1
    IOU_THRESHOLDS:
    - 0.5
    NAME: StandardROIHeads
    NMS_THRESH_TEST: 0.5
    NUM_CLASSES: 80
    POSITIVE_FRACTION: 0.25
    PROPOSAL_APPEND_GT: true
    SCORE_THRESH_TEST: 0.05
  ROI_KEYPOINT_HEAD:
    CONV_DIMS:
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    - 512
    LOSS_WEIGHT: 1.0
    MIN_KEYPOINTS_PER_IMAGE: 1
    NAME: KRCNNConvDeconvUpsampleHead
    NORMALIZE_LOSS_BY_VISIBLE_KEYPOINTS: true
    NUM_KEYPOINTS: 17
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  ROI_MASK_HEAD:
    CLS_AGNOSTIC_MASK: false
    CONV_DIM: 256
    NAME: MaskRCNNConvUpsampleHead
    NORM: ''
    NUM_CONV: 4
    POOLER_RESOLUTION: 14
    POOLER_SAMPLING_RATIO: 0
    POOLER_TYPE: ROIAlignV2
  RPN:
    BATCH_SIZE_PER_IMAGE: 256
    BBOX_REG_LOSS_TYPE: smooth_l1
    BBOX_REG_LOSS_WEIGHT: 1.0
    BBOX_REG_WEIGHTS: *id002
    BOUNDARY_THRESH: -1
    CONV_DIMS:
    - -1
    HEAD_NAME: StandardRPNHead
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    - p6
    IOU_LABELS:
    - 0
    - -1
    - 1
    IOU_THRESHOLDS:
    - 0.3
    - 0.7
    LOSS_WEIGHT: 1.0
    NMS_THRESH: 0.7
    POSITIVE_FRACTION: 0.5
    POST_NMS_TOPK_TEST: 1000
    POST_NMS_TOPK_TRAIN: 1000
    PRE_NMS_TOPK_TEST: 1000
    PRE_NMS_TOPK_TRAIN: 2000
    SMOOTH_L1_BETA: 0.0
  SEM_SEG_HEAD:
    COMMON_STRIDE: 4
    CONVS_DIM: 128
    IGNORE_VALUE: 255
    IN_FEATURES:
    - p2
    - p3
    - p4
    - p5
    LOSS_WEIGHT: 1.0
    NAME: SemSegFPNHead
    NORM: GN
    NUM_CLASSES: 54
  WEIGHTS: /mnt/output/model_0029999.pth
OUTPUT_DIR: /mnt/output/
SEED: -1
SOLVER:
  AMP:
    ENABLED: false
  BASE_LR: 0.02
  BASE_LR_END: 0.0
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 5000
  CLIP_GRADIENTS:
    CLIP_TYPE: value
    CLIP_VALUE: 1.0
    ENABLED: false
    NORM_TYPE: 2.0
  GAMMA: 0.1
  IMS_PER_BATCH: 16
  LR_SCHEDULER_NAME: WarmupMultiStepLR
  MAX_ITER: 270000
  MOMENTUM: 0.9
  NESTEROV: false
  NUM_DECAYS: 3
  REFERENCE_WORLD_SIZE: 0
  RESCALE_INTERVAL: false
  STEPS:
  - 210000
  - 250000
  WARMUP_FACTOR: 0.001
  WARMUP_ITERS: 1000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0001
  WEIGHT_DECAY_BIAS: null
  WEIGHT_DECAY_NORM: 0.0
TEST:
  AUG:
    ENABLED: false
    FLIP: true
    MAX_SIZE: 4000
    MIN_SIZES:
    - 400
    - 500
    - 600
    - 700
    - 800
    - 900
    - 1000
    - 1100
    - 1200
  DETECTIONS_PER_IMAGE: 100
  EVAL_PERIOD: 0
  EXPECTED_RESULTS: []
  KEYPOINT_OKS_SIGMAS: []
  PRECISE_BN:
    ENABLED: false
    NUM_ITER: 200
VERSION: 2
VIS_PERIOD: 0

[09/06 02:16:30 detectron2]: Full config saved to /mnt/output/config.yaml
[09/06 02:16:30 d2.utils.env]: Using a generated random seed 30790687
[09/06 02:16:32 d2.engine.defaults]: Model:
GeneralizedRCNN(
  (backbone): FPN(
    (fpn_lateral2): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral3): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral4): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output4): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (fpn_lateral5): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
    (fpn_output5): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (top_block): LastLevelMaxPool()
    (bottom_up): ResNet(
      (stem): BasicStem(
        (conv1): Conv2d(
          3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
          (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
        )
      )
      (res2): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv1): Conv2d(
            64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv2): Conv2d(
            64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv3): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv2): Conv2d(
            64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv3): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv2): Conv2d(
            64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
          )
          (conv3): Conv2d(
            64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
        )
      )
      (res3): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv1): Conv2d(
            256, 128, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
        (3): BottleneckBlock(
          (conv1): Conv2d(
            512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv2): Conv2d(
            128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
          )
          (conv3): Conv2d(
            128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
        )
      )
      (res4): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
          (conv1): Conv2d(
            512, 256, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (3): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (4): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
        (5): BottleneckBlock(
          (conv1): Conv2d(
            1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv2): Conv2d(
            256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
          )
          (conv3): Conv2d(
            256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
          )
        )
      )
      (res5): Sequential(
        (0): BottleneckBlock(
          (shortcut): Conv2d(
            1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
          (conv1): Conv2d(
            1024, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv2): Conv2d(
            512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv3): Conv2d(
            512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
        )
        (1): BottleneckBlock(
          (conv1): Conv2d(
            2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv2): Conv2d(
            512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv3): Conv2d(
            512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
        )
        (2): BottleneckBlock(
          (conv1): Conv2d(
            2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv2): Conv2d(
            512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
          )
          (conv3): Conv2d(
            512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
            (norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
          )
        )
      )
    )
  )
  (proposal_generator): RPN(
    (rpn_head): StandardRPNHead(
      (conv): Conv2d(
        256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)
        (activation): ReLU()
      )
      (objectness_logits): Conv2d(256, 3, kernel_size=(1, 1), stride=(1, 1))
      (anchor_deltas): Conv2d(256, 12, kernel_size=(1, 1), stride=(1, 1))
    )
    (anchor_generator): DefaultAnchorGenerator(
      (cell_anchors): BufferList()
    )
  )
  (roi_heads): StandardROIHeads(
    (box_pooler): ROIPooler(
      (level_poolers): ModuleList(
        (0): ROIAlign(output_size=(7, 7), spatial_scale=0.25, sampling_ratio=0, aligned=True)
        (1): ROIAlign(output_size=(7, 7), spatial_scale=0.125, sampling_ratio=0, aligned=True)
        (2): ROIAlign(output_size=(7, 7), spatial_scale=0.0625, sampling_ratio=0, aligned=True)
        (3): ROIAlign(output_size=(7, 7), spatial_scale=0.03125, sampling_ratio=0, aligned=True)
      )
    )
    (box_head): FastRCNNConvFCHead(
      (flatten): Flatten(start_dim=1, end_dim=-1)
      (fc1): Linear(in_features=12544, out_features=1024, bias=True)
      (fc_relu1): ReLU()
      (fc2): Linear(in_features=1024, out_features=1024, bias=True)
      (fc_relu2): ReLU()
    )
    (box_predictor): FastRCNNOutputLayers(
      (cls_score): Linear(in_features=1024, out_features=81, bias=True)
      (bbox_pred): Linear(in_features=1024, out_features=320, bias=True)
    )
  )
)
[09/06 02:16:53 d2.data.datasets.coco]: Loading /mnt/coco/annotations/instances_train2017.json takes 21.55 seconds.
[09/06 02:16:55 d2.data.datasets.coco]: Loaded 118287 images in COCO format from /mnt/coco/annotations/instances_train2017.json
[09/06 02:17:05 d2.data.build]: Removed 1021 images with no usable annotations. 117266 images left.
[09/06 02:17:10 d2.data.build]: Distribution of instances among all 80 categories:
|   category    | #instances   |   category   | #instances   |   category    | #instances   |
|:-------------:|:-------------|:------------:|:-------------|:-------------:|:-------------|
|    person     | 257253       |   bicycle    | 7056         |      car      | 43533        |
|  motorcycle   | 8654         |   airplane   | 5129         |      bus      | 6061         |
|     train     | 4570         |    truck     | 9970         |     boat      | 10576        |
| traffic light | 12842        | fire hydrant | 1865         |   stop sign   | 1983         |
| parking meter | 1283         |    bench     | 9820         |     bird      | 10542        |
|      cat      | 4766         |     dog      | 5500         |     horse     | 6567         |
|     sheep     | 9223         |     cow      | 8014         |   elephant    | 5484         |
|     bear      | 1294         |    zebra     | 5269         |    giraffe    | 5128         |
|   backpack    | 8714         |   umbrella   | 11265        |    handbag    | 12342        |
|      tie      | 6448         |   suitcase   | 6112         |    frisbee    | 2681         |
|     skis      | 6623         |  snowboard   | 2681         |  sports ball  | 6299         |
|     kite      | 8802         | baseball bat | 3273         | baseball gl.. | 3747         |
|  skateboard   | 5536         |  surfboard   | 6095         | tennis racket | 4807         |
|    bottle     | 24070        |  wine glass  | 7839         |      cup      | 20574        |
|     fork      | 5474         |    knife     | 7760         |     spoon     | 6159         |
|     bowl      | 14323        |    banana    | 9195         |     apple     | 5776         |
|   sandwich    | 4356         |    orange    | 6302         |   broccoli    | 7261         |
|    carrot     | 7758         |   hot dog    | 2884         |     pizza     | 5807         |
|     donut     | 7005         |     cake     | 6296         |     chair     | 38073        |
|     couch     | 5779         | potted plant | 8631         |      bed      | 4192         |
| dining table  | 15695        |    toilet    | 4149         |      tv       | 5803         |
|    laptop     | 4960         |    mouse     | 2261         |    remote     | 5700         |
|   keyboard    | 2854         |  cell phone  | 6422         |   microwave   | 1672         |
|     oven      | 3334         |   toaster    | 225          |     sink      | 5609         |
| refrigerator  | 2634         |     book     | 24077        |     clock     | 6320         |
|     vase      | 6577         |   scissors   | 1464         |  teddy bear   | 4729         |
|  hair drier   | 198          |  toothbrush  | 1945         |               |              |
|     total     | 849949       |              |              |               |              |
[09/06 02:17:10 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800), max_size=1333, sample_style='choice'), RandomFlip()]
[09/06 02:17:10 d2.data.build]: Using training sampler TrainingSampler
[09/06 02:17:11 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[09/06 02:17:11 d2.data.common]: Serializing 117266 elements to byte tensors and concatenating them all ...
[09/06 02:17:16 d2.data.common]: Serialized dataset takes 450.77 MiB
[09/06 02:17:16 d2.data.build]: Making batched data loader with batch_size=2
[09/06 02:17:19 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /mnt/output/model_0029999.pth ...
[09/06 02:17:19 fvcore.common.checkpoint]: [Checkpointer] Loading from /mnt/output/model_0029999.pth ...
[09/06 02:17:19 fvcore.common.checkpoint]: Loading trainer from /mnt/output/model_0029999.pth ...
[09/06 02:17:19 d2.engine.hooks]: Loading scheduler from state_dict ...
[09/06 02:17:20 d2.engine.train_loop]: Starting training from iteration 30000
[09/06 02:17:35 d2.utils.events]:  eta: 13:29:58  iter: 30019  total_loss: 0.6783  loss_cls: 0.2512  loss_box_reg: 0.2507  loss_rpn_cls: 0.05119  loss_rpn_loc: 0.08998    time: 0.2031  last_time: 0.2046  data_time: 0.4823  last_data_time: 0.0075   lr: 0.02  max_mem: 2903M
[09/06 02:17:39 d2.utils.events]:  eta: 13:29:54  iter: 30039  total_loss: 0.7224  loss_cls: 0.2514  loss_box_reg: 0.2828  loss_rpn_cls: 0.05798  loss_rpn_loc: 0.09668    time: 0.2028  last_time: 0.2075  data_time: 0.0088  last_data_time: 0.0105   lr: 0.02  max_mem: 2903M
[09/06 02:17:43 d2.utils.events]:  eta: 13:22:12  iter: 30059  total_loss: 0.6778  loss_cls: 0.2609  loss_box_reg: 0.28  loss_rpn_cls: 0.0531  loss_rpn_loc: 0.08019    time: 0.2015  last_time: 0.1991  data_time: 0.0087  last_data_time: 0.0075   lr: 0.02  max_mem: 2903M
[09/06 02:17:47 d2.utils.events]:  eta: 13:20:22  iter: 30079  total_loss: 0.6472  loss_cls: 0.2415  loss_box_reg: 0.2495  loss_rpn_cls: 0.04993  loss_rpn_loc: 0.09604    time: 0.2003  last_time: 0.1824  data_time: 0.0092  last_data_time: 0.0114   lr: 0.02  max_mem: 2903M
[09/06 02:17:51 d2.utils.events]:  eta: 13:20:18  iter: 30099  total_loss: 0.6032  loss_cls: 0.2444  loss_box_reg: 0.2445  loss_rpn_cls: 0.05288  loss_rpn_loc: 0.0732    time: 0.2008  last_time: 0.2081  data_time: 0.0090  last_data_time: 0.0170   lr: 0.02  max_mem: 2904M
[09/06 02:17:55 d2.utils.events]:  eta: 13:20:14  iter: 30119  total_loss: 0.5806  loss_cls: 0.2233  loss_box_reg: 0.2357  loss_rpn_cls: 0.04176  loss_rpn_loc: 0.07658    time: 0.2006  last_time: 0.2017  data_time: 0.0080  last_data_time: 0.0070   lr: 0.02  max_mem: 2904M
[09/06 15:45:25 d2.utils.events]:  eta: 0:01:19  iter: 269599  total_loss: 0.4819  loss_cls: 0.1778  loss_box_reg: 0.221  loss_rpn_cls: 0.02964  loss_rpn_loc: 0.05748    time: 0.1988  last_time: 0.2063  data_time: 0.0083  last_data_time: 0.0076   lr: 0.0002  max_mem: 2904M
[09/06 15:45:29 d2.utils.events]:  eta: 0:01:15  iter: 269619  total_loss: 0.4636  loss_cls: 0.1539  loss_box_reg: 0.2098  loss_rpn_cls: 0.02256  loss_rpn_loc: 0.06406    time: 0.1988  last_time: 0.2030  data_time: 0.0086  last_data_time: 0.0077   lr: 0.0002  max_mem: 2904M
[09/06 15:45:33 d2.utils.events]:  eta: 0:01:11  iter: 269639  total_loss: 0.5086  loss_cls: 0.1783  loss_box_reg: 0.2321  loss_rpn_cls: 0.02421  loss_rpn_loc: 0.06799    time: 0.1988  last_time: 0.1933  data_time: 0.0089  last_data_time: 0.0124   lr: 0.0002  max_mem: 2904M
[09/06 15:45:37 d2.utils.events]:  eta: 0:01:07  iter: 269659  total_loss: 0.4706  loss_cls: 0.1592  loss_box_reg: 0.2124  loss_rpn_cls: 0.02371  loss_rpn_loc: 0.06897    time: 0.1988  last_time: 0.2003  data_time: 0.0083  last_data_time: 0.0089   lr: 0.0002  max_mem: 2904M
[09/06 15:45:41 d2.utils.events]:  eta: 0:01:03  iter: 269679  total_loss: 0.4713  loss_cls: 0.1709  loss_box_reg: 0.2129  loss_rpn_cls: 0.02803  loss_rpn_loc: 0.06166    time: 0.1988  last_time: 0.1977  data_time: 0.0081  last_data_time: 0.0068   lr: 0.0002  max_mem: 2904M
[09/06 15:45:45 d2.utils.events]:  eta: 0:00:59  iter: 269699  total_loss: 0.4516  loss_cls: 0.1572  loss_box_reg: 0.2083  loss_rpn_cls: 0.02273  loss_rpn_loc: 0.06108    time: 0.1988  last_time: 0.1891  data_time: 0.0086  last_data_time: 0.0120   lr: 0.0002  max_mem: 2904M
[09/06 15:45:49 d2.utils.events]:  eta: 0:00:55  iter: 269719  total_loss: 0.4771  loss_cls: 0.1766  loss_box_reg: 0.2144  loss_rpn_cls: 0.02578  loss_rpn_loc: 0.06036    time: 0.1988  last_time: 0.1882  data_time: 0.0081  last_data_time: 0.0060   lr: 0.0002  max_mem: 2904M
[09/06 15:45:53 d2.utils.events]:  eta: 0:00:51  iter: 269739  total_loss: 0.4586  loss_cls: 0.168  loss_box_reg: 0.2119  loss_rpn_cls: 0.02173  loss_rpn_loc: 0.05767    time: 0.1988  last_time: 0.1921  data_time: 0.0091  last_data_time: 0.0063   lr: 0.0002  max_mem: 2904M
[09/06 15:45:57 d2.utils.events]:  eta: 0:00:47  iter: 269759  total_loss: 0.442  loss_cls: 0.1605  loss_box_reg: 0.2144  loss_rpn_cls: 0.02168  loss_rpn_loc: 0.05123    time: 0.1988  last_time: 0.1978  data_time: 0.0084  last_data_time: 0.0089   lr: 0.0002  max_mem: 2904M
[09/06 15:46:01 d2.utils.events]:  eta: 0:00:43  iter: 269779  total_loss: 0.4803  loss_cls: 0.1671  loss_box_reg: 0.2056  loss_rpn_cls: 0.02325  loss_rpn_loc: 0.06247    time: 0.1988  last_time: 0.1842  data_time: 0.0091  last_data_time: 0.0063   lr: 0.0002  max_mem: 2904M
[09/06 15:46:05 d2.utils.events]:  eta: 0:00:39  iter: 269799  total_loss: 0.4994  loss_cls: 0.181  loss_box_reg: 0.2173  loss_rpn_cls: 0.02877  loss_rpn_loc: 0.06976    time: 0.1988  last_time: 0.2037  data_time: 0.0082  last_data_time: 0.0078   lr: 0.0002  max_mem: 2904M
[09/06 15:46:09 d2.utils.events]:  eta: 0:00:35  iter: 269819  total_loss: 0.4605  loss_cls: 0.162  loss_box_reg: 0.2145  loss_rpn_cls: 0.02834  loss_rpn_loc: 0.06403    time: 0.1988  last_time: 0.2047  data_time: 0.0078  last_data_time: 0.0067   lr: 0.0002  max_mem: 2904M
[09/06 15:46:13 d2.utils.events]:  eta: 0:00:31  iter: 269839  total_loss: 0.5042  loss_cls: 0.1746  loss_box_reg: 0.2268  loss_rpn_cls: 0.02664  loss_rpn_loc: 0.05538    time: 0.1988  last_time: 0.2013  data_time: 0.0087  last_data_time: 0.0077   lr: 0.0002  max_mem: 2904M
[09/06 15:46:17 d2.utils.events]:  eta: 0:00:27  iter: 269859  total_loss: 0.4772  loss_cls: 0.1592  loss_box_reg: 0.2132  loss_rpn_cls: 0.02413  loss_rpn_loc: 0.05851    time: 0.1988  last_time: 0.1826  data_time: 0.0074  last_data_time: 0.0107   lr: 0.0002  max_mem: 2904M
[09/06 15:46:21 d2.utils.events]:  eta: 0:00:23  iter: 269879  total_loss: 0.4978  loss_cls: 0.1759  loss_box_reg: 0.2295  loss_rpn_cls: 0.02774  loss_rpn_loc: 0.07485    time: 0.1988  last_time: 0.2152  data_time: 0.0080  last_data_time: 0.0072   lr: 0.0002  max_mem: 2904M
[09/06 15:46:26 d2.utils.events]:  eta: 0:00:19  iter: 269899  total_loss: 0.4582  loss_cls: 0.157  loss_box_reg: 0.2078  loss_rpn_cls: 0.02078  loss_rpn_loc: 0.05431    time: 0.1988  last_time: 0.2094  data_time: 0.0076  last_data_time: 0.0074   lr: 0.0002  max_mem: 2904M
[09/06 15:46:30 d2.utils.events]:  eta: 0:00:15  iter: 269919  total_loss: 0.477  loss_cls: 0.1648  loss_box_reg: 0.2149  loss_rpn_cls: 0.02556  loss_rpn_loc: 0.06299    time: 0.1988  last_time: 0.1939  data_time: 0.0075  last_data_time: 0.0061   lr: 0.0002  max_mem: 2904M
[09/06 15:46:34 d2.utils.events]:  eta: 0:00:11  iter: 269939  total_loss: 0.4678  loss_cls: 0.1682  loss_box_reg: 0.2207  loss_rpn_cls: 0.02335  loss_rpn_loc: 0.06278    time: 0.1988  last_time: 0.1984  data_time: 0.0086  last_data_time: 0.0074   lr: 0.0002  max_mem: 2904M
[09/06 15:46:38 d2.utils.events]:  eta: 0:00:07  iter: 269959  total_loss: 0.4705  loss_cls: 0.1607  loss_box_reg: 0.2123  loss_rpn_cls: 0.02339  loss_rpn_loc: 0.06207    time: 0.1988  last_time: 0.1914  data_time: 0.0090  last_data_time: 0.0083   lr: 0.0002  max_mem: 2904M
[09/06 15:46:42 d2.utils.events]:  eta: 0:00:03  iter: 269979  total_loss: 0.4843  loss_cls: 0.168  loss_box_reg: 0.2255  loss_rpn_cls: 0.0248  loss_rpn_loc: 0.07147    time: 0.1988  last_time: 0.2150  data_time: 0.0081  last_data_time: 0.0128   lr: 0.0002  max_mem: 2904M
[09/06 15:46:46 fvcore.common.checkpoint]: Saving checkpoint to /mnt/output/model_0269999.pth
[09/06 15:46:47 fvcore.common.checkpoint]: Saving checkpoint to /mnt/output/model_final.pth
[09/06 15:46:48 d2.utils.events]:  eta: 0:00:00  iter: 269999  total_loss: 0.4217  loss_cls: 0.1577  loss_box_reg: 0.191  loss_rpn_cls: 0.02127  loss_rpn_loc: 0.0584    time: 0.1988  last_time: 0.2100  data_time: 0.0084  last_data_time: 0.0062   lr: 0.0002  max_mem: 2904M
[09/06 15:46:48 d2.engine.hooks]: Overall training speed: 239998 iterations in 13:15:06 (0.1988 s / it)
[09/06 15:46:48 d2.engine.hooks]: Total training time: 13:29:17 (0:14:10 on hooks)
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3435.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[09/06 15:46:49 d2.data.datasets.coco]: Loaded 5000 images in COCO format from /mnt/coco/annotations/instances_val2017.json
[09/06 15:46:49 d2.data.build]: Distribution of instances among all 80 categories:
|   category    | #instances   |   category   | #instances   |   category    | #instances   |
|:-------------:|:-------------|:------------:|:-------------|:-------------:|:-------------|
|    person     | 10777        |   bicycle    | 314          |      car      | 1918         |
|  motorcycle   | 367          |   airplane   | 143          |      bus      | 283          |
|     train     | 190          |    truck     | 414          |     boat      | 424          |
| traffic light | 634          | fire hydrant | 101          |   stop sign   | 75           |
| parking meter | 60           |    bench     | 411          |     bird      | 427          |
|      cat      | 202          |     dog      | 218          |     horse     | 272          |
|     sheep     | 354          |     cow      | 372          |   elephant    | 252          |
|     bear      | 71           |    zebra     | 266          |    giraffe    | 232          |
|   backpack    | 371          |   umbrella   | 407          |    handbag    | 540          |
|      tie      | 252          |   suitcase   | 299          |    frisbee    | 115          |
|     skis      | 241          |  snowboard   | 69           |  sports ball  | 260          |
|     kite      | 327          | baseball bat | 145          | baseball gl.. | 148          |
|  skateboard   | 179          |  surfboard   | 267          | tennis racket | 225          |
|    bottle     | 1013         |  wine glass  | 341          |      cup      | 895          |
|     fork      | 215          |    knife     | 325          |     spoon     | 253          |
|     bowl      | 623          |    banana    | 370          |     apple     | 236          |
|   sandwich    | 177          |    orange    | 285          |   broccoli    | 312          |
|    carrot     | 365          |   hot dog    | 125          |     pizza     | 284          |
|     donut     | 328          |     cake     | 310          |     chair     | 1771         |
|     couch     | 261          | potted plant | 342          |      bed      | 163          |
| dining table  | 695          |    toilet    | 179          |      tv       | 288          |
|    laptop     | 231          |    mouse     | 106          |    remote     | 283          |
|   keyboard    | 153          |  cell phone  | 262          |   microwave   | 55           |
|     oven      | 143          |   toaster    | 9            |     sink      | 225          |
| refrigerator  | 126          |     book     | 1129         |     clock     | 267          |
|     vase      | 274          |   scissors   | 36           |  teddy bear   | 190          |
|  hair drier   | 11           |  toothbrush  | 57           |               |              |
|     total     | 36335        |              |              |               |              |
[09/06 15:46:49 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[09/06 15:46:49 d2.data.common]: Serializing the dataset using: <class 'detectron2.data.common._TorchSerializedList'>
[09/06 15:46:49 d2.data.common]: Serializing 5000 elements to byte tensors and concatenating them all ...
[09/06 15:46:50 d2.data.common]: Serialized dataset takes 19.08 MiB
[09/06 15:46:50 d2.evaluation.evaluator]: Start inference on 625 batches
/usr/local/lib/python3.8/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3435.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
[09/06 15:46:54 d2.evaluation.evaluator]: Inference done 11/625. Dataloading: 0.0013 s/iter. Inference: 0.0479 s/iter. Eval: 0.0003 s/iter. Total: 0.0495 s/iter. ETA=0:00:30
[09/06 15:46:59 d2.evaluation.evaluator]: Inference done 121/625. Dataloading: 0.0018 s/iter. Inference: 0.0437 s/iter. Eval: 0.0003 s/iter. Total: 0.0459 s/iter. ETA=0:00:23
[09/06 15:47:04 d2.evaluation.evaluator]: Inference done 218/625. Dataloading: 0.0018 s/iter. Inference: 0.0463 s/iter. Eval: 0.0004 s/iter. Total: 0.0486 s/iter. ETA=0:00:19
[09/06 15:47:09 d2.evaluation.evaluator]: Inference done 327/625. Dataloading: 0.0019 s/iter. Inference: 0.0454 s/iter. Eval: 0.0004 s/iter. Total: 0.0477 s/iter. ETA=0:00:14
[09/06 15:47:14 d2.evaluation.evaluator]: Inference done 439/625. Dataloading: 0.0019 s/iter. Inference: 0.0447 s/iter. Eval: 0.0004 s/iter. Total: 0.0470 s/iter. ETA=0:00:08
[09/06 15:47:19 d2.evaluation.evaluator]: Inference done 548/625. Dataloading: 0.0018 s/iter. Inference: 0.0446 s/iter. Eval: 0.0004 s/iter. Total: 0.0468 s/iter. ETA=0:00:03
[09/06 15:47:23 d2.evaluation.evaluator]: Total inference time: 0:00:29.375996 (0.047381 s / iter per device, on 8 devices)
[09/06 15:47:23 d2.evaluation.evaluator]: Total inference pure compute time: 0:00:27 (0.044494 s / iter per device, on 8 devices)
[09/06 15:47:25 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
[09/06 15:47:25 d2.evaluation.coco_evaluation]: Saving results to /mnt/output/inference/coco_instances_results.json
[09/06 15:47:26 d2.evaluation.coco_evaluation]: Evaluating predictions with unofficial COCO API...
Loading and preparing results...
DONE (t=0.87s)
creating index...
index created!
[09/06 15:47:27 d2.evaluation.fast_eval_api]: Evaluate annotation type *bbox*
[09/06 15:47:38 d2.evaluation.fast_eval_api]: COCOeval_opt.evaluate() finished in 11.66 seconds.
[09/06 15:47:39 d2.evaluation.fast_eval_api]: Accumulating evaluation results...
[09/06 15:47:40 d2.evaluation.fast_eval_api]: COCOeval_opt.accumulate() finished in 1.11 seconds.
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.401
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.608
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.435
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.238
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.434
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.521
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.326
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.512
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.537
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.350
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.573
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.675
[09/06 15:47:40 d2.evaluation.coco_evaluation]: Evaluation results for bbox: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 40.064 | 60.844 | 43.457 | 23.807 | 43.418 | 52.071 |
[09/06 15:47:40 d2.evaluation.coco_evaluation]: Per-category bbox AP: 
| category      | AP     | category     | AP     | category       | AP     |
|:--------------|:-------|:-------------|:-------|:---------------|:-------|
| person        | 54.495 | bicycle      | 30.683 | car            | 44.135 |
| motorcycle    | 42.270 | airplane     | 63.287 | bus            | 63.137 |
| train         | 60.485 | truck        | 33.669 | boat           | 26.893 |
| traffic light | 27.209 | fire hydrant | 66.158 | stop sign      | 65.836 |
| parking meter | 43.886 | bench        | 24.038 | bird           | 36.551 |
| cat           | 62.168 | dog          | 58.581 | horse          | 56.822 |
| sheep         | 49.903 | cow          | 53.541 | elephant       | 59.534 |
| bear          | 68.135 | zebra        | 65.108 | giraffe        | 64.143 |
| backpack      | 15.814 | umbrella     | 37.993 | handbag        | 14.656 |
| tie           | 32.060 | suitcase     | 37.098 | frisbee        | 63.240 |
| skis          | 22.691 | snowboard    | 32.603 | sports ball    | 46.379 |
| kite          | 41.527 | baseball bat | 27.225 | baseball glove | 34.695 |
| skateboard    | 49.108 | surfboard    | 35.116 | tennis racket  | 47.308 |
| bottle        | 38.765 | wine glass   | 35.143 | cup            | 40.831 |
| fork          | 34.533 | knife        | 17.379 | spoon          | 15.792 |
| bowl          | 40.787 | banana       | 23.224 | apple          | 19.382 |
| sandwich      | 31.478 | orange       | 29.419 | broccoli       | 21.541 |
| carrot        | 21.991 | hot dog      | 30.782 | pizza          | 50.570 |
| donut         | 42.976 | cake         | 34.088 | chair          | 26.252 |
| couch         | 39.110 | potted plant | 26.181 | bed            | 37.305 |
| dining table  | 26.871 | toilet       | 58.641 | tv             | 54.275 |
| laptop        | 57.916 | mouse        | 62.390 | remote         | 30.654 |
| keyboard      | 52.291 | cell phone   | 33.529 | microwave      | 52.223 |
| oven          | 31.390 | toaster      | 44.195 | sink           | 37.394 |
| refrigerator  | 52.529 | book         | 15.838 | clock          | 47.557 |
| vase          | 37.813 | scissors     | 23.726 | teddy bear     | 43.291 |
| hair drier    | 4.950  | toothbrush   | 21.977 |                |        |
[09/06 15:47:41 d2.engine.defaults]: Evaluation results for coco_2017_val in csv format:
[09/06 15:47:41 d2.evaluation.testing]: copypaste: Task: bbox
[09/06 15:47:41 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[09/06 15:47:41 d2.evaluation.testing]: copypaste: 40.0645,60.8442,43.4570,23.8066,43.4178,52.0706

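The training log records the run in detail: environment information, the config file contents, the model definition, dataset loading, the training process, and the final evaluation. The key parts of the log and what they mean are as follows: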
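The `d2.utils.events` lines in the log (for example `iter: 269859  total_loss: 0.4772 ...`) use a simple `key: value` layout, which makes them easy to turn into data for loss curves. A minimal sketch, assuming only the line format shown in the log above; `parse_event_line` is a hypothetical helper, not part of Detectron2:

```python
import re

# "key: value" pairs such as "iter: 269859" or "max_mem: 2904M".
# \S+ stops at whitespace, so "eta: 0:00:27" keeps its inner colons.
_PAIR = re.compile(r"(\w+): (\S+)")


def parse_event_line(line: str) -> dict:
    """Parse one d2.utils.events training line into a dict.

    Integers and floats are converted; other values (eta "0:00:27",
    max_mem "2904M") are kept as strings.
    """
    # Drop the "[09/06 15:46:17 d2.utils.events]:" prefix.
    _, _, payload = line.partition("]:")
    record = {}
    for key, value in _PAIR.findall(payload):
        for cast in (int, float):
            try:
                record[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            # Neither int nor float: keep the raw string.
            record[key] = value
    return record


if __name__ == "__main__":
    sample = ("[09/06 15:46:17 d2.utils.events]:  eta: 0:00:27  iter: 269859  "
              "total_loss: 0.4772  loss_cls: 0.1592  lr: 0.0002  max_mem: 2904M")
    rec = parse_event_line(sample)
    print(rec["iter"], rec["total_loss"], rec["max_mem"])
```

Applying it to every line that contains `d2.utils.events` yields one dict per logged step (every 20 iterations in this log).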

Environment information

[09/06 02:16:30 detectron2]: Environment info:
-------------------------------  --------------------------------------------------------------
sys.platform                     linux
Python                           3.8.10 (default, Nov 14 2022, 12:59:47) [GCC 9.4.0]
numpy                            1.22.2
detectron2                       0.6 @/root/detectron2/detectron2
Compiler                         GCC 9.4
CUDA compiler                    CUDA 12.0
...
-------------------------------  --------------------------------------------------------------

Config file contents

[09/06 02:16:30 detectron2]: Contents of args.config_file=../configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml:
_BASE_: "../Base-RCNN-FPN.yaml"
MODEL:
  WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl"
  MASK_ON: False
  RESNETS:
    DEPTH: 50
SOLVER:
  STEPS: (210000, 250000)
  MAX_ITER: 270000
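The 3x config only overrides `STEPS` and `MAX_ITER`; the base learning rate comes from `Base-RCNN-FPN.yaml`. Assuming `BASE_LR: 0.02` from that base file and the Detectron2 defaults `GAMMA = 0.1`, `WARMUP_ITERS = 1000` and `WARMUP_FACTOR = 0.001` (none of these values appear in the log excerpt), the schedule can be sketched as a step function. This is a simplified illustration, not Detectron2's scheduler code:

```python
import bisect


def lr_at(iteration: int,
          base_lr: float = 0.02,              # assumed: SOLVER.BASE_LR in Base-RCNN-FPN.yaml
          steps: tuple = (210000, 250000),    # SOLVER.STEPS from the 3x config above
          gamma: float = 0.1,                 # assumed: Detectron2 default SOLVER.GAMMA
          warmup_iters: int = 1000,           # assumed: default SOLVER.WARMUP_ITERS
          warmup_factor: float = 0.001) -> float:  # assumed: default SOLVER.WARMUP_FACTOR
    """Simplified multi-step LR schedule with linear warmup (illustration only)."""
    # Multiply by gamma once for every milestone already reached.
    lr = base_lr * gamma ** bisect.bisect_right(steps, iteration)
    if iteration < warmup_iters:
        # Ramp the multiplier linearly from warmup_factor up to 1.0.
        alpha = iteration / warmup_iters
        lr *= warmup_factor * (1 - alpha) + alpha
    return lr


if __name__ == "__main__":
    for it in (0, 500, 100000, 210000, 250000, 269999):
        print(it, f"{lr_at(it):.6g}")
```

From iteration 250000 onward the sketch gives 0.0002, consistent with `lr: 0.0002` in the last training lines of the log.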

Model definition

[09/06 02:16:32 d2.engine.defaults]: Model: GeneralizedRCNN(...)

Dataset loading

[09/06 02:16:53 d2.data.datasets.coco]: Loading /mnt/coco/annotations/instances_train2017.json takes 21.55 seconds.
[09/06 02:16:55 d2.data.datasets.coco]: Loaded 118287 images in COCO format from /mnt/coco/annotations/instances_train2017.json
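The annotation path in these lines is the `DETECTRON2_DATASETS` root (`/mnt/` in the `export` command) joined with a fixed relative path for each built-in COCO split. A small sketch that reproduces this lookup so the layout can be checked before a long run; `COCO_2017`, `coco_paths` and `check_layout` are hypothetical helpers, and the `"datasets"` fallback is an assumption about Detectron2's default:

```python
import os

# Hypothetical table mirroring the relative locations Detectron2 uses for the
# built-in COCO 2017 splits (image directory, annotation file).
COCO_2017 = {
    "coco_2017_train": ("coco/train2017", "coco/annotations/instances_train2017.json"),
    "coco_2017_val": ("coco/val2017", "coco/annotations/instances_val2017.json"),
}


def coco_paths(name: str) -> tuple:
    """Resolve (image_dir, annotation_file) under $DETECTRON2_DATASETS."""
    # Assumed fallback: "datasets" when the variable is not set.
    root = os.path.expanduser(os.getenv("DETECTRON2_DATASETS", "datasets"))
    image_dir, ann_file = COCO_2017[name]
    return os.path.join(root, image_dir), os.path.join(root, ann_file)


def check_layout(name: str) -> list:
    """Return the expected paths that do not exist (an empty list means OK)."""
    return [p for p in coco_paths(name) if not os.path.exists(p)]


if __name__ == "__main__":
    for split in COCO_2017:
        print(split, coco_paths(split), "missing:", check_layout(split))
```

With `DETECTRON2_DATASETS=/mnt/`, the training annotation resolves to `/mnt/coco/annotations/instances_train2017.json`, the same path as in the log.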

Training process

[09/06 02:17:20 d2.engine.train_loop]: Starting training from iteration 30000
...
[09/06 15:46:48 d2.engine.hooks]: Overall training speed: 239998 iterations in 13:15:06 (0.1988 s / it)
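Training resumed at iteration 30000, so this session covered roughly the remaining 240,000 of the 270,000 iterations. The reported speed is the wall time spent outside hooks divided by the iteration count. The arithmetic can be checked with the numbers copied from the log; `hms_to_seconds` is a hypothetical helper:

```python
from datetime import timedelta


def hms_to_seconds(hms: str) -> int:
    """Convert an 'H:MM:SS' string, as printed by the Detectron2 hooks, to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


# Values copied from the log lines above.
iters = 239998                             # "Overall training speed: 239998 iterations"
compute_s = hms_to_seconds("13:15:06")     # training time excluding hooks
hooks_s = hms_to_seconds("0:14:10")        # "(0:14:10 on hooks)"

if __name__ == "__main__":
    print(f"{compute_s / iters:.4f} s / it")
    print("total:", timedelta(seconds=compute_s + hooks_s))
    # Rough extrapolation to an uninterrupted 270k-iteration run at the same speed.
    print(f"{270000 * compute_s / iters / 3600:.1f} h")
```

This gives 0.1988 s/it, and adding the 0:14:10 spent in hooks yields 13:29:16, within a second of the logged total training time of 13:29:17.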

Inference time

[09/06 15:47:23 d2.evaluation.evaluator]: Total inference time: 0:00:29.375996 (0.047381 s / iter per device, on 8 devices)
[09/06 15:47:23 d2.evaluation.evaluator]: Total inference pure compute time: 0:00:27 (0.044494 s / iter per device, on 8 devices)
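The per-iteration figure can be reproduced from the total. This sketch assumes that Detectron2's evaluator leaves the first 5 iterations out of the average as warmup (an assumption on our part, although it matches the logged numbers exactly) and that each of the 8 GPUs processed 625 single-image batches:

```python
# Values copied from the evaluation log lines above.
total_s = 29.375996   # "Total inference time: 0:00:29.375996"
batches = 625         # "Start inference on 625 batches" (per device)
devices = 8           # "on 8 devices"
num_warmup = 5        # assumption: warmup iterations excluded from the average

per_iter = total_s / (batches - num_warmup)   # seconds per batch on one GPU

if __name__ == "__main__":
    print(f"{per_iter:.6f} s / iter per device")
    # With one image per batch (625 x 8 = 5000 val images), s/iter equals s/image.
    print(batches * devices, "images")
    print(f"~{devices / per_iter:.0f} images/s across all GPUs")
```

The first line prints 0.047381, matching the log, and 625 × 8 = 5000 agrees with the 5000 val2017 images loaded for evaluation. Because each batch holds one image, s/iter per device is also s/image, which is the unit used in the Model Zoo's inference-time column.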

Average precision (AP)

[09/06 15:47:40 d2.evaluation.coco_evaluation]: Evaluation results for bbox: 
|   AP   |  AP50  |  AP75  |  APs   |  APm   |  APl   |
|:------:|:------:|:------:|:------:|:------:|:------:|
| 40.064 | 60.844 | 43.457 | 23.807 | 43.418 | 52.071 |
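Besides the formatted tables, Detectron2 prints the same numbers as machine-readable `copypaste:` lines at the end of the evaluation. A minimal parser for them, assuming each `Task:` line is followed by a header line and then a value line; `parse_copypaste` is a hypothetical name:

```python
def parse_copypaste(lines) -> dict:
    """Collect Detectron2's 'copypaste:' lines into {task: {metric: value}}.

    Assumes each 'Task: <name>' line is followed by a comma-separated header
    line and then a comma-separated line of values in the same order.
    """
    results, task, names = {}, None, None
    for line in lines:
        if "copypaste:" not in line:
            continue
        payload = line.split("copypaste:", 1)[1].strip()
        if payload.startswith("Task:"):
            task, names = payload.split(":", 1)[1].strip(), None
        elif names is None:
            names = payload.split(",")
        else:
            results[task] = dict(zip(names, (float(v) for v in payload.split(","))))
            names = None
    return results


if __name__ == "__main__":
    log = [
        "[09/06 15:47:41 d2.evaluation.testing]: copypaste: Task: bbox",
        "[09/06 15:47:41 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl",
        "[09/06 15:47:41 d2.evaluation.testing]: copypaste: 40.0645,60.8442,43.4570,23.8066,43.4178,52.0706",
    ]
    print(parse_copypaste(log)["bbox"]["AP"])
```

Passing the lines of `/mnt/output/log.txt` to it would give 40.0645 for `["bbox"]["AP"]` in this run, which is convenient when comparing several runs.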

Per-category AP

[09/06 15:47:40 d2.evaluation.coco_evaluation]: Per-category bbox AP: 
| category      | AP     | category     | AP     | category       | AP     |
|:--------------|:-------|:-------------|:-------|:---------------|:-------|
| person        | 54.495 | bicycle      | 30.683 | car            | 44.135 |
| motorcycle    | 42.270 | airplane     | 63.287 | bus            | 63.137 |
| train         | 60.485 | truck        | 33.669 | boat           | 26.893 |
...
| hair drier    | 4.950  | toothbrush   | 21.977 |                |        |
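Per-category AP varies widely, from `hair drier` at 4.950 to values above 60 for classes such as `airplane` or `bear`. To find the weakest categories quickly, the table can be parsed back into a dict. A sketch assuming the three-pairs-per-row layout shown above; `parse_per_category` is a hypothetical name:

```python
def parse_per_category(table_lines) -> dict:
    """Parse Detectron2's per-category AP table (three category/AP pairs per row)."""
    ap = {}
    for line in table_lines:
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the header row, the ':---' alignment row and blank lines.
        if cells[0] in ("category", "") or cells[0].startswith(":"):
            continue
        for name, value in zip(cells[0::2], cells[1::2]):
            if name:  # the last row is padded with empty cells
                ap[name] = float(value)
    return ap


if __name__ == "__main__":
    rows = [
        "| category      | AP     | category     | AP     | category       | AP     |",
        "|:--------------|:-------|:-------------|:-------|:---------------|:-------|",
        "| person        | 54.495 | bicycle      | 30.683 | car            | 44.135 |",
        "| backpack      | 15.814 | umbrella     | 37.993 | handbag        | 14.656 |",
        "| hair drier    | 4.950  | toothbrush   | 21.977 |                |        |",
    ]
    ap = parse_per_category(rows)
    # Three lowest-AP categories in this excerpt.
    print(sorted(ap.items(), key=lambda kv: kv[1])[:3])
```

Fed the full table from the log, the same `sorted(...)` call lists the lowest-AP categories of the whole model.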

IV. Comparison of Training Metrics

Metrics in the Model Zoo

| Name     | lr sched | train time (s/iter) | inference time (s/im) | train mem (GB) | box AP | model id  |
|:---------|:--------:|:-------------------:|:---------------------:|:--------------:|:------:|:---------:|
| R50-C4   | 1x       | 0.551               | 0.102                 | 4.8            | 35.7   | 137257644 |
| R50-DC5  | 1x       | 0.380               | 0.068                 | 5.0            | 37.3   | 137847829 |
| R50-FPN  | 1x       | 0.210               | 0.038                 | 3.0            | 37.9   | 137257794 |
| R50-C4   | 3x       | 0.543               | 0.104                 | 4.8            | 38.4   | 137849393 |
| R50-DC5  | 3x       | 0.378               | 0.070                 | 5.0            | 39.0   | 137849425 |
| R50-FPN  | 3x       | 0.209               | 0.038                 | 3.0            | 40.2   | 137849458 |
| R101-C4  | 3x       | 0.619               | 0.139                 | 5.9            | 41.1   | 138204752 |
| R101-DC5 | 3x       | 0.452               | 0.086                 | 6.1            | 40.6   | 138204841 |
| R101-FPN | 3x       | 0.286               | 0.051                 | 4.1            | 42.0   | 137851257 |
| X101-FPN | 3x       | 0.638               | 0.098                 | 6.7            | 43.0   | 139173657 |

Metrics of our reproduced model

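  1. Training time (s/iter): 0.1988
  2. Inference time (s/im): 0.047381 (per device; pure compute time 0.044494)
  3. Training memory (GB): peak of about 2.9 GB (max_mem: 2904M)
  4. Detection accuracy (box AP): 40.0645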

Comparative analysis

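  1. Training time (s/iter)

    • The Model Zoo R50-FPN 3x model trains at 0.209 s/iter; our run took 0.1988 s/iter. Ours is slightly faster, possibly because of differences in hardware configuration or software optimizations.
  2. Inference time (s/im)

    • The Model Zoo R50-FPN 3x model runs inference at 0.038 s/im; our reproduced model takes 0.047381 s/im (0.044494 s/im of pure compute), which is somewhat slower.
  3. Training memory (GB)

    • The Model Zoo R50-FPN 3x model uses 3.0 GB of training memory; ours peaks at about 2.9 GB, essentially the same.
  4. Detection accuracy (box AP)

    • The Model Zoo R50-FPN 3x model reaches 40.2 box AP; ours reaches 40.0645, essentially on par and about 0.14 AP lower.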
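The same comparison can be expressed as absolute and relative differences. This sketch uses the R50-FPN 3x row of the Model Zoo table and our values listed above, with training memory taken as the rounded 2.9 GB:

```python
# R50-FPN 3x row of the Model Zoo table vs. the values from our log.
MODEL_ZOO = {"train_s_per_iter": 0.209, "infer_s_per_im": 0.038, "train_mem_gb": 3.0, "box_ap": 40.2}
OURS = {"train_s_per_iter": 0.1988, "infer_s_per_im": 0.047381, "train_mem_gb": 2.9, "box_ap": 40.0645}


def compare(ref: dict, got: dict) -> dict:
    """Return {metric: (absolute difference, relative difference in percent)}."""
    return {k: (got[k] - ref[k], 100.0 * (got[k] - ref[k]) / ref[k]) for k in ref}


if __name__ == "__main__":
    for name, (diff, pct) in compare(MODEL_ZOO, OURS).items():
        print(f"{name:18s} {diff:+.4f} ({pct:+.1f}%)")
```

Training time comes out about 4.9% lower and box AP about 0.14 lower (-0.3%), while inference time is about 24.7% higher, the largest relative gap.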

Summary of results

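The comparison shows that our model is very close to the Model Zoo R50-FPN 3x model: training is slightly faster, memory use is comparable, and detection accuracy is marginally lower. The one noticeable gap is inference, which is about 25% slower (0.047381 vs. 0.038 s/im). Overall, the reproduction is successful, which indicates that the training pipeline is reliable and the resulting model is effective.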

V. Training Output Directory

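The training output directory contains all the important files produced during training: the config file, event logs, training logs, evaluation metrics, and multiple model checkpoints.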

Directory structure

/mnt/output
├── config.yaml
├── events.out.tfevents.*
├── inference/
├── last_checkpoint
├── log.txt
├── log.txt.rank*
├── metrics.json
├── model_*.pth
└── model_final.pth

File descriptions

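  1. config.yaml: the full training configuration, which makes the experiment reproducible.
  2. events.out.tfevents.*: event files for TensorBoard, used to visualize and monitor training.
  3. inference/: inference results; in this run it holds coco_instances_results.json, the predictions on the validation set.
  4. last_checkpoint: records the file name of the most recent checkpoint, used when resuming training.
  5. log.txt and log.txt.rank*: detailed training logs (log.txt for the main process, log.txt.rank* for the other ranks), useful for debugging and analysis.
  6. metrics.json: the metrics computed during training and evaluation, useful for analysis and comparison.
  7. model_*.pth and model_final.pth: saved model weights, used for resuming training, evaluation, or deployment.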
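Two of these files are handy to read programmatically. `metrics.json` stores one JSON object per line, each with an `iteration` key, and `last_checkpoint` holds the file name of the newest checkpoint. As we understand Detectron2's `--resume` behavior, it loads the checkpoint named in `last_checkpoint` when that file exists and falls back to `MODEL.WEIGHTS` otherwise, so in the resume command `MODEL.WEIGHTS /mnt/output/model_0029999.pth` only takes effect if `last_checkpoint` is missing. A simplified sketch of both; the function names are hypothetical:

```python
import json
import os


def load_metrics(output_dir: str) -> list:
    """Read metrics.json, which holds one JSON object per line."""
    records = []
    with open(os.path.join(output_dir, "metrics.json")) as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def checkpoint_to_load(output_dir: str, model_weights: str, resume: bool) -> str:
    """Sketch of the resume rule: prefer the file named in last_checkpoint.

    With resume=True and a last_checkpoint file present, the checkpoint it
    names is used; otherwise model_weights (MODEL.WEIGHTS) is returned.
    """
    tag = os.path.join(output_dir, "last_checkpoint")
    if resume and os.path.isfile(tag):
        with open(tag) as f:
            return os.path.join(output_dir, f.read().strip())
    return model_weights
```

For example, `[r["total_loss"] for r in load_metrics("/mnt/output") if "total_loss" in r]` collects the logged loss values; the filter skips entries that only hold evaluation results.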

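Together, these files and directories make up the complete output of a training run and support later analysis, debugging, and deployment.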

The value of reproduction

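Reproduction not only verifies that the original results are reliable, it also builds a deeper understanding of how the model is trained and evaluated. By comparing metrics across models, we can choose the most suitable architecture and training strategy and improve both performance and efficiency.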

Conclusion

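This article has described in detail how to train an object detection model with Detectron2, covering data preparation, training commands, training-log analysis, training metrics, and the files in the training output directory and their roles. By reproducing the Model Zoo training run, and using the resume feature to continue after an interruption, we verified that the training results are reliable and gained a deeper understanding of the model's performance metrics. We hope this article serves as a useful reference for training and evaluating models.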

http://www.chinasem.cn/article/1148382
