loss 'nan',Customised datasets with 1 category in boxinst #12

shanghangjiang · 2023-03-15T14:58:17Z

I train boxinst with a customised datasets contain 1 category, loss 'nan', but I have visualize the dataset, there is nothing wrong with the annotation.
This is my config file:
base = [
'../base/default_runtime.py'
]

model settings

model = dict(
type='CondInst',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
zero_init_residual=False,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
start_level=1,
add_extra_convs='on_output', # use P5
num_outs=5,
relu_before_extra_convs=True),
bbox_head=dict(
type='CondInstBoxHead',
num_classes=1,
in_channels=256,
center_sampling=True,
center_sample_radius=1.5,
norm_on_bbox=True,
stacked_convs=4,
feat_channels=256,
strides=[8, 16, 32, 64, 128],
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(type='GIoULoss', loss_weight=1.0),
loss_centerness=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
mask_branch=dict(
type='CondInstMaskBranch',
in_channels=256,
in_indices=[0, 1, 2],
strides=[8, 16, 32],
branch_convs=4,
branch_channels=128,
branch_out_channels=16),
mask_head=dict(
type='CondInstMaskHead',
in_channels=16,
in_stride=8,
out_stride=4,
dynamic_convs=3,
dynamic_channels=8,
disable_rel_coors=False,
bbox_head_channels=256,
sizes_of_interest=[64, 128, 256, 512, 1024],
max_proposals=-1,
topk_per_img=64,
boxinst_enabled=True,
bottom_pixels_removed=10,
pairwise_size=3,
pairwise_dilation=2,
pairwise_color_thresh=0.3,
pairwise_warmup=10000),
# training and testing settings
train_cfg=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.4,
min_pos_iou=0,
ignore_iof_thr=-1),
allowed_border=-1,
pos_weight=-1,
debug=False),
test_cfg=dict(
nms_pre=2000,
min_bbox_size=0,
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=2000,
output_segm=False))

dataset_type = 'CocoDataset'
data_root = '/data/shjiang/VDD-C/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=False),
dict(type='Resize',
img_scale=[(1333, 800), (1333, 768), (1333, 736),
(1333, 704), (1333, 672), (1333, 640)],
multiscale_mode='value',
keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
classes=('diver',),
type=dataset_type,
ann_file=data_root + 'annotations/train.json',
img_prefix=data_root + 'images/',
pipeline=train_pipeline),
val=dict(
classes=('diver',),
type=dataset_type,
ann_file=data_root + 'annotations/val.json',
img_prefix=data_root + 'images/',
pipeline=test_pipeline),
test=dict(
classes=('diver',),
type=dataset_type,
ann_file=data_root + 'annotations/test.json',
img_prefix=data_root + 'images/',
pipeline=test_pipeline))
evaluation = dict(metric=['bbox', 'segm'])

optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)

learning policy

lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[27, 33])

runner = dict(type='EpochBasedRunner', max_epochs=36)
evaluation = dict(interval=1, metric=['bbox', 'segm'])
checkpoint_config = dict(interval=1)
work_dir = './work_dirs/boxinst_vddc_3x'
load_from = None
resume_from = None

Below are my annotation samples:
{"images": [{"file_name": "barbados_scuba_011_A_0352.jpg", "height": 1080, "width": 1920, "id": 20352}, {"file_name": "barbados_scuba_003_B_0845.jpg", "height": 1080, "width": 1920, "id": 60845}]
"annotations": [{"area": 843402, "iscrowd": 0, "bbox": [986, 177, 934, 903], "category_id": 1, "ignore": 0, "segmentation": [], "image_id": 20352, "id": 1}]
"categories": [{"id": 1, "name": "diver"}]}

shanghangjiang · 2023-03-17T02:43:42Z

I also use defualt settings to train boxinst on coco, still loss 'nan'

LiWentomng · 2023-03-17T12:52:35Z

@shanghangjiang
Hello, I have test the default settings for boxinst. It works well.

The training logs are listed as the following:

2023-03-17 20:44:14,061 - mmdet - INFO - Distributed training: True
2023-03-17 20:44:14,929 - mmdet - INFO - Config:
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
auto_scale_lr = dict(enable=False, base_batch_size=16)
model = dict(
type='CondInst',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
zero_init_residual=False,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
start_level=1,
add_extra_convs='on_output',
num_outs=5,
relu_before_extra_convs=True),
bbox_head=dict(
type='CondInstBoxHead',
num_classes=80,
in_channels=256,
center_sampling=True,
center_sample_radius=1.5,
norm_on_bbox=True,
stacked_convs=4,
feat_channels=256,
strides=[8, 16, 32, 64, 128],
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(type='GIoULoss', loss_weight=1.0),
loss_centerness=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
mask_branch=dict(
type='CondInstMaskBranch',
in_channels=256,
in_indices=[0, 1, 2],
strides=[8, 16, 32],
branch_convs=4,
branch_channels=128,
branch_out_channels=16),
mask_head=dict(
type='CondInstMaskHead',
in_channels=16,
in_stride=8,
out_stride=4,
dynamic_convs=3,
dynamic_channels=8,
disable_rel_coors=False,
bbox_head_channels=256,
sizes_of_interest=[64, 128, 256, 512, 1024],
max_proposals=-1,
topk_per_img=64,
boxinst_enabled=True,
bottom_pixels_removed=10,
pairwise_size=3,
pairwise_dilation=2,
pairwise_color_thresh=0.3,
pairwise_warmup=10000),
train_cfg=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.4,
min_pos_iou=0,
ignore_iof_thr=-1),
allowed_border=-1,
pos_weight=-1,
debug=False),
test_cfg=dict(
nms_pre=2000,
min_bbox_size=0,
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=2000,
output_segm=False))
dataset_type = 'CocoDataset'
data_root = '/mnt/SSD/lwt_workdir/data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=False),
dict(
type='Resize',
img_scale=[(1333, 800), (1333, 768), (1333, 736), (1333, 704),
(1333, 672), (1333, 640)],
multiscale_mode='value',
keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type='CocoDataset',
ann_file=
'/mnt/SSD/lwt_workdir/data/coco/annotations/instances_train2017.json',
img_prefix='/mnt/SSD/lwt_workdir/data/coco/train2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=False),
dict(
type='Resize',
img_scale=[(1333, 800), (1333, 768), (1333, 736), (1333, 704),
(1333, 672), (1333, 640)],
multiscale_mode='value',
keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]),
val=dict(
type='CocoDataset',
ann_file=
'/mnt/SSD/lwt_workdir/data/coco/annotations/instances_val2017.json',
img_prefix='/mnt/SSD/lwt_workdir/data/coco/val2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]),
test=dict(
type='CocoDataset',
ann_file=
'/mnt/SSD/lwt_workdir/data/coco/annotations/instances_val2017.json',
img_prefix='/mnt/SSD/lwt_workdir/data/coco/val2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]))
evaluation = dict(interval=1, metric=['bbox', 'segm'])
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[27, 33])
runner = dict(type='EpochBasedRunner', max_epochs=36)
checkpoint_config = dict(interval=1)
work_dir = './work_dirs/boxinst_coco_3x'
auto_resume = False
gpu_ids = range(0, 2)

2023-03-17 20:44:14,930 - mmdet - INFO - Set random seed to 0, deterministic: False
2023-03-17 20:44:15,342 - mmdet - INFO - initialize ResNet with init_cfg {'type': 'Pretrained', 'checkpoint': 'torchvision://resnet50'}
2023-03-17 20:44:15,342 - mmcv - INFO - load model from: torchvision://resnet50
2023-03-17 20:44:15,342 - mmcv - INFO - load checkpoint from torchvision path: torchvision://resnet50
2023-03-17 20:44:15,554 - mmcv - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

2023-03-17 20:44:15,588 - mmdet - INFO - initialize FPN with init_cfg {'type': 'Xavier', 'layer': 'Conv2d', 'distribution': 'uniform'}
2023-03-17 20:44:15,615 - mmdet - INFO - initialize CondInstBoxHead with init_cfg {'type': 'Normal', 'layer': 'Conv2d', 'std': 0.01, 'override': {'type': 'Normal', 'name': 'conv_cls', 'std': 0.01, 'bias_prob': 0.01}}
2023-03-17 20:44:15,674 - mmdet - INFO - initialize CondInstMaskBranch with init_cfg {'type': 'Kaiming', 'layer': 'Conv2d', 'distribution': 'uniform', 'a': 1, 'mode': 'fan_in', 'nonlinearity': 'leaky_relu'}
2023-03-17 20:44:15,690 - mmdet - INFO - initialize CondInstMaskHead with init_cfg {'type': 'Normal', 'layer': 'Conv2d', 'std': 0.01, 'bias': 0}
loading annotations into memory...
loading annotations into memory...
Done (t=14.38s)
creating index...
index created!
Done (t=15.70s)
creating index...
index created!
fatal: not a git repository (or any parent up to mount point /mnt)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: not a git repository (or any parent up to mount point /mnt)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
2023-03-17 20:44:34,908 - mmdet - INFO - Automatic scaling of learning rate (LR) has been disabled.
loading annotations into memory...
loading annotations into memory...
Done (t=0.46s)
creating index...
Done (t=0.50s)
creating index...
index created!
index created!
2023-03-17 20:44:35,477 - mmdet - INFO - Start running, host: lwt@ps, work_dir: /mnt/SSD/lwt_workdir/new_code/BoxInstSeg/work_dirs/boxinst_coco_3x
2023-03-17 20:44:35,477 - mmdet - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) CheckpointHook
(LOW ) DistEvalHook
(VERY_LOW ) TextLoggerHook
before_train_epoch:
(VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) NumClassCheckHook
(NORMAL ) DistSamplerSeedHook
(LOW ) IterTimerHook
(LOW ) DistEvalHook
(VERY_LOW ) TextLoggerHook
before_train_iter:
(VERY_HIGH ) StepLrUpdaterHook
(LOW ) IterTimerHook
(LOW ) DistEvalHook
after_train_iter:
(ABOVE_NORMAL) OptimizerHook
(NORMAL ) CheckpointHook
(LOW ) IterTimerHook
(LOW ) DistEvalHook
(VERY_LOW ) TextLoggerHook
after_train_epoch:
(NORMAL ) CheckpointHook
(LOW ) DistEvalHook
(VERY_LOW ) TextLoggerHook
before_val_epoch:
(NORMAL ) NumClassCheckHook
(NORMAL ) DistSamplerSeedHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook
before_val_iter:
(LOW ) IterTimerHook
after_val_iter:
(LOW ) IterTimerHook
after_val_epoch:
(VERY_LOW ) TextLoggerHook
after_run:
(VERY_LOW ) TextLoggerHook

2023-03-17 20:44:35,478 - mmdet - INFO - workflow: [('train', 1)], max: 36 epochs
2023-03-17 20:44:35,478 - mmdet - INFO - Checkpoints will be saved to /mnt/SSD/lwt_workdir/new_code/BoxInstSeg/work_dirs/boxinst_coco_3x by HardDiskBackend.
2023-03-17 20:44:41,941 - mmcv - INFO - Reducer buckets have been rebuilt in this iteration.
2023-03-17 20:45:02,344 - mmdet - INFO - Epoch [1][50/29317] lr: 9.890e-04, eta: 6 days, 13:22:14, time: 0.537, data_time: 0.114, memory: 5067, loss_cls: 1.0509, loss_bbox: 0.8177, loss_centerness: 0.6783, loss_prj: 0.8812, loss_pairwise: 0.0009, loss: 3.4290
2023-03-17 20:45:23,345 - mmdet - INFO - Epoch [1][100/29317] lr: 1.988e-03, eta: 5 days, 20:11:35, time: 0.420, data_time: 0.005, memory: 5207, loss_cls: 0.9549, loss_bbox: 0.5695, loss_centerness: 0.6804, loss_prj: 0.4198, loss_pairwise: 0.0016, loss: 2.6263
2023-03-17 20:45:43,824 - mmdet - INFO - Epoch [1][150/29317] lr: 2.987e-03, eta: 5 days, 13:31:37, time: 0.410, data_time: 0.005, memory: 5207, loss_cls: 0.9264, loss_bbox: 0.5596, loss_centerness: 0.6768, loss_prj: 0.3729, loss_pairwise: 0.0025, loss: 2.5382
2023-03-17 20:46:04,404 - mmdet - INFO - Epoch [1][200/29317] lr: 3.986e-03, eta: 5 days, 10:18:03, time: 0.412, data_time: 0.005, memory: 5207, loss_cls: 0.9353, loss_bbox: 0.5364, loss_centerness: 0.6692, loss_prj: 0.3599, loss_pairwise: 0.0035, loss: 2.5043
2023-03-17 20:46:24,665 - mmdet - INFO - Epoch [1][250/29317] lr: 4.985e-03, eta: 5 days, 7:58:27, time: 0.405, data_time: 0.005, memory: 5207, loss_cls: 0.8976, loss_bbox: 0.5471, loss_centerness: 0.6722, loss_prj: 0.3683, loss_pairwise: 0.0039, loss: 2.4890
2023-03-17 20:46:45,612 - mmdet - INFO - Epoch [1][300/29317] lr: 5.984e-03, eta: 5 days, 7:06:07, time: 0.419, data_time: 0.005, memory: 5207, loss_cls: 0.9304, loss_bbox: 0.5266, loss_centerness: 0.6694, loss_prj: 0.3763, loss_pairwise: 0.0052, loss: 2.5079
2023-03-17 20:47:06,033 - mmdet - INFO - Epoch [1][350/29317] lr: 6.983e-03, eta: 5 days, 6:03:05, time: 0.409, data_time: 0.005, memory: 5207, loss_cls: 0.8870, loss_bbox: 0.5204, loss_centerness: 0.6681, loss_prj: 0.3518, loss_pairwise: 0.0054, loss: 2.4326
2023-03-17 20:47:26,702 - mmdet - INFO - Epoch [1][400/29317] lr: 7.982e-03, eta: 5 days, 5:25:24, time: 0.413, data_time: 0.005, memory: 5207, loss_cls: 0.8060, loss_bbox: 0.5170, loss_centerness: 0.6693, loss_prj: 0.3397, loss_pairwise: 0.0062, loss: 2.3382
2023-03-17 20:47:47,521 - mmdet - INFO - Epoch [1][450/29317] lr: 8.981e-03, eta: 5 days, 5:02:25, time: 0.416, data_time: 0.005, memory: 5207, loss_cls: 0.8213, loss_bbox: 0.5000, loss_centerness: 0.6686, loss_prj: 0.3202, loss_pairwise: 0.0066, loss: 2.3167
2023-03-17 20:48:08,123 - mmdet - INFO - Epoch [1][500/29317] lr: 9.980e-03, eta: 5 days, 4:36:39, time: 0.412, data_time: 0.006, memory: 5207, loss_cls: 0.8300, loss_bbox: 0.4803, loss_centerness: 0.6600, loss_prj: 0.3168, loss_pairwise: 0.0073, loss: 2.2944

Besides, your annotation of your dataset seems right, according to your given sample.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

loss 'nan',Customised datasets with 1 category in boxinst #12

loss 'nan',Customised datasets with 1 category in boxinst #12

shanghangjiang commented Mar 15, 2023

shanghangjiang commented Mar 17, 2023

LiWentomng commented Mar 17, 2023 •

edited

loss 'nan',Customised datasets with 1 category in boxinst #12

loss 'nan',Customised datasets with 1 category in boxinst #12

Comments

shanghangjiang commented Mar 15, 2023

model settings

learning policy

shanghangjiang commented Mar 17, 2023

LiWentomng commented Mar 17, 2023 • edited

LiWentomng commented Mar 17, 2023 •

edited