loss 'nan', Customised datasets with 1 category in boxinst #12

Open
shanghangjiang opened this issue Mar 15, 2023 · 2 comments

Comments

@shanghangjiang

I trained BoxInst on a customised dataset that contains 1 category, and the loss becomes 'nan'. I have visualized the dataset, and there is nothing wrong with the annotations.
This is my config file:
_base_ = [
'../_base_/default_runtime.py'
]

# model settings

model = dict(
type='CondInst',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
zero_init_residual=False,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
start_level=1,
add_extra_convs='on_output', # use P5
num_outs=5,
relu_before_extra_convs=True),
bbox_head=dict(
type='CondInstBoxHead',
num_classes=1,
in_channels=256,
center_sampling=True,
center_sample_radius=1.5,
norm_on_bbox=True,
stacked_convs=4,
feat_channels=256,
strides=[8, 16, 32, 64, 128],
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(type='GIoULoss', loss_weight=1.0),
loss_centerness=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
mask_branch=dict(
type='CondInstMaskBranch',
in_channels=256,
in_indices=[0, 1, 2],
strides=[8, 16, 32],
branch_convs=4,
branch_channels=128,
branch_out_channels=16),
mask_head=dict(
type='CondInstMaskHead',
in_channels=16,
in_stride=8,
out_stride=4,
dynamic_convs=3,
dynamic_channels=8,
disable_rel_coors=False,
bbox_head_channels=256,
sizes_of_interest=[64, 128, 256, 512, 1024],
max_proposals=-1,
topk_per_img=64,
boxinst_enabled=True,
bottom_pixels_removed=10,
pairwise_size=3,
pairwise_dilation=2,
pairwise_color_thresh=0.3,
pairwise_warmup=10000),
# training and testing settings
train_cfg=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.4,
min_pos_iou=0,
ignore_iof_thr=-1),
allowed_border=-1,
pos_weight=-1,
debug=False),
test_cfg=dict(
nms_pre=2000,
min_bbox_size=0,
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=2000,
output_segm=False))

dataset_type = 'CocoDataset'
data_root = '/data/shjiang/VDD-C/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=False),
dict(type='Resize',
img_scale=[(1333, 800), (1333, 768), (1333, 736),
(1333, 704), (1333, 672), (1333, 640)],
multiscale_mode='value',
keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(type='Normalize', **img_norm_cfg),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img']),
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
classes=('diver',),
type=dataset_type,
ann_file=data_root + 'annotations/train.json',
img_prefix=data_root + 'images/',
pipeline=train_pipeline),
val=dict(
classes=('diver',),
type=dataset_type,
ann_file=data_root + 'annotations/val.json',
img_prefix=data_root + 'images/',
pipeline=test_pipeline),
test=dict(
classes=('diver',),
type=dataset_type,
ann_file=data_root + 'annotations/test.json',
img_prefix=data_root + 'images/',
pipeline=test_pipeline))
evaluation = dict(metric=['bbox', 'segm'])

optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)

# learning policy

lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[27, 33])

runner = dict(type='EpochBasedRunner', max_epochs=36)
evaluation = dict(interval=1, metric=['bbox', 'segm'])
checkpoint_config = dict(interval=1)
work_dir = './work_dirs/boxinst_vddc_3x'
load_from = None
resume_from = None

Below are my annotation samples:
{"images": [{"file_name": "barbados_scuba_011_A_0352.jpg", "height": 1080, "width": 1920, "id": 20352}, {"file_name": "barbados_scuba_003_B_0845.jpg", "height": 1080, "width": 1920, "id": 60845}],
"annotations": [{"area": 843402, "iscrowd": 0, "bbox": [986, 177, 934, 903], "category_id": 1, "ignore": 0, "segmentation": [], "image_id": 20352, "id": 1}],
"categories": [{"id": 1, "name": "diver"}]}

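For anyone hitting the same problem, a small script can double-check the boxes beyond visual inspection. This is only a sketch: the helper names `check_coco_boxes` and `images_without_boxes` are made up for illustration, and the inline data is the annotation sample from this comment. It flags boxes with non-positive size or that extend outside the image, and lists images that have no annotation at all.

```python
import json

# The annotation sample from this comment, as a JSON string.
SAMPLE = """
{"images": [{"file_name": "barbados_scuba_011_A_0352.jpg", "height": 1080, "width": 1920, "id": 20352},
            {"file_name": "barbados_scuba_003_B_0845.jpg", "height": 1080, "width": 1920, "id": 60845}],
 "annotations": [{"area": 843402, "iscrowd": 0, "bbox": [986, 177, 934, 903], "category_id": 1,
                  "ignore": 0, "segmentation": [], "image_id": 20352, "id": 1}],
 "categories": [{"id": 1, "name": "diver"}]}
"""


def check_coco_boxes(coco):
    """Return (annotation_id, problem) tuples for suspicious boxes (hypothetical helper)."""
    images = {img["id"]: img for img in coco["images"]}
    problems = []
    for ann in coco["annotations"]:
        x, y, w, h = ann["bbox"]  # COCO boxes are [x, y, width, height]
        img = images.get(ann["image_id"])
        if img is None:
            problems.append((ann["id"], "unknown image_id"))
            continue
        if w <= 0 or h <= 0:
            problems.append((ann["id"], "non-positive width or height"))
        if x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append((ann["id"], "box extends outside the image"))
    return problems


def images_without_boxes(coco):
    """Return ids of images that no annotation refers to (hypothetical helper)."""
    annotated = {ann["image_id"] for ann in coco["annotations"]}
    return [img["id"] for img in coco["images"] if img["id"] not in annotated]


coco = json.loads(SAMPLE)
print(check_coco_boxes(coco))      # []
print(images_without_boxes(coco))  # [60845]
```

On the sample nothing is flagged, which fits the visual check. It does not rule out problems elsewhere in the full annotation file, so the same functions would need to be run on the whole of `train.json`.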
@shanghangjiang
Author

I also used the default settings to train BoxInst on COCO, and the loss is still 'nan'.

@LiWentomng
Owner

LiWentomng commented Mar 17, 2023

@shanghangjiang
Hello, I have tested the default settings for BoxInst, and they work well.

The training log is as follows:

2023-03-17 20:44:14,061 - mmdet - INFO - Distributed training: True
2023-03-17 20:44:14,929 - mmdet - INFO - Config:
log_config = dict(interval=50, hooks=[dict(type='TextLoggerHook')])
custom_hooks = [dict(type='NumClassCheckHook')]
dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
opencv_num_threads = 0
mp_start_method = 'fork'
auto_scale_lr = dict(enable=False, base_batch_size=16)
model = dict(
type='CondInst',
backbone=dict(
type='ResNet',
depth=50,
num_stages=4,
out_indices=(0, 1, 2, 3),
frozen_stages=1,
zero_init_residual=False,
norm_cfg=dict(type='BN', requires_grad=True),
norm_eval=True,
style='pytorch',
init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50')),
neck=dict(
type='FPN',
in_channels=[256, 512, 1024, 2048],
out_channels=256,
start_level=1,
add_extra_convs='on_output',
num_outs=5,
relu_before_extra_convs=True),
bbox_head=dict(
type='CondInstBoxHead',
num_classes=80,
in_channels=256,
center_sampling=True,
center_sample_radius=1.5,
norm_on_bbox=True,
stacked_convs=4,
feat_channels=256,
strides=[8, 16, 32, 64, 128],
loss_cls=dict(
type='FocalLoss',
use_sigmoid=True,
gamma=2.0,
alpha=0.25,
loss_weight=1.0),
loss_bbox=dict(type='GIoULoss', loss_weight=1.0),
loss_centerness=dict(
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)),
mask_branch=dict(
type='CondInstMaskBranch',
in_channels=256,
in_indices=[0, 1, 2],
strides=[8, 16, 32],
branch_convs=4,
branch_channels=128,
branch_out_channels=16),
mask_head=dict(
type='CondInstMaskHead',
in_channels=16,
in_stride=8,
out_stride=4,
dynamic_convs=3,
dynamic_channels=8,
disable_rel_coors=False,
bbox_head_channels=256,
sizes_of_interest=[64, 128, 256, 512, 1024],
max_proposals=-1,
topk_per_img=64,
boxinst_enabled=True,
bottom_pixels_removed=10,
pairwise_size=3,
pairwise_dilation=2,
pairwise_color_thresh=0.3,
pairwise_warmup=10000),
train_cfg=dict(
assigner=dict(
type='MaxIoUAssigner',
pos_iou_thr=0.5,
neg_iou_thr=0.4,
min_pos_iou=0,
ignore_iof_thr=-1),
allowed_border=-1,
pos_weight=-1,
debug=False),
test_cfg=dict(
nms_pre=2000,
min_bbox_size=0,
score_thr=0.05,
nms=dict(type='nms', iou_threshold=0.5),
max_per_img=2000,
output_segm=False))
dataset_type = 'CocoDataset'
data_root = '/mnt/SSD/lwt_workdir/data/coco/'
img_norm_cfg = dict(
mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=False),
dict(
type='Resize',
img_scale=[(1333, 800), (1333, 768), (1333, 736), (1333, 704),
(1333, 672), (1333, 640)],
multiscale_mode='value',
keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]
data = dict(
samples_per_gpu=2,
workers_per_gpu=2,
train=dict(
type='CocoDataset',
ann_file=
'/mnt/SSD/lwt_workdir/data/coco/annotations/instances_train2017.json',
img_prefix='/mnt/SSD/lwt_workdir/data/coco/train2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(type='LoadAnnotations', with_bbox=True, with_mask=False),
dict(
type='Resize',
img_scale=[(1333, 800), (1333, 768), (1333, 736), (1333, 704),
(1333, 672), (1333, 640)],
multiscale_mode='value',
keep_ratio=True),
dict(type='RandomFlip', flip_ratio=0.5),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='DefaultFormatBundle'),
dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]),
val=dict(
type='CocoDataset',
ann_file=
'/mnt/SSD/lwt_workdir/data/coco/annotations/instances_val2017.json',
img_prefix='/mnt/SSD/lwt_workdir/data/coco/val2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]),
test=dict(
type='CocoDataset',
ann_file=
'/mnt/SSD/lwt_workdir/data/coco/annotations/instances_val2017.json',
img_prefix='/mnt/SSD/lwt_workdir/data/coco/val2017/',
pipeline=[
dict(type='LoadImageFromFile'),
dict(
type='MultiScaleFlipAug',
img_scale=(1333, 800),
flip=False,
transforms=[
dict(type='Resize', keep_ratio=True),
dict(type='RandomFlip'),
dict(
type='Normalize',
mean=[123.675, 116.28, 103.53],
std=[58.395, 57.12, 57.375],
to_rgb=True),
dict(type='Pad', size_divisor=32),
dict(type='ImageToTensor', keys=['img']),
dict(type='Collect', keys=['img'])
])
]))
evaluation = dict(interval=1, metric=['bbox', 'segm'])
optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
lr_config = dict(
policy='step',
warmup='linear',
warmup_iters=500,
warmup_ratio=0.001,
step=[27, 33])
runner = dict(type='EpochBasedRunner', max_epochs=36)
checkpoint_config = dict(interval=1)
work_dir = './work_dirs/boxinst_coco_3x'
auto_resume = False
gpu_ids = range(0, 2)

2023-03-17 20:44:14,930 - mmdet - INFO - Set random seed to 0, deterministic: False
2023-03-17 20:44:15,342 - mmdet - INFO - initialize ResNet with init_cfg {'type': 'Pretrained', 'checkpoint': 'torchvision://resnet50'}
2023-03-17 20:44:15,342 - mmcv - INFO - load model from: torchvision://resnet50
2023-03-17 20:44:15,342 - mmcv - INFO - load checkpoint from torchvision path: torchvision://resnet50
2023-03-17 20:44:15,554 - mmcv - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

2023-03-17 20:44:15,588 - mmdet - INFO - initialize FPN with init_cfg {'type': 'Xavier', 'layer': 'Conv2d', 'distribution': 'uniform'}
2023-03-17 20:44:15,615 - mmdet - INFO - initialize CondInstBoxHead with init_cfg {'type': 'Normal', 'layer': 'Conv2d', 'std': 0.01, 'override': {'type': 'Normal', 'name': 'conv_cls', 'std': 0.01, 'bias_prob': 0.01}}
2023-03-17 20:44:15,674 - mmdet - INFO - initialize CondInstMaskBranch with init_cfg {'type': 'Kaiming', 'layer': 'Conv2d', 'distribution': 'uniform', 'a': 1, 'mode': 'fan_in', 'nonlinearity': 'leaky_relu'}
2023-03-17 20:44:15,690 - mmdet - INFO - initialize CondInstMaskHead with init_cfg {'type': 'Normal', 'layer': 'Conv2d', 'std': 0.01, 'bias': 0}
loading annotations into memory...
loading annotations into memory...
Done (t=14.38s)
creating index...
index created!
Done (t=15.70s)
creating index...
index created!
fatal: not a git repository (or any parent up to mount point /mnt)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
fatal: not a git repository (or any parent up to mount point /mnt)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
2023-03-17 20:44:34,908 - mmdet - INFO - Automatic scaling of learning rate (LR) has been disabled.
loading annotations into memory...
loading annotations into memory...
Done (t=0.46s)
creating index...
Done (t=0.50s)
creating index...
index created!
index created!
2023-03-17 20:44:35,477 - mmdet - INFO - Start running, host: lwt@ps, work_dir: /mnt/SSD/lwt_workdir/new_code/BoxInstSeg/work_dirs/boxinst_coco_3x
2023-03-17 20:44:35,477 - mmdet - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) CheckpointHook
(LOW ) DistEvalHook
(VERY_LOW ) TextLoggerHook
before_train_epoch:
(VERY_HIGH ) StepLrUpdaterHook
(NORMAL ) NumClassCheckHook
(NORMAL ) DistSamplerSeedHook
(LOW ) IterTimerHook
(LOW ) DistEvalHook
(VERY_LOW ) TextLoggerHook
before_train_iter:
(VERY_HIGH ) StepLrUpdaterHook
(LOW ) IterTimerHook
(LOW ) DistEvalHook
after_train_iter:
(ABOVE_NORMAL) OptimizerHook
(NORMAL ) CheckpointHook
(LOW ) IterTimerHook
(LOW ) DistEvalHook
(VERY_LOW ) TextLoggerHook
after_train_epoch:
(NORMAL ) CheckpointHook
(LOW ) DistEvalHook
(VERY_LOW ) TextLoggerHook
before_val_epoch:
(NORMAL ) NumClassCheckHook
(NORMAL ) DistSamplerSeedHook
(LOW ) IterTimerHook
(VERY_LOW ) TextLoggerHook
before_val_iter:
(LOW ) IterTimerHook
after_val_iter:
(LOW ) IterTimerHook
after_val_epoch:
(VERY_LOW ) TextLoggerHook
after_run:
(VERY_LOW ) TextLoggerHook

2023-03-17 20:44:35,478 - mmdet - INFO - workflow: [('train', 1)], max: 36 epochs
2023-03-17 20:44:35,478 - mmdet - INFO - Checkpoints will be saved to /mnt/SSD/lwt_workdir/new_code/BoxInstSeg/work_dirs/boxinst_coco_3x by HardDiskBackend.
2023-03-17 20:44:41,941 - mmcv - INFO - Reducer buckets have been rebuilt in this iteration.
2023-03-17 20:45:02,344 - mmdet - INFO - Epoch [1][50/29317] lr: 9.890e-04, eta: 6 days, 13:22:14, time: 0.537, data_time: 0.114, memory: 5067, loss_cls: 1.0509, loss_bbox: 0.8177, loss_centerness: 0.6783, loss_prj: 0.8812, loss_pairwise: 0.0009, loss: 3.4290
2023-03-17 20:45:23,345 - mmdet - INFO - Epoch [1][100/29317] lr: 1.988e-03, eta: 5 days, 20:11:35, time: 0.420, data_time: 0.005, memory: 5207, loss_cls: 0.9549, loss_bbox: 0.5695, loss_centerness: 0.6804, loss_prj: 0.4198, loss_pairwise: 0.0016, loss: 2.6263
2023-03-17 20:45:43,824 - mmdet - INFO - Epoch [1][150/29317] lr: 2.987e-03, eta: 5 days, 13:31:37, time: 0.410, data_time: 0.005, memory: 5207, loss_cls: 0.9264, loss_bbox: 0.5596, loss_centerness: 0.6768, loss_prj: 0.3729, loss_pairwise: 0.0025, loss: 2.5382
2023-03-17 20:46:04,404 - mmdet - INFO - Epoch [1][200/29317] lr: 3.986e-03, eta: 5 days, 10:18:03, time: 0.412, data_time: 0.005, memory: 5207, loss_cls: 0.9353, loss_bbox: 0.5364, loss_centerness: 0.6692, loss_prj: 0.3599, loss_pairwise: 0.0035, loss: 2.5043
2023-03-17 20:46:24,665 - mmdet - INFO - Epoch [1][250/29317] lr: 4.985e-03, eta: 5 days, 7:58:27, time: 0.405, data_time: 0.005, memory: 5207, loss_cls: 0.8976, loss_bbox: 0.5471, loss_centerness: 0.6722, loss_prj: 0.3683, loss_pairwise: 0.0039, loss: 2.4890
2023-03-17 20:46:45,612 - mmdet - INFO - Epoch [1][300/29317] lr: 5.984e-03, eta: 5 days, 7:06:07, time: 0.419, data_time: 0.005, memory: 5207, loss_cls: 0.9304, loss_bbox: 0.5266, loss_centerness: 0.6694, loss_prj: 0.3763, loss_pairwise: 0.0052, loss: 2.5079
2023-03-17 20:47:06,033 - mmdet - INFO - Epoch [1][350/29317] lr: 6.983e-03, eta: 5 days, 6:03:05, time: 0.409, data_time: 0.005, memory: 5207, loss_cls: 0.8870, loss_bbox: 0.5204, loss_centerness: 0.6681, loss_prj: 0.3518, loss_pairwise: 0.0054, loss: 2.4326
2023-03-17 20:47:26,702 - mmdet - INFO - Epoch [1][400/29317] lr: 7.982e-03, eta: 5 days, 5:25:24, time: 0.413, data_time: 0.005, memory: 5207, loss_cls: 0.8060, loss_bbox: 0.5170, loss_centerness: 0.6693, loss_prj: 0.3397, loss_pairwise: 0.0062, loss: 2.3382
2023-03-17 20:47:47,521 - mmdet - INFO - Epoch [1][450/29317] lr: 8.981e-03, eta: 5 days, 5:02:25, time: 0.416, data_time: 0.005, memory: 5207, loss_cls: 0.8213, loss_bbox: 0.5000, loss_centerness: 0.6686, loss_prj: 0.3202, loss_pairwise: 0.0066, loss: 2.3167
2023-03-17 20:48:08,123 - mmdet - INFO - Epoch [1][500/29317] lr: 9.980e-03, eta: 5 days, 4:36:39, time: 0.412, data_time: 0.006, memory: 5207, loss_cls: 0.8300, loss_bbox: 0.4803, loss_centerness: 0.6600, loss_prj: 0.3168, loss_pairwise: 0.0073, loss: 2.2944
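
The `lr` column in the log above is consistent with a linear warmup over `warmup_iters=500` towards the base lr of 0.01 with `warmup_ratio=0.001`. A minimal sketch of that arithmetic follows. It assumes the line logged at iteration N uses a zero-based counter N - 1; the function name is hypothetical, and this is a reading of the schedule from the logged values, not the library's own code.

```python
def linear_warmup_lr(base_lr, cur_iter, warmup_iters, warmup_ratio):
    """Linear warmup: ramps from base_lr * warmup_ratio up to base_lr."""
    k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
    return base_lr * (1 - k)


# Values taken from the config dump above: lr=0.01, warmup_iters=500, warmup_ratio=0.001.
for logged_iter in (50, 100, 150, 500):
    lr = linear_warmup_lr(0.01, logged_iter - 1, 500, 0.001)
    print(f"iter {logged_iter}: lr = {lr:.3e}")
# iter 50: lr = 9.890e-04
# iter 100: lr = 1.988e-03
# iter 150: lr = 2.987e-03
# iter 500: lr = 9.980e-03
```

These match the logged values, so in this run the learning rate only reaches 0.01 at the end of the first 500 iterations.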

Besides, judging from the sample you gave, your dataset annotations look correct.
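
Not something confirmed in this thread, but a commonly tried mitigation when a loss turns 'nan' is gradient clipping, since both configs above set `grad_clip=None`. A sketch of the override for the custom config, where `max_norm=35` and `norm_type=2` are assumed values that would need tuning:

```python
# Hypothetical override for the custom config; the clipping values are
# assumptions for illustration and do not come from this thread.
optimizer = dict(type='SGD', lr=0.001, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
```

As far as I know, the `grad_clip` dict is passed through to PyTorch's `clip_grad_norm_`, so other `max_norm` values can be tried the same way.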
