training failed in SRT heatmaps? #80
Comments
I ran some experiments to track down the problem. The background occupies nearly the whole 32*32 heatmap, so there is a severe imbalance between background and foreground pixels. I therefore changed the generate_label_map function and added the Adaptive Wing loss together with loss weight maps; both come from the Adaptive Wing Loss paper. After that, my model trained well.
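For reference, here is a minimal numpy sketch of the Adaptive Wing loss from that paper (Wang et al., ICCV 2019). The default hyperparameters (alpha=2.1, omega=14, epsilon=1, theta=0.5) are the paper's; the function name and the element-wise formulation are my own, not the repo's actual implementation:

```python
import numpy as np

def adaptive_wing_loss(pred, target, alpha=2.1, omega=14.0, epsilon=1.0, theta=0.5):
    """Adaptive Wing loss on heatmaps (mean over all pixels).

    Small errors (|y - y_hat| < theta) fall on a log curve whose shape
    adapts to the target value y; large errors fall on a linear branch.
    A and C are chosen so the two branches join continuously at theta.
    """
    delta = np.abs(target - pred)
    A = (omega
         * (1.0 / (1.0 + (theta / epsilon) ** (alpha - target)))
         * (alpha - target)
         * ((theta / epsilon) ** (alpha - target - 1.0))
         / epsilon)
    C = theta * A - omega * np.log1p((theta / epsilon) ** (alpha - target))
    loss = np.where(delta < theta,
                    omega * np.log1p((delta / epsilon) ** (alpha - target)),
                    A * delta - C)
    return loss.mean()
```

Because the exponent (alpha - target) shrinks on foreground pixels (target near 1), small foreground errors are penalized more sharply than the same errors on background, which is exactly what MSE fails to do here. The paper additionally multiplies the loss by a weight map W = 1 + w*M, where M is a dilated foreground mask.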
Which project are you using?
SRT
There are some problems when training heatmaps based on the ProCPM model. I changed your model backbone and trained a new model without pretrained weights. The training loss log looks normal, but the test_300w NME stays at 166.901.
When I visualize the batch heatmaps, only the background channel is learned; the predicted foreground maps are always all-zero arrays.
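This collapse is consistent with the class imbalance in the targets. A quick numpy sketch (my own illustration, assuming a single landmark centered on the 32x32 heatmap that downsample=8 and sigma=4 from the config above produce) shows that near-zero background pixels dominate each foreground channel, so a plain MSE is minimized reasonably well by predicting all zeros:

```python
import numpy as np

def gaussian_heatmap(size=32, cx=16.0, cy=16.0, sigma=4.0):
    # 256x256 input with downsample=8 gives a 32x32 target heatmap
    ys, xs = np.mgrid[0:size, 0:size]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

hm = gaussian_heatmap()
foreground = int((hm > 0.1).sum())   # pixels carrying a meaningful target value
background = hm.size - foreground    # near-zero pixels dominating the MSE
```

With these settings the background pixels outnumber the foreground ones several times over, and the gap grows further for smaller sigma, which matches the observation that only background is learned.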
Logs:
batch_size : 128
optimizer : sgd
LR : 0.0005
momentum : 0.9
Decay : 0.0005
nesterov : 1
criterion_ht : MSE-batch
epochs : 150
schedule : [60, 90, 120]
gamma : 0.1
pre_crop : 0.2
scale_min : 0.9
scale_max : 1.1
shear_max : 0.2
offset_max : 0.2
rotate_max : 30
cut_out : 0.1
sigma : 4
shape : [256, 256]
heatmap_type : gaussian
pixel_jitter_max : 20
downsample : 8
num_pts : 68
Training-data : GeneralDataset(point-num=68, shape=[256, 256], sigma=4, heatmap_type=gaussian, length=31528, cutout=0.1, dataset=train)
Testing-data : GeneralDataset(point-num=68, shape=[256, 256], sigma=4, heatmap_type=gaussian, length=689, cutout=0, dataset=test_300w)
Optimizer : SGD (
Parameter Group 0
dampening: 0
initial_lr: 0.0005
lr: 0.0005
momentum: 0.9
nesterov: 1
weight_decay: 0.0005
)
MSE Loss with reduction=['MSE', 'batch']
=> do not find the last-info file : ../snopshots/last-info.pth
==>>[2021-01-04 10:47:29] [epoch-000-150], [[Time Left: 00:00:00]], LR : [0.00050 ~ 0.00050], Config : {'epochs': 150, 'num_pts': 68, 'sigma': 4, 'print_freq': 10, 'downsample': 8, 'shape': [256, 256]}
-->[train]: [epoch-000-150][000/247] Time 16.47 (16.47) Data 7.20 (7.20) Forward 14.50 (14.50) Loss_all 104426.6016 (104426.6016) [Time Left: 01:07:30] ht_loss=104426.6016 : L1=35482.0781 : L2=35589.6680 : L3=33354.8516
-->[train]: [epoch-000-150][010/247] Time 1.11 (2.38) Data 0.16 (0.67) Forward 0.20 (1.38) Loss_all 279.1572 (11552.4368) [Time Left: 00:09:21] ht_loss=279.1572 : L1=93.0946 : L2=93.0299 : L3=93.0327
-->[train]: [epoch-000-150][020/247] Time 4.60 (2.21) Data 3.63 (0.87) Forward 3.68 (1.26) Loss_all 279.8896 (6183.9105) [Time Left: 00:08:20] ht_loss=279.8896 : L1=93.3314 : L2=93.2782 : L3=93.2800
-->[train]: [epoch-000-150][030/247] Time 0.95 (2.04) Data 0.00 (0.82) Forward 0.05 (1.10) Loss_all 279.0271 (4278.9738) [Time Left: 00:07:20] ht_loss=279.0271 : L1=93.0395 : L2=92.9885 : L3=92.9991
-->[train]: [epoch-000-150][040/247] Time 4.48 (2.12) Data 3.52 (0.96) Forward 3.56 (1.19) Loss_all 279.3033 (3303.2060) [Time Left: 00:07:16] ht_loss=279.3033 : L1=93.0816 : L2=93.1021 : L3=93.1196
-->[train]: [epoch-000-150][050/247] Time 0.96 (2.03) Data 0.00 (0.92) Forward 0.05 (1.11) Loss_all 278.0408 (2710.1125) [Time Left: 00:06:38] ht_loss=278.0408 : L1=92.7466 : L2=92.6507 : L3=92.6435
-->[train]: [epoch-000-150][060/247] Time 4.94 (2.13) Data 3.96 (1.04) Forward 4.00 (1.21) Loss_all 279.6303 (2311.4327) [Time Left: 00:06:36] ht_loss=279.6303 : L1=93.1926 : L2=93.2191 : L3=93.2186
-->[train]: [epoch-000-150][070/247] Time 0.98 (2.08) Data 0.00 (1.00) Forward 0.05 (1.16) Loss_all 277.1473 (2025.1064) [Time Left: 00:06:05] ht_loss=277.1473 : L1=92.4281 : L2=92.3588 : L3=92.3605
-->[train]: [epoch-000-150][080/247] Time 4.86 (2.10) Data 3.87 (1.04) Forward 3.91 (1.18) Loss_all 278.3179 (1809.4039) [Time Left: 00:05:48] ht_loss=278.3179 : L1=92.7645 : L2=92.7693 : L3=92.7842
-->[train]: [epoch-000-150][090/247] Time 0.98 (2.08) Data 0.00 (1.03) Forward 0.05 (1.16) Loss_all 277.4024 (1641.1797) [Time Left: 00:05:24] ht_loss=277.4024 : L1=92.4464 : L2=92.4706 : L3=92.4854
-->[train]: [epoch-000-150][100/247] Time 6.01 (2.09) Data 5.07 (1.05) Forward 5.11 (1.17) Loss_all 277.7260 (1506.2978) [Time Left: 00:05:05] ht_loss=277.7260 : L1=92.5351 : L2=92.5909 : L3=92.6000
-->[train]: [epoch-000-150][110/247] Time 0.98 (2.08) Data 0.00 (1.04) Forward 0.05 (1.15) Loss_all 278.5226 (1395.6688) [Time Left: 00:04:42] ht_loss=278.5226 : L1=92.8076 : L2=92.8527 : L3=92.8623
-->[train]: [epoch-000-150][120/247] Time 4.90 (2.08) Data 3.86 (1.05) Forward 3.98 (1.16) Loss_all 277.3715 (1303.3518) [Time Left: 00:04:22] ht_loss=277.3715 : L1=92.4822 : L2=92.4495 : L3=92.4398
-->[train]: [epoch-000-150][130/247] Time 1.01 (2.06) Data 0.00 (1.03) Forward 0.05 (1.14) Loss_all 277.4960 (1225.1186) [Time Left: 00:03:59] ht_loss=277.4960 : L1=92.5309 : L2=92.4777 : L3=92.4875
-->[train]: [epoch-000-150][140/247] Time 4.89 (2.07) Data 3.85 (1.04) Forward 3.89 (1.14) Loss_all 277.4550 (1157.9637) [Time Left: 00:03:39] ht_loss=277.4550 : L1=92.4373 : L2=92.4946 : L3=92.5231
-->[train]: [epoch-000-150][150/247] Time 0.97 (2.07) Data 0.00 (1.04) Forward 0.05 (1.14) Loss_all 277.0612 (1099.6810) [Time Left: 00:03:18] ht_loss=277.0612 : L1=92.3476 : L2=92.3554 : L3=92.3583
-->[train]: [epoch-000-150][160/247] Time 4.87 (2.09) Data 3.86 (1.06) Forward 3.90 (1.16) Loss_all 278.3372 (1048.6711) [Time Left: 00:02:59] ht_loss=278.3372 : L1=92.7852 : L2=92.7728 : L3=92.7792
-->[train]: [epoch-000-150][170/247] Time 1.05 (2.07) Data 0.00 (1.05) Forward 0.05 (1.14) Loss_all 278.7277 (1003.6236) [Time Left: 00:02:37] ht_loss=278.7277 : L1=92.8838 : L2=92.9143 : L3=92.9296
-->[train]: [epoch-000-150][180/247] Time 5.15 (2.08) Data 4.04 (1.06) Forward 4.21 (1.15) Loss_all 278.2527 (963.5538) [Time Left: 00:02:17] ht_loss=278.2527 : L1=92.8155 : L2=92.7081 : L3=92.7291
-->[train]: [epoch-000-150][190/247] Time 1.03 (2.05) Data 0.00 (1.04) Forward 0.05 (1.12) Loss_all 277.6393 (927.6484) [Time Left: 00:01:55] ht_loss=277.6393 : L1=92.5422 : L2=92.5445 : L3=92.5525
-->[train]: [epoch-000-150][200/247] Time 4.82 (2.08) Data 3.81 (1.06) Forward 3.86 (1.15) Loss_all 278.6940 (895.3700) [Time Left: 00:01:35] ht_loss=278.6940 : L1=92.8379 : L2=92.9121 : L3=92.9440
-->[train]: [epoch-000-150][210/247] Time 1.01 (2.07) Data 0.00 (1.05) Forward 0.05 (1.14) Loss_all 278.6920 (866.1021) [Time Left: 00:01:14] ht_loss=278.6920 : L1=92.8799 : L2=92.8969 : L3=92.9152
-->[train]: [epoch-000-150][220/247] Time 4.31 (2.09) Data 3.27 (1.07) Forward 3.32 (1.15) Loss_all 277.6641 (839.5094) [Time Left: 00:00:54] ht_loss=277.6641 : L1=92.5900 : L2=92.5280 : L3=92.5461
-->[train]: [epoch-000-150][230/247] Time 0.98 (2.08) Data 0.00 (1.06) Forward 0.05 (1.14) Loss_all 277.3657 (815.2032) [Time Left: 00:00:33] ht_loss=277.3657 : L1=92.4130 : L2=92.4656 : L3=92.4871
-->[train]: [epoch-000-150][240/247] Time 5.31 (2.09) Data 4.35 (1.07) Forward 4.40 (1.16) Loss_all 276.9100 (792.9332) [Time Left: 00:00:12] ht_loss=276.9100 : L1=92.2631 : L2=92.3141 : L3=92.3328
-->[train]: [epoch-000-150][246/247] Time 2.29 (2.09) Data 0.00 (1.07) Forward 1.03 (1.15) Loss_all 278.2914 (781.8199) [Time Left: 00:00:00] ht_loss=278.2914 : L1=92.7296 : L2=92.7785 : L3=92.7834
Eval dataset length 31528, labeled data length 31528
Compute NME for 31528 images with 68 points :: [(nms): mean=164.630, std=33.857]
==>>[2021-01-04 10:56:13] Train [epoch-000-150] Average Loss = 781.819878, NME = 164.63
save checkpoint into ../snopshots/checkpoint/HEATMAP-epoch-000-150.pth
save checkpoint into ../snopshots/last-info.pth
==>>[2021-01-04 10:56:13] [epoch-001-150], [[Time Left: 21:42:18]], LR : [0.00050 ~ 0.00050], Config : {'epochs': 150, 'num_pts': 68, 'sigma': 4, 'print_freq': 10, 'downsample': 8, 'shape': [256, 256]}
-->[train]: [epoch-001-150][000/247] Time 8.17 (8.17) Data 7.04 (7.04) Forward 7.16 (7.16) Loss_all 278.1276 (278.1276) [Time Left: 00:33:30] ht_loss=278.1276 : L1=92.7402 : L2=92.6863 : L3=92.7011
-->[train]: [epoch-001-150][010/247] Time 1.02 (2.70) Data 0.00 (1.68) Forward 0.05 (1.74) Loss_all 277.0975 (278.3596) [Time Left: 00:10:37] ht_loss=277.0975 : L1=92.3605 : L2=92.3538 : L3=92.3832
-->[train]: [epoch-001-150][020/247] Time 4.71 (2.55) Data 3.67 (1.54) Forward 3.72 (1.59) Loss_all 277.3455 (278.1147) [Time Left: 00:09:37] ht_loss=277.3455 : L1=92.4743 : L2=92.4308 : L3=92.4404
-->[train]: [epoch-001-150][030/247] Time 1.08 (2.31) Data 0.00 (1.29) Forward 0.11 (1.34) Loss_all 278.7683 (278.1484) [Time Left: 00:08:18] ht_loss=278.7683 : L1=92.9498 : L2=92.9005 : L3=92.9180
-->[train]: [epoch-001-150][040/247] Time 7.17 (2.35) Data 6.17 (1.33) Forward 6.22 (1.38) Loss_all 278.2269 (278.1230) [Time Left: 00:08:04] ht_loss=278.2269 : L1=92.6964 : L2=92.7372 : L3=92.7933
-->[train]: [epoch-001-150][050/247] Time 1.03 (2.30) Data 0.00 (1.27) Forward 0.05 (1.32) Loss_all 276.7727 (278.0273) [Time Left: 00:07:29] ht_loss=276.7727 : L1=92.2674 : L2=92.2390 : L3=92.2664
-->[train]: [epoch-001-150][060/247] Time 5.33 (2.29) Data 4.27 (1.26) Forward 4.32 (1.32) Loss_all 277.9707 (277.9931) [Time Left: 00:07:06] ht_loss=277.9707 : L1=92.5334 : L2=92.6936 : L3=92.7436
-->[train]: [epoch-001-150][070/247] Time 1.08 (2.23) Data 0.00 (1.20) Forward 0.07 (1.25) Loss_all 276.6379 (277.9465) [Time Left: 00:06:32] ht_loss=276.6379 : L1=92.0694 : L2=92.2572 : L3=92.3113
-->[train]: [epoch-001-150][080/247] Time 5.25 (2.25) Data 4.22 (1.21) Forward 4.27 (1.26) Loss_all 278.9980 (277.9131) [Time Left: 00:06:12] ht_loss=278.9980 : L1=92.9393 : L2=93.0119 : L3=93.0468
-->[train]: [epoch-001-150][090/247] Time 1.11 (2.20) Data 0.00 (1.16) Forward 0.06 (1.22) Loss_all 278.4267 (277.9751) [Time Left: 00:05:43] ht_loss=278.4267 : L1=92.7253 : L2=92.8161 : L3=92.8853
-->[train]: [epoch-001-150][100/247] Time 3.97 (2.22) Data 2.93 (1.18) Forward 2.98 (1.23) Loss_all 277.6642 (277.8997) [Time Left: 00:05:23] ht_loss=277.6642 : L1=92.5115 : L2=92.5662 : L3=92.5865
-->[train]: [epoch-001-150][110/247] Time 1.00 (2.21) Data 0.00 (1.17) Forward 0.04 (1.22) Loss_all 278.4783 (277.8227) [Time Left: 00:05:00] ht_loss=278.4783 : L1=92.5632 : L2=92.6771 : L3=93.2380
-->[train]: [epoch-001-150][120/247] Time 1.53 (2.18) Data 0.52 (1.14) Forward 0.56 (1.20) Loss_all 278.9060 (277.7973) [Time Left: 00:04:34] ht_loss=278.9060 : L1=92.9879 : L2=92.9261 : L3=92.9920
-->[train]: [epoch-001-150][130/247] Time 1.04 (2.20) Data 0.00 (1.16) Forward 0.07 (1.22) Loss_all 275.1124 (277.7316) [Time Left: 00:04:15] ht_loss=275.1124 : L1=91.6005 : L2=91.7149 : L3=91.7970
-->[train]: [epoch-001-150][140/247] Time 1.36 (2.19) Data 0.35 (1.15) Forward 0.39 (1.21) Loss_all 277.0453 (277.6980) [Time Left: 00:03:52] ht_loss=277.0453 : L1=92.2058 : L2=92.3897 : L3=92.4498
-->[train]: [epoch-001-150][150/247] Time 1.05 (2.20) Data 0.00 (1.16) Forward 0.05 (1.22) Loss_all 277.4876 (277.6891) [Time Left: 00:03:31] ht_loss=277.4876 : L1=92.4398 : L2=92.4923 : L3=92.5554
-->[train]: [epoch-001-150][160/247] Time 1.01 (2.19) Data 0.00 (1.15) Forward 0.06 (1.20) Loss_all 276.7957 (277.6487) [Time Left: 00:03:08] ht_loss=276.7957 : L1=92.1740 : L2=92.2719 : L3=92.3497
-->[train]: [epoch-001-150][170/247] Time 1.07 (2.21) Data 0.00 (1.17) Forward 0.05 (1.23) Loss_all 274.4813 (277.5769) [Time Left: 00:02:48] ht_loss=274.4813 : L1=91.2472 : L2=91.5625 : L3=91.6715
-->[train]: [epoch-001-150][180/247] Time 4.76 (2.20) Data 3.73 (1.16) Forward 3.78 (1.22) Loss_all 277.1411 (277.5542) [Time Left: 00:02:25] ht_loss=277.1411 : L1=92.2720 : L2=92.3849 : L3=92.4842
-->[train]: [epoch-001-150][190/247] Time 1.03 (2.19) Data 0.00 (1.15) Forward 0.06 (1.20) Loss_all 276.4174 (277.5101) [Time Left: 00:02:02] ht_loss=276.4174 : L1=91.9825 : L2=92.1831 : L3=92.2517
-->[train]: [epoch-001-150][200/247] Time 1.08 (2.18) Data 0.00 (1.14) Forward 0.08 (1.20) Loss_all 276.7092 (277.4698) [Time Left: 00:01:40] ht_loss=276.7092 : L1=92.0072 : L2=92.3023 : L3=92.3998
-->[train]: [epoch-001-150][210/247] Time 1.01 (2.19) Data 0.00 (1.15) Forward 0.05 (1.21) Loss_all 277.3184 (277.4343) [Time Left: 00:01:18] ht_loss=277.3184 : L1=92.2769 : L2=92.4722 : L3=92.5693
-->[train]: [epoch-001-150][220/247] Time 2.94 (2.19) Data 1.92 (1.15) Forward 1.96 (1.20) Loss_all 275.9615 (277.3970) [Time Left: 00:00:56] ht_loss=275.9615 : L1=91.9907 : L2=91.9326 : L3=92.0382
-->[train]: [epoch-001-150][230/247] Time 1.04 (2.19) Data 0.00 (1.15) Forward 0.07 (1.20) Loss_all 276.7830 (277.3277) [Time Left: 00:00:35] ht_loss=276.7830 : L1=91.9892 : L2=92.3281 : L3=92.4656
-->[train]: [epoch-001-150][240/247] Time 4.19 (2.20) Data 3.19 (1.16) Forward 3.24 (1.22) Loss_all 275.8693 (277.2501) [Time Left: 00:00:13] ht_loss=275.8693 : L1=91.6268 : L2=92.0341 : L3=92.2084
-->[train]: [epoch-001-150][246/247] Time 0.37 (2.19) Data 0.00 (1.15) Forward 0.04 (1.21) Loss_all 277.2675 (277.2426) [Time Left: 00:00:00] ht_loss=277.2675 : L1=92.4372 : L2=92.3524 : L3=92.4779
Eval dataset length 31528, labeled data length 31528
Compute NME for 31528 images with 68 points :: [(nms): mean=165.022, std=33.734]
==>>[2021-01-04 11:05:24] Train [epoch-001-150] Average Loss = 277.242612, NME = 165.02
save checkpoint into ../snopshots/checkpoint/HEATMAP-epoch-001-150.pth
save checkpoint into ../snopshots/last-info.pth
Basic-Eval-All evaluates 1 dataset
==>>[2021-01-04 11:05:24], [epoch-001-150], evaluate the 0/1-th dataset [image] : GeneralDataset(point-num=68, shape=[256, 256], sigma=4, heatmap_type=gaussian, length=689, cutout=0, dataset=test_300w)
-->[test]: [epoch-001-150][000/006] Time 6.60 (6.60) Data 6.19 (6.19) Forward 6.22 (6.22) Loss_all 280.0911 (280.0911) [Time Left: 00:00:33] ht_loss=280.0911 : L1=93.1269 : L2=93.3953 : L3=93.5689
-->[test]: [epoch-001-150][005/006] Time 1.23 (1.57) Data 0.00 (1.03) Forward 1.14 (1.25) Loss_all 280.1683 (279.7528) [Time Left: 00:00:00] ht_loss=280.1683 : L1=93.1425 : L2=93.4033 : L3=93.6225
Eval dataset length 689, labeled data length 689
Compute NME for 689 images with 68 points :: [(nms): mean=166.901, std=24.249]
NME Results :
->test_300w : NME = 166.901,