# BadLabel

The code repository for BadLabel, a challenging type of label noise, and Robust DivideMix, a robust label-noise learning (LNL) algorithm.

Paper: https://arxiv.org/abs/2305.18377

The paper is accepted at IEEE TPAMI 2024 (DOI: 10.1109/TPAMI.2024.3355425).

## Prerequisites

- Python (3.8)
- PyTorch (1.8.0)
- CUDA
- NumPy

## Usage

### Part 1: Generate BadLabel

You can run the following commands to synthesize BadLabel:

```bash
cd gen_badlabels
./gen_badlabels.sh
```
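After generation, it can be useful to sanity-check that the injected noise ratio matches the target. Below is a minimal sketch, assuming the clean and noisy label arrays are stored as NumPy files; the file names are hypothetical placeholders, so adapt them to the script's actual output:

```python
# Compare the clean labels against the generated BadLabel labels.
# The file names below are hypothetical placeholders.
import numpy as np

clean = np.load("clean_labels.npy")  # original training labels
noisy = np.load("badlabels.npy")     # labels after BadLabel injection

ratio = (clean != noisy).mean()
print(f"Injected noise ratio: {ratio:.2%}")
```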

### Part 2: Evaluate BadLabel

Run the following commands to automatically evaluate BadLabel on multiple LNL algorithms. The shell script also lists the execution command and the hyperparameter settings we used for each LNL algorithm, so you can use it to evaluate a specific LNL algorithm separately.

```bash
cd eval_badlabels
./eval_badlabels.sh
```

For quick experimental verification, we share the various label noises we generated under the `eval_badlabels/noise` directory.

If you want to quickly evaluate BadLabel on your own algorithm, we also provide MNIST, CIFAR-10, and CIFAR-100 training sets with injected BadLabel in Google Drive. You can easily load the datasets using `load_badlabels_dataset.py` under the `eval_badlabels` directory; a loading sketch follows.
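The exact interface of `load_badlabels_dataset.py` may differ; as an illustration, here is a minimal sketch that attaches downloaded BadLabel labels to the torchvision CIFAR-10 training set (the `.npy` file name is a hypothetical placeholder for the file downloaded from Google Drive):

```python
# Minimal sketch: replace CIFAR-10's clean training labels with
# downloaded BadLabel labels. The .npy file name is hypothetical.
import numpy as np
import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2470, 0.2435, 0.2616)),
])

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transform)

noisy_labels = np.load("badlabels_cifar10.npy")
assert len(noisy_labels) == len(train_set.targets)
train_set.targets = noisy_labels.tolist()  # train as usual from here
```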

#### Evaluation results of BadLabel on CIFAR-10, CIFAR-100, and MNIST

Here we share our evaluation results on CIFAR-10, CIFAR-100, and MNIST.

We evaluated BadLabel using Standard Training (no defense) and 11 state-of-the-art LNL methods as baselines: Co-teaching, T-Revision, RoG, DivideMix, AdaCorr, Peer Loss, ELR, Negative LS, PGDF, ProMix, and SOP.
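For reference on the non-BadLabel noise types in the tables below: symmetric noise replaces a given fraction of labels with classes drawn uniformly at random from the other classes, asymmetric noise flips labels to a fixed similar class, and instance-dependent noise (IDN) makes the flip probability depend on the input features. A minimal sketch of symmetric noise injection, for illustration only (this is not the repository's generation code):

```python
# Illustrative symmetric label noise: flip a noise_ratio fraction of
# labels to a uniformly random *different* class.
import numpy as np

def inject_symmetric_noise(labels, noise_ratio, num_classes, seed=0):
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    flip_idx = rng.choice(len(labels),
                          size=int(noise_ratio * len(labels)),
                          replace=False)
    for i in flip_idx:
        # Adding an offset in [1, num_classes) guarantees a new class.
        labels[i] = (labels[i] + rng.integers(1, num_classes)) % num_classes
    return labels

noisy = inject_symmetric_noise(np.zeros(50000, dtype=int), 0.4, 10)
print((noisy != 0).mean())  # ~0.40
```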

**TABLE 1:** Test accuracy (%) on CIFAR-10 with different types of label noise (symmetric, asymmetric, instance-dependent, and our proposed BadLabel) and noise levels (ranging from 20% to 80%).

| Method | | Sym. 20% | Sym. 40% | Sym. 60% | Sym. 80% | Asym. 20% | Asym. 40% | IDN 20% | IDN 40% | IDN 60% | IDN 80% | BadLabel 20% | BadLabel 40% | BadLabel 60% | BadLabel 80% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Standard Training | Best | 85.21 | 79.90 | 69.79 | 43.00 | 88.02 | 85.22 | 85.42 | 78.93 | 68.97 | 55.34 | 76.76±1.08 | 58.79±1.49 | 39.64±1.13 | 17.80±0.91 |
| | Last | 82.55 | 64.79 | 41.43 | 17.20 | 87.28 | 77.04 | 85.23 | 74.06 | 52.22 | 28.04 | 75.31±0.24 | 55.72±0.17 | 35.66±0.23 | 13.44±0.26 |
| Co-teaching | Best | 89.19 | 84.80 | 58.25 | 21.76 | 90.65 | 63.11 | 85.72 | 73.42 | 45.84 | 33.43 | 80.41±0.78 | 56.81±3.86 | 14.42±1.22 | 10.51±0.71 |
| | Last | 89.03 | 84.65 | 57.95 | 21.06 | 90.52 | 56.33 | 85.48 | 72.97 | 45.53 | 25.27 | 79.48±0.75 | 55.54±3.74 | 12.99±1.09 | 4.24±2.44 |
| T-Revision | Best | 89.79 | 86.83 | 78.14 | 64.54 | 91.23 | 89.60 | 85.74 | 78.45 | 69.31 | 56.26 | 76.99±1.38 | 57.21±1.64 | 36.01±1.10 | 14.93±0.50 |
| | Last | 89.59 | 86.57 | 76.85 | 60.54 | 91.09 | 89.40 | 85.43 | 69.18 | 58.15 | 33.15 | 75.71±1.68 | 55.02±1.34 | 33.99±0.29 | 13.16±0.68 |
| RoG | Best | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| | Last | 87.48 | 74.81 | 52.42 | 16.02 | 89.61 | 81.63 | 85.34 | 76.68 | 63.79 | 37.11 | 85.88±0.32 | 64.20±0.91 | 35.89±1.34 | 8.64±0.76 |
| DivideMix | Best | 96.21 | 95.08 | 94.80 | 81.95 | 94.82 | 94.20 | 91.97 | 85.84 | 81.59 | 59.06 | 84.81±0.78 | 58.44±1.45 | 28.38±0.56 | 6.87±0.59 |
| | Last | 96.04 | 94.74 | 94.56 | 81.58 | 94.46 | 93.50 | 90.77 | 82.94 | 81.19 | 47.81 | 82.13±0.78 | 57.65±1.96 | 16.21±1.24 | 6.12±0.45 |
| AdaCorr | Best | 90.66 | 87.17 | 80.97 | 35.97 | 92.35 | 88.60 | 85.88 | 79.54 | 69.36 | 55.86 | 76.97±0.83 | 57.17±0.71 | 37.14±0.38 | 14.72±0.86 |
| | Last | 90.46 | 86.78 | 80.66 | 35.67 | 92.17 | 88.34 | 85.70 | 79.05 | 59.13 | 30.48 | 74.71±0.26 | 54.92±0.22 | 34.71±0.22 | 11.94±0.12 |
| Peer Loss | Best | 90.87 | 87.13 | 79.03 | 61.91 | 91.47 | 87.50 | 86.46 | 81.07 | 69.87 | 55.51 | 75.28±1.43 | 55.75±1.39 | 36.17±0.23 | 15.87±0.30 |
| | Last | 90.65 | 86.85 | 78.83 | 61.43 | 91.11 | 81.24 | 85.72 | 74.43 | 54.57 | 33.76 | 74.00±1.43 | 53.73±1.25 | 34.37±0.68 | 14.71±0.22 |
| ELR | Best | 92.85 | 91.30 | 87.99 | 54.67 | 92.42 | 89.40 | 87.62 | 82.08 | 73.23 | 57.26 | 85.73±0.15 | 62.58±1.33 | 35.24±1.12 | 11.71±0.70 |
| | Last | 89.37 | 87.78 | 85.69 | 46.71 | 92.31 | 89.11 | 85.31 | 78.05 | 68.12 | 48.99 | 81.88±0.25 | 56.45±0.31 | 30.45±0.30 | 8.67±0.79 |
| Negative LS | Best | 87.42 | 84.40 | 75.22 | 43.62 | 88.34 | 85.03 | 89.82 | 83.66 | 75.76 | 64.21 | 78.77±0.66 | 57.68±0.89 | 36.57±0.88 | 16.46±0.82 |
| | Last | 87.30 | 84.21 | 75.07 | 43.50 | 65.23 | 47.22 | 81.87 | 82.10 | 70.95 | 45.62 | 73.99±0.90 | 52.45±1.03 | 26.66±0.81 | 3.21±0.44 |
| PGDF | Best | 96.63 | 96.12 | 95.05 | 80.69 | 96.05 | 89.87 | 91.81 | 85.75 | 76.84 | 59.60 | 82.72±0.47 | 61.50±1.87 | 34.46±1.44 | 6.37±0.34 |
| | Last | 96.40 | 95.95 | 94.75 | 79.76 | 95.74 | 88.45 | 91.30 | 84.31 | 69.54 | 34.81 | 79.95±0.36 | 56.26±1.03 | 30.14±0.85 | 4.56±0.45 |
| ProMix | Best | 97.40 | 96.98 | 90.80 | 61.15 | 97.04 | 96.09 | 94.72 | 91.32 | 76.22 | 54.01 | 94.95±1.43 | 48.36±1.72 | 24.87±1.47 | 9.51±1.51 |
| | Last | 97.30 | 96.91 | 90.72 | 52.25 | 96.94 | 96.03 | 94.63 | 91.01 | 75.12 | 45.80 | 94.59±1.64 | 44.08±0.49 | 21.33±0.46 | 7.93±1.34 |
| SOP | Best | 96.17 | 95.64 | 94.83 | 89.94 | 95.96 | 93.60 | 90.32 | 83.26 | 71.54 | 57.14 | 84.96±0.35 | 66.25±1.35 | 42.59±1.25 | 12.70±0.89 |
| | Last | 96.12 | 95.46 | 94.71 | 89.78 | 95.86 | 93.30 | 90.13 | 82.91 | 63.14 | 29.86 | 82.64±0.27 | 61.89±0.25 | 36.51±0.26 | 8.63±0.17 |
**TABLE 2:** Test accuracy (%) on CIFAR-100 with different types of label noise (symmetric, instance-dependent, and our proposed BadLabel) and noise levels (ranging from 20% to 80%).

| Method | | Sym. 20% | Sym. 40% | Sym. 60% | Sym. 80% | IDN 20% | IDN 40% | IDN 60% | IDN 80% | BadLabel 20% | BadLabel 40% | BadLabel 60% | BadLabel 80% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Standard Training | Best | 61.41 | 51.21 | 38.82 | 19.89 | 70.06 | 62.48 | 53.21 | 45.77 | 56.75±0.98 | 35.42±0.77 | 17.70±1.02 | 6.03±0.24 |
| | Last | 61.17 | 46.27 | 27.01 | 9.27 | 69.94 | 62.32 | 52.55 | 40.45 | 56.30±0.13 | 34.90±0.17 | 17.05±0.28 | 4.18±0.16 |
| Co-teaching | Best | 62.80 | 55.02 | 34.66 | 7.72 | 66.16 | 57.55 | 45.38 | 23.83 | 54.30±0.78 | 26.02±2.13 | 3.97±0.11 | 0.99±0.21 |
| | Last | 62.35 | 54.84 | 33.44 | 6.78 | 66.02 | 57.33 | 45.24 | 23.72 | 53.97±0.71 | 25.74±1.21 | 3.67±0.14 | 0.00±0.00 |
| T-Revision | Best | 65.19 | 60.43 | 43.01 | 4.03 | 68.77 | 62.86 | 54.23 | 45.67 | 57.86±1.02 | 40.60±1.33 | 13.06±1.20 | 1.92±0.56 |
| | Last | 64.95 | 60.26 | 42.77 | 3.12 | 68.53 | 62.39 | 53.07 | 41.85 | 57.26±1.54 | 38.40±0.96 | 12.65±0.58 | 1.43±0.95 |
| RoG | Best | - | - | - | - | - | - | - | - | - | - | - | - |
| | Last | 66.68 | 60.79 | 53.08 | 22.73 | 66.39 | 60.80 | 56.00 | 48.62 | 70.55±0.55 | 58.61±0.65 | 25.74±0.28 | 4.13±0.41 |
| DivideMix | Best | 77.36 | 75.02 | 72.25 | 57.56 | 72.79 | 67.82 | 61.08 | 51.50 | 65.55±0.65 | 42.72±0.44 | 19.17±1.28 | 4.67±0.87 |
| | Last | 76.87 | 74.66 | 71.91 | 57.08 | 72.50 | 67.37 | 60.55 | 47.86 | 64.96±0.47 | 40.92±0.36 | 13.04±0.85 | 1.10±0.21 |
| AdaCorr | Best | 66.31 | 59.78 | 47.22 | 24.15 | 68.89 | 62.63 | 54.91 | 45.22 | 56.22±0.82 | 35.38±1.27 | 16.87±1.36 | 4.81±0.22 |
| | Last | 66.03 | 59.48 | 47.04 | 23.90 | 68.72 | 62.45 | 54.68 | 41.95 | 55.69±0.44 | 33.88±0.88 | 14.88±0.52 | 3.76±1.24 |
| Peer Loss | Best | 61.97 | 51.09 | 39.98 | 18.82 | 69.63 | 63.32 | 55.01 | 46.20 | 55.58±1.79 | 37.11±2.01 | 19.53±1.29 | 6.42±0.52 |
| | Last | 60.64 | 43.64 | 26.23 | 7.65 | 69.38 | 62.70 | 53.90 | 42.14 | 55.00±1.41 | 35.85±1.48 | 18.65±0.22 | 5.74±0.76 |
| ELR | Best | 72.55 | 68.75 | 60.01 | 26.89 | 70.27 | 66.04 | 60.59 | 52.81 | 68.21±0.62 | 43.75±0.21 | 14.39±0.35 | 1.09±0.18 |
| | Last | 72.13 | 68.60 | 59.78 | 23.95 | 70.13 | 65.87 | 60.41 | 52.57 | 67.97±0.17 | 43.40±0.22 | 13.97±0.38 | 0.98±0.11 |
| Negative LS | Best | 63.65 | 57.17 | 44.18 | 21.31 | 69.20 | 62.67 | 54.49 | 46.96 | 57.76±0.56 | 36.80±0.21 | 17.96±0.31 | 5.88±0.11 |
| | Last | 63.54 | 56.98 | 43.98 | 21.19 | 63.38 | 55.72 | 42.87 | 24.69 | 56.42±0.71 | 33.38±0.22 | 11.42±0.38 | 1.28±0.14 |
| PGDF | Best | 81.90 | 78.50 | 74.05 | 52.48 | 75.87 | 71.72 | 62.76 | 53.16 | 69.44±0.26 | 46.39±0.39 | 19.05±0.37 | 5.08±0.13 |
| | Last | 81.37 | 78.21 | 73.64 | 52.11 | 74.90 | 71.32 | 62.06 | 51.68 | 68.18±0.16 | 45.38±0.15 | 16.84±0.24 | 0.72±0.25 |
| ProMix | Best | 79.99 | 80.21 | 71.44 | 44.97 | 76.61 | 71.92 | 66.04 | 51.96 | 69.80±1.58 | 37.73±1.09 | 15.92±1.88 | 4.62±0.95 |
| | Last | 79.77 | 79.95 | 71.25 | 44.64 | 76.44 | 71.66 | 65.94 | 51.77 | 69.68±0.99 | 37.24±0.84 | 14.88±1.02 | 3.42±0.22 |
| SOP | Best | 77.35 | 75.20 | 72.39 | 63.13 | 72.52 | 63.84 | 56.79 | 50.20 | 65.80±0.68 | 45.61±0.34 | 22.68±0.27 | 2.88±0.11 |
| | Last | 77.11 | 74.89 | 72.10 | 62.87 | 72.11 | 63.15 | 53.35 | 40.77 | 65.51±0.12 | 45.24±0.26 | 21.55±0.18 | 2.48±0.16 |
**TABLE 3:** Test accuracy (%) on MNIST with different types of label noise (symmetric, instance-dependent, and our proposed BadLabel) and noise levels (ranging from 20% to 80%).

| Method | | Sym. 20% | Sym. 40% | Sym. 60% | Sym. 80% | IDN 20% | IDN 40% | IDN 60% | IDN 80% | BadLabel 20% | BadLabel 40% | BadLabel 60% | BadLabel 80% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Standard Training | Best | 98.68 | 97.47 | 97.05 | 77.65 | 93.27 | 77.08 | 53.78 | 34.49 | 87.75 | 74.37 | 45.66 | 23.87 |
| | Last | 94.29 | 80.32 | 51.78 | 22.29 | 87.72 | 70.86 | 47.70 | 23.55 | 82.53 | 61.31 | 39.01 | 15.93 |
| Co-teaching | Best | 99.19 | 98.96 | 98.73 | 77.30 | 93.91 | 83.84 | 63.26 | 30.07 | 90.04 | 67.44 | 42.88 | 11.59 |
| | Last | 97.28 | 94.88 | 92.09 | 70.10 | 91.92 | 74.40 | 57.73 | 28.05 | 87.37 | 60.01 | 11.33 | 10.13 |
| T-Revision | Best | 99.24 | 99.06 | 98.56 | 96.24 | 90.90 | 78.82 | 58.58 | 11.49 | 85.34 | 69.27 | 45.48 | 21.83 |
| | Last | 99.15 | 99.02 | 98.44 | 96.14 | 87.74 | 69.92 | 46.17 | 11.35 | 81.99 | 60.24 | 38.26 | 16.48 |
| RoG | Best | - | - | - | - | - | - | - | - | - | - | - | - |
| | Last | 95.87 | 83.08 | 56.65 | 21.80 | 88.92 | 71.80 | 53.72 | 25.80 | 85.62 | 65.98 | 40.58 | 18.12 |
| DivideMix | Best | 99.53 | 99.40 | 98.52 | 88.05 | 95.74 | 82.61 | 54.11 | 28.05 | 85.63 | 64.76 | 44.77 | 21.18 |
| | Last | 98.79 | 96.23 | 91.90 | 61.79 | 88.90 | 68.17 | 43.70 | 21.17 | 83.34 | 62.04 | 42.39 | 19.70 |
| AdaCorr | Best | 99.01 | 99.01 | 98.34 | 93.70 | 92.22 | 79.46 | 53.14 | 28.04 | 84.68 | 64.86 | 42.76 | 20.92 |
| | Last | 93.27 | 77.24 | 49.89 | 23.37 | 87.33 | 67.71 | 44.98 | 22.53 | 80.53 | 59.87 | 38.34 | 17.78 |
| Peer Loss | Best | 99.10 | 98.95 | 98.19 | 93.81 | 92.34 | 85.43 | 58.22 | 47.34 | 88.11 | 67.34 | 45.87 | 24.05 |
| | Last | 92.85 | 76.92 | 50.98 | 21.82 | 87.21 | 65.20 | 44.62 | 21.84 | 80.49 | 59.62 | 38.85 | 18.87 |
| Negative LS | Best | 99.14 | 98.79 | 97.90 | 85.98 | 93.90 | 82.84 | 55.74 | 31.78 | 88.04 | 69.95 | 47.80 | 22.60 |
| | Last | 99.00 | 98.73 | 97.86 | 85.92 | 83.56 | 77.70 | 49.73 | 23.75 | 10.87 | 25.80 | 27.03 | 10.32 |
| ProMix | Best | 99.75 | 99.77 | 98.07 | 85.50 | 99.14 | 96.12 | 69.88 | 41.21 | 99.66 | 69.35 | 42.80 | 28.95 |
| | Last | 99.67 | 99.74 | 97.76 | 65.21 | 97.37 | 92.74 | 61.09 | 30.35 | 99.56 | 66.33 | 35.80 | 19.09 |
| SOP | Best | 99.21 | 98.56 | 97.76 | 86.30 | 92.68 | 77.37 | 58.00 | 29.21 | 91.00 | 67.60 | 48.81 | 28.57 |
| | Last | 98.65 | 94.05 | 65.03 | 24.48 | 91.39 | 75.97 | 53.29 | 26.88 | 84.66 | 61.78 | 37.07 | 13.95 |

#### Learning curves of multiple LNL algorithms on CIFAR-10 and CIFAR-100

Here we present learning curves of multiple LNL algorithms on the CIFAR-10 and CIFAR-100 datasets with different types and ratios of label noise.

**FIGURE 1:** Learning curves of multiple LNL algorithms on CIFAR-10.

**FIGURE 2:** Learning curves of multiple LNL algorithms on CIFAR-100.

### Part 3: Evaluate Robust DivideMix

Run the following commands to evaluate Robust DivideMix on different datasets:

```bash
cd robust_LNL_algo
./eval_robust_dividemix.sh
```

#### Evaluation results of Robust DivideMix on CIFAR-10 and CIFAR-100

Here we share our evaluation results of Robust DivideMix and two baseline methods on the CIFAR-10 and CIFAR-100 datasets with multiple types of noise.

**TABLE 4:** Comparison of the test accuracy (%) between Robust DivideMix and baseline methods on CIFAR-10 with different types and ratios of label noise.

| Noise type | | Standard Training 20% | Standard Training 40% | Standard Training 60% | Standard Training 80% | DivideMix 20% | DivideMix 40% | DivideMix 60% | DivideMix 80% | Robust DivideMix 20% | Robust DivideMix 40% | Robust DivideMix 60% | Robust DivideMix 80% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sym. | Best | 85.21 | 79.90 | 69.79 | 43.00 | 96.21 | 95.08 | 94.80 | 81.95 | 95.45±0.36 | 94.84±0.13 | 94.25±0.11 | 61.59±1.24 |
| | Last | 82.55 | 64.79 | 41.43 | 17.20 | 96.04 | 94.74 | 94.56 | 81.58 | 95.28±0.38 | 94.71±0.16 | 94.11±0.12 | 60.98±1.21 |
| Asym. | Best | 88.02 | 85.22 | - | - | 94.82 | 94.20 | - | - | 91.77±0.46 | 86.88±0.82 | - | - |
| | Last | 87.28 | 77.04 | - | - | 94.46 | 93.50 | - | - | 90.62±0.38 | 84.02±1.65 | - | - |
| IDN | Best | 85.42 | 78.93 | 68.97 | 55.34 | 91.97 | 85.84 | 81.59 | 59.06 | 90.44±1.09 | 89.71±0.74 | 78.12±0.31 | 60.64±0.46 |
| | Last | 85.23 | 74.06 | 52.22 | 28.04 | 90.77 | 82.94 | 81.19 | 47.81 | 87.30±1.72 | 89.16±0.69 | 72.33±1.08 | 50.38±0.68 |
| BadLabel | Best | 76.76 | 58.79 | 39.64 | 17.80 | 84.81 | 58.44 | 28.38 | 6.87 | 92.07±1.06 | 86.70±3.83 | 76.47±3.89 | 27.41±3.25 |
| | Last | 75.31 | 55.72 | 35.66 | 13.44 | 82.13 | 57.65 | 16.21 | 6.12 | 91.76±1.27 | 85.96±4.33 | 73.29±3.81 | 25.20±2.72 |
| Average | Best | 83.85 | 75.71 | 59.47 | 38.71 | 91.95 | 83.39 | 68.26 | 49.17 | 92.43±0.74 | 89.53±1.38 | 82.95±1.43 | 49.88±1.65 |
| | Last | 82.59 | 67.90 | 43.10 | 19.56 | 90.85 | 82.21 | 63.99 | 45.17 | 91.24±0.93 | 88.46±1.71 | 79.91±1.67 | 45.52±1.54 |
**TABLE 5:** Comparison of the test accuracy (%) between Robust DivideMix and baseline methods on CIFAR-100 with different types and ratios of label noise.

| Noise type | | Standard Training 20% | Standard Training 40% | Standard Training 60% | Standard Training 80% | DivideMix 20% | DivideMix 40% | DivideMix 60% | DivideMix 80% | Robust DivideMix 20% | Robust DivideMix 40% | Robust DivideMix 60% | Robust DivideMix 80% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sym. | Best | 61.41 | 51.21 | 38.82 | 19.89 | 77.36 | 75.02 | 72.25 | 57.56 | 77.35±0.28 | 74.40±0.20 | 70.74±0.45 | 48.13±0.80 |
| | Last | 61.17 | 46.27 | 27.01 | 9.27 | 76.87 | 74.66 | 71.91 | 57.08 | 77.06±0.28 | 74.16±0.23 | 69.93±0.59 | 47.84±0.82 |
| IDN | Best | 70.06 | 62.48 | 53.21 | 45.77 | 72.79 | 67.82 | 61.08 | 51.50 | 73.49±0.28 | 69.47±0.18 | 63.64±0.21 | 52.74±0.73 |
| | Last | 69.94 | 62.32 | 52.55 | 40.45 | 72.50 | 67.37 | 60.55 | 47.86 | 73.10±0.20 | 68.88±0.13 | 61.03±0.31 | 46.84±0.17 |
| BadLabel | Best | 56.75 | 35.42 | 17.70 | 6.03 | 65.55 | 42.72 | 19.17 | 4.67 | 65.29±0.76 | 46.64±0.48 | 41.80±1.19 | 21.48±0.39 |
| | Last | 56.30 | 34.90 | 17.05 | 4.18 | 64.96 | 40.92 | 13.04 | 1.10 | 64.49±0.96 | 45.26±0.40 | 35.91±0.67 | 16.91±0.41 |
| Average | Best | 62.74 | 49.70 | 36.58 | 23.90 | 71.90 | 61.85 | 50.83 | 37.91 | 72.04±0.44 | 63.50±0.29 | 58.73±0.62 | 40.78±0.64 |
| | Last | 62.47 | 47.83 | 32.20 | 17.97 | 71.44 | 60.98 | 48.50 | 35.35 | 71.55±0.48 | 62.77±0.25 | 55.62±0.52 | 37.20±0.47 |

#### Learning curves of multiple LNL algorithms on different BadLabel noise ratios

Below, we present the learning curves of multiple LNL algorithms on CIFAR-10 and CIFAR-100 with different BadLabel noise ratios.

**FIGURE 3:** Learning curves of multiple LNL algorithms on CIFAR-10 with different BadLabel noise ratios.

**FIGURE 4:** Learning curves of multiple LNL algorithms on CIFAR-100 with different BadLabel noise ratios.
