[Update 5/17/2023] A demo for automatically detecting label errors on CIFAR-N is now available at Docta!

Docta: A Doctor for your data
An advanced data-centric AI platform that offers a comprehensive range of services aimed at detecting and rectifying issues in your data.

This repository is the official dataset release and PyTorch implementation of "Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations", accepted at ICLR 2022. We collected and published re-annotated versions of the CIFAR-10 and CIFAR-100 data that contain real-world human annotation errors. We show how these noise patterns deviate from the classically assumed ones and what new challenges they pose. The CIFAR-N website is available at http://www.noisylabels.com/.

Competition: Please refer to the branch ijcai-lmnl-2022 for details of the 1st Learning with Noisy Labels Challenge at IJCAI 2022. Details are also available at http://competition.noisylabels.com/.

Dataloader for CIFAR-N (PyTorch)

CIFAR-10N

import torch
noise_file = torch.load('./data/CIFAR-10_human.pt')
clean_label = noise_file['clean_label']
worst_label = noise_file['worse_label']
aggre_label = noise_file['aggre_label']
random_label1 = noise_file['random_label1']
random_label2 = noise_file['random_label2']
random_label3 = noise_file['random_label3']
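
Once the label sets are loaded, a quick sanity check is to measure how often a noisy set disagrees with clean_label. The sketch below is not part of this repository: noise_rate is a hypothetical helper, and short toy lists stand in for the arrays loaded above.

```python
def noise_rate(clean_labels, noisy_labels):
    """Fraction of positions where the noisy label differs from the clean one."""
    if len(clean_labels) != len(noisy_labels):
        raise ValueError("label sets must have the same length")
    if len(clean_labels) == 0:
        raise ValueError("label sets must not be empty")
    mismatches = sum(int(c) != int(n) for c, n in zip(clean_labels, noisy_labels))
    return mismatches / len(clean_labels)


# Toy stand-ins for noise_file['clean_label'] and noise_file['aggre_label'].
toy_clean = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
toy_noisy = [0, 1, 2, 5, 4, 3, 6, 7, 8, 9]
toy_rate = noise_rate(toy_clean, toy_noisy)
print(f"noise rate: {toy_rate:.2f}")
```

With the real file, the same call could be made as noise_rate(clean_label, worst_label), and likewise for the other label sets.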

CIFAR-100N

import torch
noise_file = torch.load('./data/CIFAR-100_human.pt')
clean_label = noise_file['clean_label']
noisy_label = noise_file['noisy_label']
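
To train on the noisy labels, the loaded array has to be paired with the training images. One possible way, sketched below, is a thin map-style wrapper. NoisyLabelDataset is a hypothetical name, not a class from this repository, and the sketch assumes the base dataset returns (image, label) pairs in the same order as the label file.

```python
class NoisyLabelDataset:
    """Map-style dataset that swaps each original label for a noisy one."""

    def __init__(self, base_dataset, noisy_labels):
        if len(base_dataset) != len(noisy_labels):
            raise ValueError("dataset and label set must have the same length")
        self.base_dataset = base_dataset
        self.noisy_labels = noisy_labels

    def __len__(self):
        return len(self.base_dataset)

    def __getitem__(self, index):
        # Keep the image, discard the original label, return the noisy one.
        image, _original_label = self.base_dataset[index]
        return image, int(self.noisy_labels[index])


# Toy stand-in for a real dataset: three (image, label) pairs.
toy_base = [("img0", 7), ("img1", 3), ("img2", 3)]
toy_wrapped = NoisyLabelDataset(toy_base, noisy_labels=[7, 5, 3])
print(toy_wrapped[1])
```

With real data, base_dataset could be torchvision.datasets.CIFAR100(root, train=True) and noisy_labels the noisy_label array loaded above; the wrapped object can then be passed to torch.utils.data.DataLoader.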

Dataloader for CIFAR-N (TensorFlow)

Note: The image order of the TensorFlow dataset (tfds.load, binary version of CIFAR) does not match that of the PyTorch dataloader (Python version of CIFAR).

CIFAR-10N

import numpy as np
import tensorflow_datasets as tfds

# The .npy file stores a pickled dict, so unwrap it once with .item()
noise_file = np.load('./data/CIFAR-10_human_ordered.npy', allow_pickle=True).item()
clean_label = noise_file.get('clean_label')
worst_label = noise_file.get('worse_label')
aggre_label = noise_file.get('aggre_label')
random_label1 = noise_file.get('random_label1')
random_label2 = noise_file.get('random_label2')
random_label3 = noise_file.get('random_label3')
# The noisy labels follow the image order of this TensorFlow dataloader
train_ds, test_ds = tfds.load('cifar10', split=['train', 'test'], as_supervised=True, batch_size=-1)
train_images, train_labels = tfds.as_numpy(train_ds)
# You may want to replace train_labels with one of the CIFAR-N noisy label sets

Reminder: CIFAR-10N is now available in TensorFlow Datasets; see its catalog for more details.

CIFAR-100N

import numpy as np
import tensorflow_datasets as tfds

# The .npy file stores a pickled dict, so unwrap it once with .item()
noise_file = np.load('./data/CIFAR-100_human_ordered.npy', allow_pickle=True).item()
clean_label = noise_file.get('clean_label')
noise_label = noise_file.get('noise_label')
# The noisy labels follow the image order of this TensorFlow dataloader
train_ds, test_ds = tfds.load('cifar100', split=['train', 'test'], as_supervised=True, batch_size=-1)
train_images, train_labels = tfds.as_numpy(train_ds)
# You may want to replace train_labels with the CIFAR-N noisy label set

The mapping from the tfds image order to the PyTorch dataloader image order is given in the following files:

image_order_c10.npy: a NumPy array of length 50K; the i-th element is the index, in the PyTorch (Python-version) training set, of the i-th unshuffled tfds (binary-version) CIFAR-10 training image.
image_order_c100.npy: a NumPy array of length 50K; the i-th element is the index, in the PyTorch (Python-version) training set, of the i-th unshuffled tfds (binary-version) CIFAR-100 training image.
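
As a sketch of how such an index array can be used, the helpers below move per-image values (labels, for instance) between the two orderings. The function names are hypothetical, and a four-element toy permutation stands in for the real 50K array. Plain Python lists are used so the example runs without extra dependencies.

```python
def pytorch_to_tfds_order(values_pytorch, image_order):
    """Reorder per-image values from PyTorch order into tfds order.

    image_order[i] is the PyTorch index of the i-th tfds training image.
    """
    return [values_pytorch[int(j)] for j in image_order]


def tfds_to_pytorch_order(values_tfds, image_order):
    """Inverse mapping: reorder per-image values from tfds order into PyTorch order."""
    values_pytorch = [None] * len(image_order)
    for tfds_index, pytorch_index in enumerate(image_order):
        values_pytorch[int(pytorch_index)] = values_tfds[tfds_index]
    return values_pytorch


# Toy permutation standing in for np.load('image_order_c10.npy').
toy_order = [2, 0, 3, 1]
toy_pytorch_labels = [10, 11, 12, 13]
toy_tfds_labels = pytorch_to_tfds_order(toy_pytorch_labels, toy_order)
print(toy_tfds_labels)
```

When both inputs are NumPy arrays, the forward direction reduces to values_pytorch[image_order].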

Training on CIFAR-N with Cross-Entropy (PyTorch)

CIFAR-10N

# NOISE_TYPE: [clean, aggre, worst, rand1, rand2, rand3]
# Use human annotations
CUDA_VISIBLE_DEVICES=0 python3 main.py --dataset cifar10 --noise_type NOISE_TYPE --is_human
# Use the synthetic noise that has the same noise transition matrix as human annotations
CUDA_VISIBLE_DEVICES=0 python3 main.py --dataset cifar10 --noise_type NOISE_TYPE
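
The second command trains on synthetic labels that share the noise transition matrix of the human annotations. The sketch below illustrates that idea in general terms and is not the code in main.py. It estimates a class-dependent matrix T, where T[i][j] is the fraction of images with clean label i that were annotated as j, then samples one synthetic label per image from the row of its clean label. The function names are hypothetical.

```python
import random


def estimate_transition_matrix(clean_labels, noisy_labels, num_classes):
    """Return T where T[i][j] estimates P(noisy = j | clean = i)."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for clean, noisy in zip(clean_labels, noisy_labels):
        counts[int(clean)][int(noisy)] += 1
    matrix = []
    for class_index, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Class never appears among the clean labels: keep it noise-free.
            matrix.append([1.0 if j == class_index else 0.0 for j in range(num_classes)])
        else:
            matrix.append([count / total for count in row])
    return matrix


def sample_synthetic_labels(clean_labels, matrix, seed=0):
    """Draw one synthetic noisy label per image from the row of its clean label."""
    rng = random.Random(seed)
    classes = range(len(matrix))
    return [rng.choices(classes, weights=matrix[int(c)])[0] for c in clean_labels]


# Toy labels standing in for clean_label and one human-annotated label set.
toy_clean = [0, 0, 0, 0, 1, 1]
toy_human = [0, 0, 1, 0, 1, 0]
toy_matrix = estimate_transition_matrix(toy_clean, toy_human, num_classes=2)
toy_synthetic = sample_synthetic_labels(toy_clean, toy_matrix)
print(toy_matrix)
```

Synthetic labels drawn this way match the human labels in class-level flip rates but not in which individual images are mislabeled.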

CIFAR-100N

# NOISE_TYPE: [clean100, noisy100]
# Use human annotations
CUDA_VISIBLE_DEVICES=0 python3 main.py --dataset cifar100 --noise_type NOISE_TYPE --is_human
# Use the synthetic noise that has the same noise transition matrix as human annotations
CUDA_VISIBLE_DEVICES=0 python3 main.py --dataset cifar100 --noise_type NOISE_TYPE
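
The NOISE_TYPE values above are short names, while the label files use longer keys. The mapping below is an assumption inferred from the names in this README, not something confirmed from main.py, so check it against the script before relying on it. NOISE_TYPE_TO_KEY and label_key are hypothetical names.

```python
# Assumed correspondence between --noise_type values and label-file keys.
NOISE_TYPE_TO_KEY = {
    "clean": "clean_label",
    "aggre": "aggre_label",
    "worst": "worse_label",  # the file key is spelled 'worse_label'
    "rand1": "random_label1",
    "rand2": "random_label2",
    "rand3": "random_label3",
    "clean100": "clean_label",
    "noisy100": "noisy_label",
}


def label_key(noise_type):
    """Translate a --noise_type value into the assumed label-file key."""
    if noise_type not in NOISE_TYPE_TO_KEY:
        valid = ", ".join(sorted(NOISE_TYPE_TO_KEY))
        raise ValueError(f"unknown noise type {noise_type!r}; expected one of: {valid}")
    return NOISE_TYPE_TO_KEY[noise_type]


print(label_key("worst"))
```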

Additional dataset information

We include side information recorded during noisy-label collection in side_info_cifar10N.csv and side_info_cifar100N.csv. Each file has the following fields:

Image-batch: a subset of indexes of the CIFAR training images.
Worker-id: the encrypted worker id on Amazon Mechanical Turk.
Work-time-in-seconds: the time (in seconds) a worker spent on annotating the corresponding image batch.
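
As a rough illustration of how these fields might be used, the sketch below totals the number of batches and the working time per worker. It assumes the CSV header uses exactly the three field names listed above, which has not been verified against the files. The rows and the Image-batch values in the toy data are made up, and summarize_workers is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def summarize_workers(csv_file):
    """Return {worker_id: (num_batches, total_seconds)} from an open CSV file."""
    batches = defaultdict(int)
    seconds = defaultdict(float)
    for row in csv.DictReader(csv_file):
        worker = row["Worker-id"]  # assumed column name
        batches[worker] += 1
        seconds[worker] += float(row["Work-time-in-seconds"])  # assumed column name
    return {worker: (batches[worker], seconds[worker]) for worker in batches}


# Made-up rows in the assumed layout; the real files may differ.
toy_csv = io.StringIO(
    "Image-batch,Worker-id,Work-time-in-seconds\n"
    "0--10,A1,30\n"
    "10--20,B2,45\n"
    "20--30,A1,20\n"
)
toy_summary = summarize_workers(toy_csv)
print(toy_summary)
```

For a real file, the function could be called inside `with open('side_info_cifar10N.csv', newline='') as f:`, provided the header matches the assumption.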
