COCO sample previews show multiple sample masks #26

Open
Mahi-Mai opened this issue Oct 16, 2019 · 2 comments
Mahi-Mai commented Oct 16, 2019

I'm not sure what the deal is, but I set up my dataset as described in your guide. The json file generates fine, but when I generate several random previews, some of my samples have some technicolor nonsense going on.

By that I mean 11 masks (they happen to be consecutive masks in the index) are drawn/assigned over the one sample. I've checked the json file, and it looks like all these annotations have been assigned to the one sample for some reason. Each sample is named according to the suggested format, so I'm not sure what's up.

For example:

image filename: DSC_5409_1.jpg
annotation filename: DSC_5409_1_mask_1.png

image: MVI_0155_1107_140.jpg
annotation: MVI_0155_1107_140_mask_1.png

There's the same number of unique filenames under each directory, and I've stripped the extensions and "_mask_1" from the annotation filenames to ensure they match.

Anyone else have this problem?

===

To explain further, I have a source image, let's say MVI_0155_1107.jpg. This is sampled using a sliding window technique to produce x number of samples saved as MVI_0155_1107_#.jpg, with x going as high as 350 in some cases.

...ah, that's the problem. When it reaches image xx_3.jpg and starts looking for annotations, it's going to pick up ANY annotation with 'xx_3' in the filename. So, if I have annotations for xx_3, xx_30, xx_31, xx_32, xx_33, xx_34, xx_35, xx_36, xx_37, xx_38, and xx_39, that leaves me with 11 annotations assigned to one sample.
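
The collision can be reproduced without any image data. This is a minimal sketch, assuming the lookup amounts to a prefix test on the filename; the filenames are made up for illustration and simply follow the xx_N_mask_1.png pattern described above.

```python
# Hypothetical annotation filenames following the <image>_mask_1.png pattern.
annotation_files = ["xx_%d_mask_1.png" % i for i in range(1, 41)]

image_basename = "xx_3"  # taken from xx_3.jpg

# Assumed matching rule: an annotation belongs to the image if its
# filename starts with the image basename.
prefix_matches = [f for f in annotation_files if f.startswith(image_basename)]

# Including the separator and category that follow the basename keeps
# xx_30 ... xx_39 out of the result.
exact_matches = [f for f in annotation_files
                 if f.startswith(image_basename + "_mask_")]

print(len(prefix_matches))  # 11
print(exact_matches)        # ['xx_3_mask_1.png']
```

The second filter hard-codes the 'mask' category name, so it is only meant to show where the ambiguity comes from, not to be a general fix.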

So it's a matter of improving how the script parses annotation filenames for comparison. I'll try something like this...

#  I'm going to index out what I need.  To do that...

words_to_strip = annotation_filename.split('_')[-2:]  # in our case we'll get ['mask', '1.png']

char_sum = 0
for word in words_to_strip:
    char_sum += len(word)

# We still need to add 2 to account for the underscores before and after 'mask'.
char_sum += 2

# Use the sum to index out the part of the filename we want to compare

annotation_filename_match = annotation_filename[:-char_sum]

Now we can use the image filename to match to the appropriate annotation regardless of naming scheme.

There are probably better ways to do this, so please feel free to suggest! I'm not closing this yet because I haven't tested my method. :)

Mahi-Mai commented Oct 18, 2019

So to implement my solution, I used a function for listing files in a directory, and I wrote a small function for stripping the annotation filename:

def strip_annotation_filename(file):
    # Capture filename without extension
    annot_filename = file.split('/')[-1][:-4]

    # Capture words to strip
    words_to_strip = annot_filename.split('_')[-2:]  # in our case we'll get ['mask', '1']

    # Get number of characters
    char_sum = 2  # We still need to add 2 to account for the underscore before and after 'mask'.
    for word in words_to_strip:
        char_sum += len(word)

    # Use the sum to index out the part of the filename we want to compare
    annotation_filename_match = annot_filename[:-char_sum]

    return annotation_filename_match
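
A possibly simpler way to get the same result is to let `os.path` handle the directory and extension and use `rsplit` to drop the last two underscore-separated fields. This is only a sketch: `strip_annotation_suffix` is a hypothetical name, and it assumes the category name itself contains no underscore.

```python
import os


def strip_annotation_suffix(path):
    """Return the image basename encoded in an annotation path."""
    # Drop the directory and the extension, whatever its length.
    stem = os.path.splitext(os.path.basename(path))[0]
    # Split off the last two underscore-separated fields (category and id).
    return stem.rsplit("_", 2)[0]


print(strip_annotation_suffix("annotations/DSC_5409_1_mask_1.png"))
# DSC_5409_1
print(strip_annotation_suffix("annotations/MVI_0155_1107_140_mask_1.png"))
# MVI_0155_1107_140
```

Unlike slicing with `[:-4]`, this does not depend on the extension being exactly three characters long.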

This results in my new base code looking like this:

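import datetime
import json
import os

import numpy as np
from PIL import Image
from pycococreatortools import pycococreatortools

INFO = {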
    "description": "Corn cobs on conveyor belt.",
    "url": "https://github.com/waspinator/pycococreator",
    "version": "0.1.0",
    "year": 2019,
    "contributor": "Leanne Canessa",
    "date_created": datetime.datetime.utcnow().isoformat(' ')
}

LICENSES = [
    {
        "id": 1,
        "name": "PROPRIETARY",
        "url": "PROPRIETARY"
    }
]

CATEGORIES = [
    {
        'id': 1,
        'name': 'mask',
        'supercategory': 'vegetable',
    }
]

coco_output = {
    "info": INFO,
    "licenses": LICENSES,
    "categories": CATEGORIES,
    "images": [],
    "annotations": []
}

image_id = 1
segmentation_id = 1

image_files = list_files('...validation/cob_detector2019')
annotation_files = list_files('...validation/annotations')

# go through each image
for image_filename in image_files:
    image = Image.open(image_filename)
    image_info = pycococreatortools.create_image_info(image_id, os.path.basename(image_filename), image.size)

    coco_output["images"].append(image_info)

    # gather each associated annotation
    image_basename = image_filename.split('/')[-1][:-4]  # grab image filename without extension
    for annotation_filename in annotation_files:
        # grab annotation filename without category, ID descriptors, and extension
        annotation_basename = strip_annotation_filename(annotation_filename)
        
        # if they match, gather info
        if annotation_basename == image_basename:
            class_id = [x['id'] for x in CATEGORIES if x['name'] in annotation_filename][0]
            category_info = {'id': class_id, 'is_crowd': 'crowd' in image_filename}
            binary_mask = np.asarray(Image.open(annotation_filename)
                .convert('1')).astype(np.uint8)

            annotation_info = pycococreatortools.create_annotation_info(
                segmentation_id, image_id, category_info, binary_mask,
                image.size, tolerance=2)

            if annotation_info is not None:
                coco_output["annotations"].append(annotation_info)

            segmentation_id = segmentation_id + 1

    image_id = image_id + 1

with open('.../validation/instances_validation2019.json', 'w') as output_json_file:
    json.dump(coco_output, output_json_file)
    
print('Done!')
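
The inner loop above rescans every annotation file for every image. With a few hundred samples per source image, it may be worth grouping the annotation files by their stripped basename once and looking each image up directly. A rough sketch, where `strip_annotation_suffix` and `group_annotations` are hypothetical helpers and the paths are invented for the example:

```python
import os
from collections import defaultdict


def strip_annotation_suffix(path):
    # Hypothetical helper: 'dir/xx_3_mask_1.png' -> 'xx_3'.
    stem = os.path.splitext(os.path.basename(path))[0]
    return stem.rsplit("_", 2)[0]


def group_annotations(annotation_files):
    """Map each image basename to the annotation files that belong to it."""
    groups = defaultdict(list)
    for path in annotation_files:
        groups[strip_annotation_suffix(path)].append(path)
    return groups


# Made-up paths for illustration only.
annotation_files = [
    "annotations/xx_3_mask_1.png",
    "annotations/xx_30_mask_1.png",
    "annotations/xx_31_mask_1.png",
]
groups = group_annotations(annotation_files)

image_basename = os.path.splitext(os.path.basename("images/xx_3.jpg"))[0]
# .get() avoids adding empty entries for images without annotations.
print(groups.get(image_basename, []))
# ['annotations/xx_3_mask_1.png']
```

Because the lookup uses exact equality on the stripped name, xx_3 no longer collects the annotations for xx_30 and xx_31.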

However, now I have a new problem that may be related. When I look at my resulting json files, I have fewer annotations than I do images, despite confirming that each image has a matching annotation file (see attachment).
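
One way to narrow this down is to list the images that ended up with no annotation in the output. The loop above only appends when `annotation_info is not None`, so any mask for which `create_annotation_info` returns `None` would leave its image unannotated without any message. The sketch below assumes the standard COCO keys (`id` and `file_name` on images, `image_id` on annotations) and uses a small in-memory dict in place of the real JSON file; `images_without_annotations` is a hypothetical helper name.

```python
def images_without_annotations(coco):
    """Return file names of images that no annotation refers to."""
    annotated_ids = {ann["image_id"] for ann in coco["annotations"]}
    return [img["file_name"] for img in coco["images"]
            if img["id"] not in annotated_ids]


# Stand-in for the generated file; the real one could be read with json.load().
coco_output = {
    "images": [
        {"id": 1, "file_name": "xx_1.jpg"},
        {"id": 2, "file_name": "xx_2.jpg"},
        {"id": 3, "file_name": "xx_3.jpg"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1},
        {"id": 3, "image_id": 3},
    ],
}

print(images_without_annotations(coco_output))
# ['xx_2.jpg']
```

Opening the mask files for the reported images should show whether they were skipped by the filename match or dropped after matching.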

@waspinator Do you think this is a naming/file matching thing again? Or is this an expected behavior I'm not quite getting?

Thanks!

anmspro commented Jul 1, 2020

@Mahi-Mai, Hey, can you please take a look at this issue - #34
I need a little help.
