
"Killed" during dataset convert from COCO to YOLO format #267

Closed
1 of 2 tasks
DovydasPociusDroneTeam opened this issue Aug 4, 2023 · 19 comments
Labels
bug Something isn't working

Comments

@DovydasPociusDroneTeam

Search before asking

  • I have searched the Supervision issues and found no similar bug report.

Bug

I get a "Killed" error while converting a dataset from COCO to YOLO format (the code is given below):

Screenshot from 2023-08-04 16-02-16

I tried manually splitting the big dataset into 3 smaller parts and then did not get the error, but the classes appear in different positions in the "names" field of the resulting .yaml files:

names: [truck, car, medium car, bus, motorcycle]

and

names: [bus, car, medium car, truck, motorcycle]

Any suggestions? Thank you in advance!

Environment

  • python 3.9.13
  • Ubuntu 20.04
  • supervision 0.12.0

Minimal Reproducible Example

import supervision as sv

ds = sv.DetectionDataset.from_coco(
    images_directory_path='/home/droneteam/detectron2_for_labeling/codes_for_testing_seg_model/500_datasetas_pirmam_mokymui/all_dataset',
    annotations_path='/home/droneteam/detectron2_for_labeling/codes_for_testing_seg_model/500_datasetas_pirmam_mokymui/all_dataset.json',
    force_masks=True
)

train_ds, test_ds = ds.split(split_ratio=0.8, random_state=42, shuffle=True)

train_ds.as_yolo(
    images_directory_path='/home/droneteam/detectron2_for_labeling/codes_for_testing_seg_model/500_datasetas_pirmam_mokymui/all_yolo/train/images',
    annotations_directory_path='/home/droneteam/detectron2_for_labeling/codes_for_testing_seg_model/500_datasetas_pirmam_mokymui/all_yolo/train_/labels',
    data_yaml_path='/home/droneteam/detectron2_for_labeling/codes_for_testing_seg_model/500_datasetas_pirmam_mokymui/train_copy_parts/data_train.yaml'
)

test_ds.as_yolo(
    images_directory_path='/home/droneteam/detectron2_for_labeling/codes_for_testing_seg_model/500_datasetas_pirmam_mokymui/all_yolo/test/images',
    annotations_directory_path='/home/droneteam/detectron2_for_labeling/codes_for_testing_seg_model/500_datasetas_pirmam_mokymui/all_yolo/test/labels',
    data_yaml_path='/home/droneteam/detectron2_for_labeling/codes_for_testing_seg_model/500_datasetas_pirmam_mokymui/train_copy_parts/data_test.yaml'
)

Additional

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@DovydasPociusDroneTeam DovydasPociusDroneTeam added the bug Something isn't working label Aug 4, 2023
@github-actions
Contributor

github-actions bot commented Aug 4, 2023

Hello there, thank you for opening an Issue ! 🙏🏻 The team was notified and they will get back to you asap.

@capjamesg
Collaborator

How many images are in your dataset? How large are the images?

This error is the Out of Memory (OOM) killer on your machine acting to ensure the Python process doesn't take up too much RAM and cause instability on your system. This suggests your system isn't able to store all of the images in your dataset in memory, which is required to convert the datasets.

@DovydasPociusDroneTeam
Author

DovydasPociusDroneTeam commented Aug 4, 2023

I have about 6000 images (1024x1024). So if it is an OOM problem, do you have any suggestions for how I can get around it?
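
For a rough sense of scale, a back-of-the-envelope estimate is sketched below. It assumes every image is decoded to an uncompressed 8-bit, 3-channel array and that all of them are held in RAM at once; the function name and the channel count are assumptions for illustration, not something reported in this thread.

```python
def decoded_dataset_bytes(
    num_images: int, width: int, height: int, channels: int = 3
) -> int:
    """Bytes needed to hold every decoded uint8 image in memory at once."""
    return num_images * width * height * channels


total = decoded_dataset_bytes(num_images=6000, width=1024, height=1024)
print(f"{total / 2**30:.1f} GiB")  # 17.6 GiB
```

With force_masks=True, each annotated object may also carry a full-resolution boolean mask (about 1 MiB per object at 1024x1024 if stored one byte per pixel), so the real footprint is likely higher than this estimate.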

@SkalskiP
Collaborator

SkalskiP commented Aug 4, 2023

Hi, @DovydasPociusDroneTeam 👋🏻 This is interesting. So the script died, but the output datasets got saved anyway? Would love to learn more.

@hardikdava
Collaborator

Hi @DovydasPociusDroneTeam 👋, we can dig deeper into the process to check for a memory leak, but that will take some time.

As for the other question, about the class names being rearranged, I might have an idea. @SkalskiP this is due to the classes being sorted in alphabetical order, check here.

@SkalskiP
Collaborator

SkalskiP commented Aug 4, 2023

@hardikdava I'm not sure. We got the input dataset. We divided that dataset into two parts. Saved both parts in YOLO format. Both Output datasets have different class orders. Do I understand the problem correctly?

Is the order different between input and output datasets? Or between both output datasets?

@hardikdava It is a somewhat related topic. I think in the future, we should migrate sv.DetectionDataset.classes to be a Dict[int, str], not a List[str]. We are getting more and more trouble with breaking indexes.

@hardikdava
Collaborator

@SkalskiP Changing sv.DetectionDataset.classes into Dict[int, str] was already on my mind. We should definitely do it.
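
To illustrate why a positional List[str] is fragile, here is a small sketch in plain Python (not supervision code). It reuses the two names lists from the original report; the explicit id-to-name dictionaries are made up for illustration.

```python
# Two `names` lists from the report: same classes, different order.
names_a = ["truck", "car", "medium car", "bus", "motorcycle"]
names_b = ["bus", "car", "medium car", "truck", "motorcycle"]

# A YOLO label stores only the class index, so its meaning depends on the list.
class_index = 0
print(names_a[class_index])  # truck
print(names_b[class_index])  # bus

# With an explicit id -> name mapping, insertion order no longer matters:
# two mappings are equal whenever they hold the same pairs.
classes_a = {0: "truck", 1: "car", 2: "medium car", 3: "bus", 4: "motorcycle"}
classes_b = {3: "bus", 1: "car", 2: "medium car", 0: "truck", 4: "motorcycle"}
print(classes_a == classes_b)  # True
```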

@SkalskiP SkalskiP self-assigned this Aug 4, 2023
@DovydasPociusDroneTeam
Author

DovydasPociusDroneTeam commented Aug 4, 2023

> Hi, @DovydasPociusDroneTeam 👋🏻 This is interesting. So the script died, but the output datasets got saved anyway? Would love to learn more.

I didn't get any output from the single 6000-image dataset.

So I tried splitting this dataset into 3 separate datasets. Instead of having

train_dataset_images (6000 images)
├ coco.json
├ image1
├ image2
├ image3
└ imageN

I did

train_dataset_images_part1 (2000 images)
├ coco_part1.json
├ image1
├ image2
├ image3
└ imageM

train_dataset_images_part2 (2000 images)
├ coco_part2.json
├ imageM+1
├ imageM+2
├ imageM+3
└ imageN

train_dataset_images_part3 (2000 images)
├ coco_part3.json
├ imageN+1
├ imageN+2
├ imageN+3
└ imageZ

For every separate dataset of 2000 images I ran the script from_coco().as_yolo() and was able to get results without an error. But when I checked each output YAML file, I saw that the "names" array was not the same.

@SkalskiP
Collaborator

SkalskiP commented Aug 4, 2023

@DovydasPociusDroneTeam, thanks a lot for helping us to understand what's happening. Could you help us a bit more and check the categories key in coco_part1.json, coco_part2.json, and coco_part3.json.

Please paste categories for each JSON here. If categories are precisely the same in each JSON, then we have a problem.
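
One way to run that check is sketched below with the standard library only. The helper names `load_categories` and `categories_match` are hypothetical; the file names are the ones mentioned above and are assumed to sit in the working directory. The sketch compares id-to-name pairs rather than list position.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_categories(annotations_path: str) -> Dict[int, str]:
    """Return the id -> name mapping from the `categories` key of a COCO file."""
    coco = json.loads(Path(annotations_path).read_text(encoding="utf-8"))
    return {category["id"]: category["name"] for category in coco["categories"]}


def categories_match(annotation_paths: List[str]) -> bool:
    """True when every file maps the same ids to the same names."""
    mappings = [load_categories(path) for path in annotation_paths]
    return all(mapping == mappings[0] for mapping in mappings)


if __name__ == "__main__":
    paths = ["coco_part1.json", "coco_part2.json", "coco_part3.json"]
    # Only run the comparison when the (assumed) files are actually present.
    if all(Path(path).exists() for path in paths):
        print(categories_match(paths))
```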

@DovydasPociusDroneTeam
Author

> @DovydasPociusDroneTeam, thanks a lot for helping us to understand what's happening. Could you help us a bit more and check the categories key in coco_part1.json, coco_part2.json, and coco_part3.json.
>
> Please paste categories for each JSON here. If categories are precisely the same in each JSON, then we have a problem.

You are right! In my coco_part1.json and coco_part2.json, the categories are not in the same order!

Okay... So I used a bad LabelMe-to-COCO converter (when I split the dataset into 3 separate parts). I don't know why it mixed up the category order.

Thank you for that info!
Looking forward to converting the full dataset without needing to split it into separate parts!
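
Until the full dataset fits in memory, one possible way to split a COCO annotation file without reshuffling its categories is sketched below. It is an illustration only: `split_coco` is a hypothetical helper, not part of supervision or of any LabelMe converter, and it assumes the standard COCO keys `images`, `annotations`, and `categories`.

```python
from typing import Any, Dict, Iterator, List

Coco = Dict[str, Any]


def split_coco(coco: Coco, chunk_size: int) -> Iterator[Coco]:
    """Yield COCO dicts of at most `chunk_size` images, each with the full category list."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    images: List[Dict[str, Any]] = coco["images"]
    for start in range(0, len(images), chunk_size):
        chunk = images[start:start + chunk_size]
        image_ids = {image["id"] for image in chunk}
        yield {
            "images": chunk,
            "annotations": [
                annotation
                for annotation in coco["annotations"]
                if annotation["image_id"] in image_ids
            ],
            # Copied unchanged, so ids and order are identical in every part.
            "categories": list(coco["categories"]),
        }
```

Each yielded dict can be written out with json.dump and converted on its own; because the categories are copied as-is, every part should describe the same classes.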

@SkalskiP
Collaborator

SkalskiP commented Aug 4, 2023

@DovydasPociusDroneTeam 🔥 Awesome that we managed to get to the bottom of this problem.

> Looking forward to converting the full dataset without needing to split it into separate parts!

We will need to introduce lazy loading of images to make that happen. It is on our roadmap. I'll pin this issue there to keep track of that problem.
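
As a rough idea of what lazy loading could look like, the sketch below yields one image at a time instead of building a collection of all decoded images up front. It is not supervision's implementation; `iter_images` and the `load` callable are hypothetical (in practice `load` might be something like `cv2.imread`).

```python
from pathlib import Path
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

Image = TypeVar("Image")


def iter_images(
    paths: Iterable[str], load: Callable[[str], Image]
) -> Iterator[Tuple[str, Image]]:
    """Yield (file name, decoded image) pairs, decoding each image only when requested."""
    for path in paths:
        yield Path(path).name, load(path)


# Stand-in loader so the sketch runs without image files or extra libraries.
def fake_load(path: str) -> bytes:
    return b"pixels of " + path.encode()


for name, image in iter_images(["a/img1.jpg", "a/img2.jpg"], fake_load):
    print(name, len(image))
```

As long as the consumer does not keep references to earlier images, only one decoded image needs to be alive at a time, so peak memory stays near the size of a single image rather than the whole dataset.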

I'll close the issue for now.

@SkalskiP SkalskiP closed this as completed Aug 4, 2023
@Killua7362
Contributor

@SkalskiP Can't we just save the dataset in a pandas dataset and then retrieve it batch by batch?

@SkalskiP
Collaborator

Hi @Killua7362 👋🏻 Could you elaborate?

@Killua7362
Contributor

Hello @SkalskiP, I hope you are well.
I am new to this community, so I might be wrong here.
If I want to create a dataset of images using Roboflow, will it save a generator object or the whole dataset?

@SkalskiP
Collaborator

Hi @Killua7362 👋🏻 No worries. I'm happy to explain. For now, you will always load the whole dataset, but we are thinking about adding a generator option.

@Killua7362
Contributor

Killua7362 commented Sep 26, 2023

> Hi @Killua7362 👋🏻 No worries. I'm happy to explain. For now, you will always load the whole dataset, but we are thinking about adding a generator option.

Can I try adding that option if you don't mind? @SkalskiP

@SkalskiP
Collaborator

Hi @Killua7362, there is already an issue and a PR for that, but I haven't had time to review it yet.

@lonngxiang

lonngxiang commented Nov 22, 2023

@SkalskiP I am using this dataset, which has 1 label: https://universe.roboflow.com/naumov-igor-segmentation/car-segmetarion

image

But when I convert it from COCO to YOLO with this script, I get 2 labels:

import supervision as sv

sv.DetectionDataset.from_coco(
    images_directory_path= r"C:\Users\loong\Downloads\Car\valid",
    annotations_path=r"C:\Users\loong\Downloads\Car\valid\_annotations.coco.json",
    force_masks=True
).as_yolo(
    images_directory_path=r"C:\Users\loong\Downloads\Car_yolo\val\images",
    annotations_directory_path=r"C:\Users\loong\Downloads\Car_yolo\val\labels",
    data_yaml_path=r"C:\Users\loong\Downloads\Car_yolo\data.yaml"
)

image

and the generated format doesn't seem right either

image
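
A quick way to see which categories the annotation file declares, and how many annotations each one actually has, is sketched below. `annotations_per_category` is a hypothetical helper and the commented-out path is the one from the script above. A declared category with zero annotations could explain an extra entry in data.yaml, though that is only a guess here.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict


def annotations_per_category(coco: Dict[str, Any]) -> Dict[str, int]:
    """Map each declared category name to the number of annotations that use it."""
    counts = Counter(annotation["category_id"] for annotation in coco["annotations"])
    return {category["name"]: counts[category["id"]] for category in coco["categories"]}


# Example (path taken from the script above):
# path = Path(r"C:\Users\loong\Downloads\Car\valid\_annotations.coco.json")
# print(annotations_per_category(json.loads(path.read_text(encoding="utf-8"))))
```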

@SkalskiP
Collaborator

Hi @lonngxiang 👋🏻 I'm happy to help out. I just responded to your issue. Let's move the conversation there.

6 participants