
# Computer Vision, Audio, & Multimodal Projects

This repository houses projects on semi-structured and unstructured data that were not completed using Spark and are not Natural Language Processing (NLP) projects.

## Binary Image Classification (Computer Vision)
| Project Name | Accuracy | F1-Score | Precision | Recall |
|---|---|---|---|---|
| Bart vs Homer | 0.9863 | 0.9841 | 0.9688 | 1.0 |
| Brain Tumor MRI Images | 0.9216 | 0.9375 | 0.8824 | 1.0 |
| COVID19 Lung CT Scans | 0.94 | 0.9379 | 0.9855 | 0.8947 |
| Car or Motorcycle | 0.9938 | 0.9939 | 0.9951 | 0.9927 |
| Dogs or Cats Image Classification | 0.99 | 0.9897 | 0.9885 | 0.9909 |
| Male or Female Eyes | 0.9727 | 0.9741 | 0.9818 | 0.9666 |
| Breast Histopathology Image Classification | 0.8202 | 0.8151 | 0.8141 | 0.8202 |
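
As a rough sketch of how a row of this table is produced, the standard scikit-learn metric functions can be used; the label arrays below are placeholders, not the actual project data:

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Placeholder labels; in the actual projects these come from the held-out test split.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # ground-truth classes
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]  # model predictions

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.4f}")
print(f"F1-Score:  {f1_score(y_true, y_pred):.4f}")
print(f"Precision: {precision_score(y_true, y_pred):.4f}")
print(f"Recall:    {recall_score(y_true, y_pred):.4f}")
```
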
## Multiclass & Multilabel Image Classification

### Multiclass Image Classification

| Project Name | Accuracy | Macro F1-Score | Macro Precision | Macro Recall | Best Algorithm |
|---|---|---|---|---|---|
| Brain Tumors Image Classification[^1] | 0.8198 | 0.8054 | 0.8769 | 0.8149 | Vision Transformer (ViT) |
| Diagnoses from Colonoscopy Images | 0.9375 | 0.9365 | 0.9455 | 0.9375 | - |
| Human Activity Recognition | 0.8381 | 0.8394 | 0.8424 | 0.839 | - |
| Intel Image Classification | 0.9487 | 0.9497 | 0.9496 | 0.95 | - |
| Landscape Recognition | 0.8687 | 0.8694 | 0.8714 | 0.8687 | - |
| Lung & Colon Cancer | 0.9994 | 0.9994 | 0.9994 | 0.9994 | - |
| Mango Leaf Disease Dataset | 1.0 | 1.0 | 1.0 | 1.0 | - |
| Simpsons Family Images | 0.953 | 0.9521 | 0.9601 | 0.9531 | - |
| Vegetable Image Classification | 1.0 | 1.0 | 1.0 | 1.0 | - |
| Weather Images | 0.934 | 0.9372 | 0.9398 | 0.9354 | - |
| Hyper Kvasir Labeled Image Classification | 0.8756 | 0.5778 | 0.5823 | 0.5746 | - |
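
The Macro variants in this table average the per-class scores with equal weight, so small classes count as much as large ones. A minimal sketch, again with placeholder labels rather than project data:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Toy 3-class labels; the real projects evaluate on their held-out test sets.
y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

# average="macro": compute each metric per class, then take the unweighted mean.
print(f"Macro F1:        {f1_score(y_true, y_pred, average='macro'):.4f}")
print(f"Macro Precision: {precision_score(y_true, y_pred, average='macro'):.4f}")
print(f"Macro Recall:    {recall_score(y_true, y_pred, average='macro'):.4f}")
```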

### Multilabel Image Classification

| Project Name | Subset Accuracy | F1 Score | ROC AUC |
|---|---|---|---|
| Futurama - ML Image CLF | 0.9672 | 0.9818 | 0.9842 |
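
Subset accuracy is the strictest multilabel metric: a sample counts as correct only if every one of its labels matches. A sketch with scikit-learn; the arrays are toy data, and the micro F1 averaging mode is an assumption since the table does not specify one:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Toy multilabel indicators: 4 samples x 3 labels.
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_score = np.array([[0.9, 0.2, 0.8], [0.1, 0.7, 0.3], [0.8, 0.6, 0.4], [0.2, 0.1, 0.9]])
y_pred = (y_score >= 0.5).astype(int)  # threshold predicted probabilities at 0.5

# On multilabel input, accuracy_score is exact-match (subset) accuracy.
print(f"Subset Accuracy: {accuracy_score(y_true, y_pred):.4f}")
print(f"F1 Score:        {f1_score(y_true, y_pred, average='micro'):.4f}")
print(f"ROC AUC:         {roc_auc_score(y_true, y_score, average='macro'):.4f}")
```
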
## Object Detection (Computer Vision)
| Project Name | Avg. Precision[^2] | Avg. Recall[^3] |
|---|---|---|
| License Plate Object Detection | 0.513 | 0.617 |
| Pedestrian Object Detection | 0.560 | 0.745 |
| ACL X-Rays | 0.09 | 0.308 |
| Abdomen MRIs | 0.453 | 0.715 |
| Axial MRIs | 0.284 | 0.566 |
| Blood Cell Object Detection | 0.344 | 0.448 |
| Brain Tumors | 0.185 | 0.407 |
| Cell Tower Object Detection | 0.287 | 0.492 |
| Stomata Cells | 0.340 | 0.547 |
| Excavator Object Detection | 0.386 | 0.748 |
| Forklift Object Detection | 0.136 | 0.340 |
| Hard Hat Object Detection | 0.346 | 0.558 |
| Liver Disease Object Detection | 0.254 | 0.552 |
- Additional Object Detection projects are posted in the 'Trained, But Not To Standard' subdirectory. Their code is complete, but given resource constraints they would take an unreasonably long time to train fully, so their metrics fall short of the results above.
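
The AP[^2] and AR[^3] figures are the primary COCO evaluation metrics. A minimal sketch of how such numbers can be computed, assuming the torchmetrics package; the boxes, scores, and labels are invented:

```python
import torch
from torchmetrics.detection import MeanAveragePrecision

# Invented single-image prediction and ground truth, boxes in xyxy format.
preds = [{
    "boxes": torch.tensor([[10.0, 10.0, 60.0, 60.0]]),
    "scores": torch.tensor([0.92]),
    "labels": torch.tensor([0]),
}]
target = [{
    "boxes": torch.tensor([[12.0, 8.0, 58.0, 62.0]]),
    "labels": torch.tensor([0]),
}]

metric = MeanAveragePrecision()  # defaults: AP @[IoU=0.50:0.95], up to 100 detections
metric.update(preds, target)
results = metric.compute()
print(f"AP @[IoU=0.50:0.95]: {results['map']:.3f}")
print(f"AR @[maxDets=100]:   {results['mar_100']:.3f}")
```
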
## Image Segmentation (Computer Vision)
| Project Name | Mean IoU | Mean Accuracy | Overall Accuracy | Use PEFT? |
|---|---|---|---|---|
| Carvana Image Modeling | 0.9917 | 0.9962 | 0.9972 | Yes |
| Dominoes | 0.9198 | 0.9515 | 0.9778 | Yes |
| CMP Facade (V2) | 0.3102 | 0.4144 | 0.6267 | Yes |
- Additional Image Segmentation projects are posted in the 'Trained, But Not To Standard' subdirectory. Their code is complete, but given resource constraints they would take an unreasonably long time to train fully, so their metrics fall short of the results above.
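
Mean IoU is the per-class intersection-over-union averaged across classes. A self-contained NumPy sketch; the masks are toy data, not project output:

```python
import numpy as np

def mean_iou(y_true: np.ndarray, y_pred: np.ndarray, num_classes: int) -> float:
    """Average per-class IoU, skipping classes absent from both masks."""
    ious = []
    for c in range(num_classes):
        intersection = np.logical_and(y_true == c, y_pred == c).sum()
        union = np.logical_or(y_true == c, y_pred == c).sum()
        if union > 0:
            ious.append(intersection / union)
    return float(np.mean(ious))

# Toy 4x4 segmentation masks with 2 classes.
gt = np.array([[0, 0, 1, 1]] * 4)
pred = np.array([[0, 1, 1, 1]] * 4)
print(f"Mean IoU: {mean_iou(gt, pred, num_classes=2):.4f}")
```
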
## Document AI Projects

### Multiclass Classification

| Project Name | Accuracy | Macro F1 Score | Macro Precision | Macro Recall |
|---|---|---|---|---|
| Document Classification - Desafio_1 | 0.9865 | 0.9863 | 0.9870 | 0.9861 |
| Document Classification RVL-CDIP | 0.9767 | 0.9154 | 0.9314 | 0.9019 |
| Real World Documents Collections | 0.767 | 0.7704 | 0.7767 | 0.7707 |
| Real World Documents Collections_v2 | 0.826 | 0.8242 | 0.8293 | 0.8237 |
| Tobacco-Related Documents | 0.7532 | 0.722 | - | - |
| Tobacco-Related Documents_v2 | 0.8666 | 0.8308 | - | - |
| Tobacco-Related Documents_v3 | 0.9419 | 0.9278 | - | - |
## Audio Projects
| Project Name | Project Type |
|---|---|
| Vinyl Scratched or Not | Binary Audio Classification |
| Audio-Drum Kit Sounds | Multiclass Audio Classification |
| Speech Emotion Detection | Emotion Detection |
| Toronto Emotional Speech Set (TESS) | Emotion Detection |
| ASR Speech Recognition Dataset | Automatic Speech Recognition |
## Optical Character Recognition Projects
| Project Name | CER[^4] |
|---|---|
| 20,000 Synthetic Samples Dataset | 0.0029 |
| Captcha | 0.0075 |
| Handwriting Recognition (v1) | 0.0533 |
| Handwriting Recognition (v2) | 0.0360 |
| OCR License Plate Text Recognition | 0.0368 |
| Tesseract E13B | 0.0036 |
| Tesseract CMC7 | 0.0050 |
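
CER[^4] is character-level edit distance normalized by the reference length, so lower is better. A sketch assuming the jiwer package; the strings are illustrative, not project output:

```python
from jiwer import cer

# CER = (substitutions + deletions + insertions) / characters in the reference.
reference = "ABC-1234"
hypothesis = "ABC-1284"  # one substituted character

print(f"CER: {cer(reference, hypothesis):.4f}")  # 1 substitution / 8 chars = 0.1250
```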

## Footnotes

[^1]: This project is part of a transformer comparison.

[^2]: Average Precision (AP) @[IoU=0.50:0.95 | area=all | maxDets=100]

[^3]: Average Recall (AR) @[IoU=0.50:0.95 | area=all | maxDets=100]

[^4]: CER stands for Character Error Rate.