
GSSoC'24: OCR Detection #62

Merged: 16 commits merged on Jun 6, 2024

SAM-DEV007 (Contributor) commented on May 14, 2024

Resolves #56

This pull request implements the OCR Detection feature enhancement.

OCR_Detection

OCR Detection is introduced to help victims in situations where they cannot use any means other than written text shown to the camera to communicate or seek help. It can also help detect potential self-harm: if the camera catches a glimpse of a victim writing a death note, the extracted text can be passed to existing text-based models to assess the severity of the threat.

Usage

Note that a window is only created when the model detects text. To view the next image, close the current window; a new window cannot appear until then.

  • Run demo.py to start the webcam and capture frames.
  • Press Ctrl + C to exit the script.
  • If text is detected, a new window opens showing the detected text, its annotations, and the confidence scores. The detected text is also printed to the console for convenience.
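The usage flow above amounts to a capture, detect, print loop. The following is a minimal sketch under stated assumptions, not the code in demo.py: it assumes a `detect` callable that returns easyocr-style `(bbox, text, confidence)` tuples, and `webcam_frames` and `run_demo` are hypothetical names.

```python
"""Hypothetical sketch of a capture -> detect -> print loop (not the PR's demo.py)."""
from typing import Any, Callable, Iterable, Iterator, List, Tuple

# Assumed detection format: easyocr-style (bbox, text, confidence) tuples.
Result = Tuple[list, str, float]


def webcam_frames(index: int = 0) -> Iterator[Any]:
    """Yield frames from webcam `index` until the camera stops delivering them."""
    import cv2  # lazy import so run_demo can be used without OpenCV installed

    cap = cv2.VideoCapture(index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            yield frame
    finally:
        cap.release()  # released when the generator ends, errors, or is closed


def run_demo(frames: Iterable[Any],
             detect: Callable[[Any], List[Result]]) -> List[str]:
    """Print the text found in each frame; stop cleanly on Ctrl + C."""
    printed: List[str] = []
    try:
        for frame in frames:
            results = detect(frame)
            if not results:
                continue  # nothing detected, so nothing to show for this frame
            text = " ".join(item[1] for item in results)
            print(text)
            printed.append(text)
    except KeyboardInterrupt:  # raised by Ctrl + C
        print("Exiting.")
    return printed
```

For example, `run_demo(webcam_frames(), my_detect)`, where the hypothetical `my_detect` wraps whatever OCR function detect.py exposes. Taking the frame source as an iterable keeps the loop testable without a camera.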

Working

The easyocr package provides the image-to-text detection. Model_Data contains the pre-downloaded model files, reducing the dependency on an internet connection.

  • detect.py contains functions that other scripts can import to perform image-to-text detection.
  • demo.py contains demo code that showcases the functionality.
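An importable helper module in the spirit of detect.py might look like the sketch below. `easyocr.Reader` (with `model_storage_directory` and `download_enabled`) and `Reader.readtext` (with `detail` and `paragraph`) are real easyocr parameters, but the function names, the English-only default, and the confidence filter are assumptions, not the PR's actual code.

```python
"""Hypothetical OCR helpers in the spirit of detect.py (names are illustrative)."""
from typing import List, Sequence, Tuple

Result = Tuple[list, str, float]  # one readtext() item: (bbox, text, confidence)


def load_reader(model_dir: str = "Model_Data", langs: Sequence[str] = ("en",)):
    """Create an easyocr Reader that only loads models from a local folder."""
    import easyocr  # lazy import: heavy dependency, only needed for real detection

    # download_enabled=False makes a missing model an error instead of a download.
    return easyocr.Reader(list(langs), model_storage_directory=model_dir,
                          download_enabled=False)


def detect_text(reader, image) -> List[Result]:
    """Per-line detections; paragraph=False keeps each box's confidence score."""
    return reader.readtext(image, detail=1, paragraph=False)


def join_text(results: List[Result], min_conf: float = 0.0) -> str:
    """Concatenate detected strings whose confidence is at least `min_conf`."""
    return " ".join(text for _, text, conf in results if conf >= min_conf)
```

Keeping `paragraph=False` matters here because easyocr's paragraph mode merges boxes and returns `(bbox, text)` pairs without a confidence value.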

OpenCV without GUI support (opencv-python-headless) is used because the detection pipeline does not need OpenCV's GUI features; dropping them keeps the dependency lighter and avoids unnecessary GUI overhead during detection.
For web integration, the GUI is not needed either, while all other functionality remains the same.

demo.py also includes an optimization that skips model detection when the difference between frames is small, i.e., when the frame has not changed much. Mean Squared Error (MSE) is used to measure the difference between two frames, and the model only runs if the error is greater than 20. This threshold can be changed via the ERR_DIFF value.
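The frame-difference check can be sketched with NumPy as MSE = (1/N) Σ (a_i − b_i)² over all N pixel values. This is an illustrative version, not demo.py's code: only `ERR_DIFF` and its value of 20 come from the description above, while `mse` and `should_detect` are hypothetical names, and the PR does not say whether frames are compared in color or grayscale (this sketch simply uses every value in the array).

```python
import numpy as np

ERR_DIFF = 20  # threshold stated in the PR description


def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two frames of the same shape."""
    # Cast first: subtracting uint8 arrays directly would wrap around (0 - 5 -> 251).
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.mean(diff ** 2))


def should_detect(prev, curr, err_diff: float = ERR_DIFF) -> bool:
    """Run OCR on the first frame, or when the frame changed by more than err_diff."""
    if prev is None or prev.shape != curr.shape:
        return True  # nothing comparable yet, so always detect
    return mse(prev, curr) > err_diff
```

One design choice worth considering: if `prev` is the last frame that was actually sent to the model rather than the immediately preceding frame, slow gradual changes (such as text being written) eventually cross the threshold instead of being skipped indefinitely.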

Multiprocessing can be used to run detection in a separate process, so frame capture is not blocked and detections appear without noticeable delay.
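One way to apply this is a single OCR worker process fed through a one-slot queue, where the capture loop always replaces a waiting frame with the newest one. This is a sketch of a possible design, not something the PR implements; `submit_latest`, `ocr_worker`, and `start_worker` are hypothetical names, and `detect` must be a picklable module-level function (ideally one that loads the OCR model inside the worker process).

```python
import multiprocessing as mp
import queue
from typing import Any, Callable


def submit_latest(frames, frame: Any) -> None:
    """Queue `frame` for the worker, replacing a stale frame if one is waiting."""
    try:
        frames.put_nowait(frame)
    except queue.Full:
        try:
            frames.get_nowait()  # drop the older, not-yet-processed frame
        except queue.Empty:
            pass  # the worker took it in the meantime
        try:
            frames.put_nowait(frame)
        except queue.Full:
            pass  # lost a race; skipping one frame is harmless here


def ocr_worker(frames, results, detect: Callable[[Any], Any]) -> None:
    """Consume frames until a None sentinel arrives; push detections to `results`."""
    while True:
        frame = frames.get()
        if frame is None:
            break
        results.put(detect(frame))


def start_worker(detect: Callable[[Any], Any]):
    """Start ocr_worker in its own process and return (process, frames, results)."""
    frames = mp.Queue(maxsize=1)  # one slot: only the freshest frame waits
    results = mp.Queue()
    proc = mp.Process(target=ocr_worker, args=(frames, results, detect), daemon=True)
    proc.start()
    return proc, frames, results
```

The capture loop would call `submit_latest(frames, frame)` for every frame, poll `results.get_nowait()` (catching `queue.Empty`), and send `None` to stop the worker. Dropping stale frames keeps latency bounded when OCR is slower than the camera.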

Demo

  • The image is purely for demonstration purposes. The red text shows the detected text (the detected text is long, so it is cut off at the edge of the screen). The bounding boxes, the green text above them, and the confidence values are for visualization purposes.

Figure_1

  • The higher the camera resolution, the better the results.
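The demo's overlay (a box around each detection with its text and confidence above it) could be drawn roughly as follows. This is a sketch only: the PR does not say how demo.py renders its window, and the colors, font, offsets, and helper names here are assumptions. `cv2.polylines` and `cv2.putText` are drawing functions that remain available in opencv-python-headless.

```python
from typing import Sequence, Tuple

Point = Sequence[float]


def label_for(text: str, conf: float) -> str:
    """Caption drawn above a box, e.g. 'HELP (0.93)'."""
    return f"{text} ({conf:.2f})"


def label_origin(bbox: Sequence[Point], offset: int = 5) -> Tuple[int, int]:
    """Top-left corner of the box, shifted up so the caption sits above it."""
    x = int(min(p[0] for p in bbox))
    y = int(min(p[1] for p in bbox))
    return x, max(y - offset, 0)  # clamp so captions never leave the image top


def annotate(image, results):
    """Draw green boxes and captions on a copy of a BGR image (requires OpenCV)."""
    import cv2  # lazy import so the pure helpers above work without OpenCV
    import numpy as np

    out = image.copy()
    for bbox, text, conf in results:  # easyocr-style (bbox, text, confidence)
        pts = np.array(bbox, dtype=np.int32)
        cv2.polylines(out, [pts], True, (0, 255, 0), 2)
        cv2.putText(out, label_for(text, conf), label_origin(bbox),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    return out
```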

SAM-DEV007 and others added 16 commits May 14, 2024 21:39
Added functions to detect text and process data from the image
Removed verbose printing
Added detection from webcam livestream
Added full detected text display
Added mse function
Disabled reading as a paragraph
Fixed confidence bug
Updated the file to fit the requirements of text detection
SAM-DEV007 (Contributor, Author) commented:

@TAHIR0110 Please review the pull request and let me know whether any changes are required.

SAM-DEV007 changed the title from "OCR Detection" to "GSSoC'24: OCR Detection" on May 18, 2024
SAM-DEV007 (Contributor, Author) commented:

@TAHIR0110 Please also add the same labels to this PR as the ones on the issue.

TAHIR0110 added the level1 and gssoc (Associated with GSSOC) labels on May 22, 2024
sudiptasarkar011 commented:

I would like to work on this; can you please assign it to me?

TAHIR0110 (Owner) commented:

@SAM-DEV007 I have merged it and labelled it level3 instead of level1.

TAHIR0110 added the level3 label and removed the level2 label on Jun 6, 2024
TAHIR0110 merged commit 84663fb into TAHIR0110:main on Jun 6, 2024
Labels: gssoc (Associated with GSSOC), level3
Linked issue: GSSoC'24: OCR Detection (Image to Text)
Participants: 3