Hard-burned subtitles OCR to SRT extractor

An Apple Silicon (M1/M2) toolchain for extracting .SRT subtitles from movies with hard-burned (embedded) subtitles.
The OCR step uses a modified version of macOCR (forked from https://github.com/xulihang/macOCR); the macOS Apple Silicon ARM64 binary is included in the repo as OCR.

The workflow sequence run by the do-all.sh script:

Generate a cropped video with ffmpeg (you will have to adjust the crop area to your video size).
Generate PNG snapshots (using ffmpeg ... fps=1, i.e. 1 snapshot per second).
Run Optical Character Recognition with macOCR (Apple Silicon only), which outputs a JSON file.
Convert the JSON to SRT, then normalize and deduplicate using https://github.com/cdown/srt.
Optional: generate Chinese pinyin and traditional/simplified versions.
Optional: translate with DeepL.
Optional: merge the translation into the final SRT containing Hanzi Simplified + Hanzi Traditional + Pinyin + English.
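
The JSON-to-SRT and deduplication steps can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the code in gensrt.py; it does not use the cdown/srt library either. It assumes one OCR text string per snapshot, taken at a fixed rate, and merges runs of identical consecutive texts into a single cue. The function names and the input shape are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def frames_to_cues(frame_texts, fps=1.0):
    """Merge identical consecutive frame texts into (start, end, text) cues.

    frame_texts[i] is the OCR text of snapshot i (assumed input shape);
    an empty string means no subtitle was detected in that snapshot.
    """
    cues = []
    step = 1.0 / fps
    for index, raw in enumerate(frame_texts):
        text = raw.strip()
        if not text:
            continue
        start = index * step
        previous = cues[-1] if cues else None
        if previous and previous[2] == text and abs(previous[1] - start) < 1e-9:
            # Same text as the directly preceding snapshot: extend that cue.
            cues[-1] = (previous[0], start + step, text)
        else:
            cues.append((start, start + step, text))
    return cues


def cues_to_srt(cues):
    """Render cues as SRT text with 1-based sequence numbers."""
    blocks = []
    for number, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{number}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    print(cues_to_srt(frames_to_cues(["你好", "你好", "", "再见"])))
```

With fps=1 each snapshot covers one second, so a subtitle that stays on screen for two snapshots becomes one two-second cue. Real OCR output is usually noisier than this (near-duplicates that differ by one character, for example), which is why a separate normalization pass is still useful.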

NOTE

This collection of scripts is a work in progress and will require tweaking for each specific scenario (the places that need editing are marked with TODO comments in the code). Use at your own risk.
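
As an illustration of the kind of per-video tweaking involved, the sketch below builds the two ffmpeg command lines (crop, then one snapshot per second) with the crop box as explicit parameters. It is a hedged example and not the contents of do-all.sh: the file names, the helper names, and the assumption that subtitles sit in a bottom strip of the frame are all hypothetical. Only the ffmpeg `crop=w:h:x:y` and `fps=N` filters are taken as given.

```python
def bottom_strip(video_width, video_height, strip_percent=20):
    """Return (width, height, x, y) for a crop box covering the bottom of the frame.

    strip_percent is the share of the frame height to keep (an assumed default).
    """
    height = video_height * strip_percent // 100
    return video_width, height, 0, video_height - height


def crop_command(src, dst, width, height, x, y):
    """Build an ffmpeg argv that crops src to a width x height box at (x, y)."""
    return [
        "ffmpeg", "-i", src,
        "-vf", f"crop={width}:{height}:{x}:{y}",
        "-an",  # drop audio, it is not needed for OCR
        dst,
    ]


def snapshot_command(src, out_dir, fps=1):
    """Build an ffmpeg argv that writes fps PNG snapshots per second of video."""
    return ["ffmpeg", "-i", src, "-vf", f"fps={fps}", f"{out_dir}/snap_%05d.png"]


if __name__ == "__main__":
    box = bottom_strip(1920, 1080)
    print(" ".join(crop_command("movie.mp4", "cropped.mp4", *box)))
    print(" ".join(snapshot_command("cropped.mp4", "img")))
```

The lists can be passed to `subprocess.run(cmd, check=True)` once the crop box matches the video. Keeping the box as arguments means only one call site changes per video.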

glowinthedark/subtitles-ocr
