docutranslate

docutranslate translates a scanned PDF into a Word document through the following pipeline:

scanned PDF file -> page images -> text -> gpt-4o -> translated Word doc

See test.ipynb for details.
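The pipeline can be pictured as a chain of per-page stages. The sketch below is only an illustration of that flow, not the code in main.py: `translate_pages`, `fake_extract`, and `fake_translate` are hypothetical names, and the two stand-in stages replace the real text-extraction step and the gpt-4o call so that the example runs without any external service.

```python
from typing import Callable, Iterable, List


def translate_pages(
    page_images: Iterable[bytes],
    extract_text: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    language: str,
) -> List[str]:
    """Run each page image through text extraction, then translation."""
    translated: List[str] = []
    for image in page_images:
        text = extract_text(image)  # images -> text
        if not text.strip():
            # Keep an empty entry so page order is preserved.
            translated.append("")
            continue
        translated.append(translate(text, language))  # text -> translation
    return translated


# Hypothetical stand-in stages (assumptions, not the project's real functions).
def fake_extract(image: bytes) -> str:
    return image.decode("utf-8")


def fake_translate(text: str, language: str) -> str:
    return f"[{language}] {text}"


pages = [b"Attention is all you need", b"   "]
result = translate_pages(pages, fake_extract, fake_translate, "Chinese (Traditional)")
print(result)
```

Passing the stages in as callables keeps the page loop independent of whichever extraction library or model client is actually used.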

Example usage

Install the requirements:

pip install -r requirements.txt

Process the entire PDF:

python main.py attention.pdf --language "Chinese (Traditional)"

Process a single page (here, page 1):

python main.py attention.pdf --language "Chinese (Traditional)" --single-page --page-number 1
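For readers who want to see how these options fit together, here is a minimal argparse sketch that accepts the two commands shown above. It is an assumption about how such a command line could be parsed, not a copy of main.py: the name `build_parser`, the destination name `pdf_path`, the help texts, making `--language` required, and the default page number are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the flags used in the commands above."""
    parser = argparse.ArgumentParser(
        description="Translate a scanned PDF into a Word document."
    )
    # Positional argument: the PDF to process, e.g. attention.pdf.
    parser.add_argument("pdf_path", help="path to the scanned PDF")
    # Target language; treating it as required is an assumption.
    parser.add_argument("--language", required=True, help="target language")
    # Boolean switch: process one page instead of the whole document.
    parser.add_argument("--single-page", action="store_true",
                        help="process only one page")
    # Page to process when --single-page is set; the default is an assumption.
    parser.add_argument("--page-number", type=int, default=1,
                        help="page to process with --single-page")
    return parser


args = build_parser().parse_args(
    ["attention.pdf", "--language", "Chinese (Traditional)",
     "--single-page", "--page-number", "1"]
)
print(args.pdf_path, args.language, args.single_page, args.page_number)
```

Note that argparse turns the dashes in `--single-page` and `--page-number` into underscores, so the values are read as `args.single_page` and `args.page_number`.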
