feat/ocr_layer_to_pdf #2991
Comments
Hi @punjabdhaputar - could you describe the use case you have in mind for this feature? And do I understand correctly that your proposed solution would output a new PDF rather than a list of |
Hello @MthwRobinson! Actually I am thinking about another optional argument to the "partition" function like the following:
The partition function would then write a new PDF with the hidden OCR text layer to "ocr_pdf.pdf". My use case is to view the PDF with the text layer and highlight specific text (e.g. a small phrase, or a subset of the chunks generated earlier).
Thanks @punjabdhaputar ! Definitely see the use case there. Writing to PDF is outside the scope of what we'd like to do within the |
Is your feature request related to a problem? Please describe.
When I OCR a PDF, I would like to be able to open the PDF and see the OCRed text as a hidden layer.
Describe the solution you'd like
I would like to have an option to output a new PDF file after the "partition" method that will be the original + a hidden text layer of the OCR text.
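To make the request concrete, here is a minimal sketch of what producing such a file could look like outside of "partition". It is not part of unstructured: the helper name `write_ocr_layer_pdf` and its parameters are hypothetical, and the default backend assumes the separate OCRmyPDF package (and Tesseract) is installed and uses its `ocrmypdf.ocr(input_file, output_file)` entry point. The backend can be swapped for any callable that takes a source and a destination path.

```python
from pathlib import Path
from typing import Callable, Optional


def write_ocr_layer_pdf(
    src_pdf: str,
    ocr_pdf: str = "ocr_pdf.pdf",
    ocr_backend: Optional[Callable[[str, str], object]] = None,
) -> Path:
    """Write a copy of src_pdf with a hidden OCR text layer to ocr_pdf.

    Hypothetical helper for illustration only; not an unstructured API.
    """
    # Refuse to overwrite the original file with the OCRed copy.
    if Path(src_pdf).resolve() == Path(ocr_pdf).resolve():
        raise ValueError("ocr_pdf must be a different file than src_pdf")

    if ocr_backend is None:
        # Assumption: OCRmyPDF is installed. It is imported lazily so the
        # helper can still be used with another backend when it is not.
        import ocrmypdf

        ocr_backend = ocrmypdf.ocr

    # The backend is expected to read src_pdf and write the searchable PDF.
    ocr_backend(src_pdf, ocr_pdf)
    return Path(ocr_pdf)
```

With something like this, the original PDF stays untouched, the OCRed copy can be opened in a viewer to search or highlight text, and either file can still be passed to "partition" as usual.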
Additional context
Slack Thread: https://unstructuredw-kbe4326.slack.com/archives/C044N0YV08G/p1715109355171469