Allow to specify PAPERLESS_OCR_MODE and PAPERLESS_OCR_LANGUAGE when doing "redo OCR" #1466
-
I have a very similar use case. I agree that changing the conf or env files each time is very tedious. For use case 2, you could pass multiple languages, e.g., I pass … Maybe it would be possible to add the OCR options to the UI settings? But there are many options, and they may clutter the UI too much.
-
Alternatively, the behaviour of the UI redo button could be changed to set … And on the settings page a new input field could be provided to overwrite … I guess this would require fewer UI changes and would also solve the problem of having to restart the server to do a "real redo" or change the language.
-
05.09.2022 23:07:22 ***@***.***:
Please find attached your invoice Rechnung-715966EW.pdf
Kind regards
Elektrowelt Zwickau GmbH
Legal notice:
Elektrowelt Zwickau GmbH
Herschelstraße 9
08060 Zwickau
Tel.: 0375 / 27 04 99 50
Fax: 0375 / 27 04 99 69
Email: ***@***.***
-
It would be nice to have this in the UI, but in the meantime you can do it in the terminal with docker exec:
docker exec -d -e "PAPERLESS_OCR_MODE=force" paperless_webserver_1 document_archiver --overwrite --document 15
If this is a common occurrence, you can write a small bash function that takes the environment variables or the document ID as parameters.
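A rough sketch of what such a bash function could look like, built around the docker exec command from this comment. The function name `redo_ocr` and the variables `PAPERLESS_CONTAINER` and `REDO_OCR_DRY_RUN` are made up for illustration and are not part of paperless-ngx. It also assumes that `PAPERLESS_OCR_LANGUAGE` is picked up from the environment the same way `PAPERLESS_OCR_MODE` is, which may not hold on every version or setup.

```shell
#!/usr/bin/env bash
# Sketch only: re-run OCR for a single document with per-call overrides.
# redo_ocr, PAPERLESS_CONTAINER and REDO_OCR_DRY_RUN are hypothetical names.

redo_ocr() {
    local doc_id="${1:-}"
    local mode="${2:-force}"   # OCR mode for this run only
    local lang="${3:-}"        # optional OCR language for this run only
    # Container name taken from the comment above; adjust to your setup.
    local container="${PAPERLESS_CONTAINER:-paperless_webserver_1}"

    # Only accept a numeric document ID.
    if ! [[ "$doc_id" =~ ^[0-9]+$ ]]; then
        echo "usage: redo_ocr <document-id> [ocr-mode] [ocr-language]" >&2
        return 2
    fi

    # Build the command as an array so every argument stays one word.
    local cmd=(docker exec -d -e "PAPERLESS_OCR_MODE=${mode}")
    if [[ -n "$lang" ]]; then
        # Assumption: the language setting is read from the environment too.
        cmd+=(-e "PAPERLESS_OCR_LANGUAGE=${lang}")
    fi
    cmd+=("$container" document_archiver --overwrite --document "$doc_id")

    if [[ -n "${REDO_OCR_DRY_RUN:-}" ]]; then
        # Dry run: print the command instead of executing it.
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

With this in your shell profile, `redo_ocr 15` would repeat the command above, and `redo_ocr 15 redo deu` would override both mode and language for that one run. Setting `REDO_OCR_DRY_RUN=1` first lets you check the generated command before touching the container.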
-
I would also "vote" for at least having the "Redo OCR" button always set the OCR mode to … I would also really love to see pressing the "Redo OCR" button create a "File Task". Yes, you could check the log to see if something happened, but having a task in the task list (which could then also show whether something went wrong) would be much more the right thing. But this in itself is probably its own feature request.
-
+1 to change "Redo OCR" behavior to always force
-
I'm currently evaluating paperless-ngx, and setting the OCR language is very important to me. I'm from Luxembourg. We are plurilingual and, due to the size of the country, have a lot of interactions with foreign companies, even for personal use cases. It is common here to look for the best deal not only among local companies but also in our neighbouring countries. We also have many foreign companies working here. So it is commonplace to receive letters in three languages (French, German and English) in a fairly even split. By using only one language in Paperless NGX, I can only benefit from OCR in about a third of the scanned documents. I would expect a similar situation to exist in border regions of other countries as well. Luxembourg just happens to be one big border region 😆
-
There are related discussions in:
-
+1 to change "Redo OCR" behavior in the UI for specific documents
-
+1, "Redo OCR" should relaunch the OCR process, regardless of settings. Makes total sense.
-
Agreed. I also had some PDFs that already had text included, but that text was 'scanned with a trial version of program xxxx' instead of the actual text in the PDF :-)
-
My default PAPERLESS_OCR_MODE is skip_noarchive and my default language is "deu" (German), and in most cases that's fine. But sometimes:
To fix use case 1, I would need to pass redo as PAPERLESS_OCR_MODE. Because I can't do this, I need to shut down the server, change PAPERLESS_OCR_MODE to redo, then press "redo OCR" in the UI, and then change PAPERLESS_OCR_MODE back to my default.
It's similar for use case 2, where I need to change the language just for a single document whose OCR I want to redo.