Trying to extract text from a PDF using `textract.fromFileWithPath()` in a Windows environment. Using textract v2.5.0. The following config is set:

I have found that `preserveOnlyMultipleLineBreaks: true` is not working as expected. When the setting is on, the output converts `\r\n` to `\r`. But AFAIK `\r` on its own doesn't mean anything in Windows or Unix systems. I'm expecting it instead to convert `\r\n\r\n` to `\r\n` and to remove solo `\r\n` completely from the text output.

Seems like a bug?
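
To make the expected behavior concrete, here is a minimal sketch of the transformation described above. `collapseLineBreaks` is a hypothetical helper written for illustration only; it is not part of textract and makes no claim about how textract implements the option internally.

```javascript
// Hypothetical helper, not part of textract: illustrates the behavior the
// report expects from preserveOnlyMultipleLineBreaks on CRLF input.
function collapseLineBreaks(text) {
  // Match a run of one or more line breaks, each either \r\n or a bare \n.
  return text.replace(/(?:\r?\n)+/g, (run) => {
    // Count the breaks in the run (every break ends in \n).
    const count = run.split("\n").length - 1;
    // Two or more breaks: keep a single \r\n.
    // A solo break: remove it completely, as the report expects.
    return count >= 2 ? "\r\n" : "";
  });
}

const input = "line one\r\nline two\r\n\r\nnew paragraph";
console.log(JSON.stringify(collapseLineBreaks(input)));
// prints "line oneline two\r\nnew paragraph"
```

Because `\r\n` is matched as a single unit, no lone `\r` is left behind. Removing a solo break outright joins the adjacent words ("oneline"); replacing it with a space instead may be preferable, but that goes beyond what is requested here.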