
Floating-point exception (SIGFPE) due to out-of-range input to asinf in Wordrec::angle_change #4242

Closed
ChristianOsta opened this issue May 16, 2024 · 4 comments
@ChristianOsta

Current Behavior

The image below causes a floating-point exception (SIGFPE) on Ubuntu (WSL) when using the legacy engine (--oem 0) with --psm 7. The argument passed to asinf is slightly outside its valid range [-1, 1], specifically -1.00000012, and the program terminates with SIGFPE. Notably, this issue does not occur under Windows.
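For illustration, here is a minimal sketch of what happens numerically, independent of Tesseract. The names `AsinProbe` and `probe_asin_just_below_minus_one` are hypothetical. The value -1.00000012 is the next representable float below -1.0; asin of it is NaN, and on typical IEEE 754 implementations the FE_INVALID flag is raised. That flag only becomes a SIGFPE when trapping is enabled.

```cpp
#include <cfenv>
#include <cmath>

// Hypothetical helper type, used only for this illustration.
struct AsinProbe {
  float input;          // the value passed to asin
  float result;         // what asin returned
  bool invalid_raised;  // whether FE_INVALID was flagged afterwards
};

AsinProbe probe_asin_just_below_minus_one() {
  // Next float below -1.0f, i.e. -(1 + 2^-23), printed by gdb as -1.00000012.
  // volatile keeps the compiler from folding the asin call at compile time.
  volatile float vx = std::nextafter(-1.0f, -2.0f);
  const float x = vx;

  std::feclearexcept(FE_ALL_EXCEPT);
  const float r = std::asin(x);  // out of domain: the result is NaN
  const bool invalid = std::fetestexcept(FE_INVALID) != 0;
  return {x, r, invalid};
}
```

Calling `probe_asin_just_below_minus_one()` without traps enabled just returns a NaN result; nothing is terminated.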

Backtrace:
The backtrace indicates that the error originates from the tesseract::Wordrec::angle_change function:
-> see "other information"

tesseract command:
tesseract.exe -l eng+deu "tesseract_fail.png" stdout --tessdata-dir "<TESSDATA_DIR>" --oem 0 --psm 7

I used the legacy models for English and German from tesseract-ocr/tessdata.

Interestingly, when the single "d" in the bottom part of the image is moved one pixel up or to the right, the exception is no longer thrown.

I will gladly provide additional information if needed.

image to reproduce the behavior:
tesseract_crash

Expected Behavior

Tesseract should handle the input gracefully without causing a floating-point exception.

Suggested Fix

No response

tesseract -v

tesseract 5.3.4
leptonica-1.83.1
libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.0) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.2.13 : libwebp 1.4.0 : libopenjp2 2.5.2
Found AVX512BW
Found AVX512F
Found AVX512VNNI
Found AVX2
Found AVX
Found FMA
Found SSE4.1
Found OpenMP 201511
Found libarchive 3.7.2 zlib/1.2.13 liblzma/5.2.6 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.5.5

Operating System

No response

Other Operating System

Ubuntu inside Windows Subsystem for Linux (WSL)

Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy

uname -a

Linux 5.15.146.1-microsoft-standard-WSL2 #1 SMP Thu Jan 11 04:09:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

This is the output of bt:
(gdb) bt
#0 0x00007f66916bc552 in __GI___feraiseexcept (excepts=excepts@entry=1)
at ../sysdeps/x86_64/fpu/fraiseexcpt.c:36
#1 0x00007f66916c2590 in __asinf (x=-1.00000012) at ./math/w_asinf_compat.c:34
#2 __asinf (x=-1.00000012) at ./math/w_asinf_compat.c:28
#3 0x00007f6691f8dd63 in tesseract::Wordrec::angle_change(tesseract::EDGEPT*, tesseract::EDGEPT*, tesseract::EDGEPT*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#4 0x00007f6691f8e243 in tesseract::Wordrec::pick_close_point(tesseract::EDGEPT*, tesseract::EDGEPT*, int*)
() from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#5 0x00007f6691f8e66b in tesseract::Wordrec::vertical_projection_point(tesseract::EDGEPT*, tesseract::EDGEPT*, tesseract::EDGEPT**, tesseract::EDGEPT_CLIST*) ()
from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#6 0x00007f6691f935c6 in tesseract::Wordrec::try_vertical_splits(tesseract::EDGEPT**, short, tesseract::EDGEPT_CLIST*, tesseract::GenericHeap<tesseract::KDPtrPairInc<float, tesseract::SEAM> >*, tesseract::GenericHeap<tesseract::KDPtrPairDec<float, tesseract::SEAM> >*, tesseract::SEAM**, tesseract::TBLOB*) ()
from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#7 0x00007f6691f93c56 in tesseract::Wordrec::pick_good_seam(tesseract::TBLOB*) ()
from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#8 0x00007f6691f8fa43 in tesseract::Wordrec::attempt_blob_chop(tesseract::TWERD*, tesseract::TBLOB*, int, bool, std::vector<tesseract::SEAM*, std::allocator<tesseract::SEAM*> > const&) ()
from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#9 0x00007f6691f909b2 in tesseract::Wordrec::improve_one_blob(std::vector<tesseract::BLOB_CHOICE*, std::allocator<tesseract::BLOB_CHOICE*> > const&, std::vector<tesseract::DANGERR_INFO, std::allocator<tesseract::DANGERR_INFO> >*, bool, bool, tesseract::WERD_RES*, unsigned int*) ()
from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#10 0x00007f6691f90bd0 in tesseract::Wordrec::improve_by_chopping(float, tesseract::WERD_RES*, tesseract::BestChoiceBundle*, tesseract::BlamerBundle*, tesseract::LMPainPoints*, std::vector<tesseract::SegSearchPending, std::allocator<tesseract::SegSearchPending> >*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#11 0x00007f6691fa0a78 in tesseract::Wordrec::SegSearch(tesseract::WERD_RES*, tesseract::BestChoiceBundle*, tesseract::BlamerBundle*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#12 0x00007f6691f8f0c8 in tesseract::Wordrec::chop_word_main(tesseract::WERD_RES*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#13 0x00007f6691f8cc6d in tesseract::Wordrec::cc_recog(tesseract::WERD_RES*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#14 0x00007f6691e5f71c in tesseract::Tesseract::recog_word_recursive(tesseract::WERD_RES*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#15 0x00007f6691e5f8c4 in tesseract::Tesseract::recog_word(tesseract::WERD_RES*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#16 0x00007f6691e5cb62 in tesseract::Tesseract::tess_segment_pass_n(int, tesseract::WERD_RES*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#17 0x00007f6691e04b52 in tesseract::Tesseract::match_word_pass_n(int, tesseract::WERD_RES*, tesseract::ROW*, tesseract::BLOCK*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#18 0x00007f6691e04d0b in tesseract::Tesseract::classify_word_pass1(tesseract::WordData const&, tesseract::WERD_RES**, tesseract::PointerVector<tesseract::WERD_RES>*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#19 0x00007f6691e0810a in tesseract::Tesseract::RetryWithLanguage(tesseract::WordData const&, void (tesseract::Tesseract::*)(tesseract::WordData const&, tesseract::WERD_RES**, tesseract::PointerVector<tesseract::WERD_RES>*), bool, tesseract::WERD_RES**, tesseract::PointerVector<tesseract::WERD_RES>*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#20 0x00007f6691e08b22 in tesseract::Tesseract::classify_word_and_language(int, tesseract::PAGE_RES_IT*, tesseract::WordData*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#21 0x00007f6691e0d41d in tesseract::Tesseract::RecogAllWordsPassN(int, tesseract::ETEXT_DESC*, tesseract::PAGE_RES_IT*, std::vector<tesseract::WordData, std::allocator<tesseract::WordData> >*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#22 0x00007f6691e0e464 in tesseract::Tesseract::recog_all_words(tesseract::PAGE_RES*, tesseract::ETEXT_DESC*, tesseract::TBOX const*, char const*, int) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#23 0x00007f6691ddff64 in tesseract::TessBaseAPI::Recognize(tesseract::ETEXT_DESC*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#24 0x00007f6691de056b in tesseract::TessBaseAPI::ProcessPage(Pix*, int, char const*, char const*, int, tesseract::TessResultRenderer*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#25 0x00007f6691de18e1 in tesseract::TessBaseAPI::ProcessPagesInternal(char const*, char const*, int, tesseract::TessResultRenderer*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#26 0x00007f6691de1adf in tesseract::TessBaseAPI::ProcessPages(char const*, char const*, int, tesseract::TessResultRenderer*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#27 0x0000556ecc08455b in main ()

@stweil stweil added the bug label May 16, 2024
@stweil
Contributor

stweil commented May 16, 2024

Unrelated:

--psm 7 won't work for a rotated line image. That requires --psm 1 (or no argument for page segmentation mode).

stweil added a commit to stweil/tesseract that referenced this issue May 16, 2024
std::asin only allows arguments in [-1, 1], but rounding errors can
produce values which are slightly outside of this range and which
would cause a FP exception (or wrong calculation results).

Rename also the internally used function TPOINT::length to TPOINT::length2
because it calculates the square of the length.

Signed-off-by: Stefan Weil <[email protected]>
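The idea described in this commit message can be sketched roughly as follows. This is not the code from the commit: `Vec2`, `clamped_asin` and `angle_degrees` are hypothetical names, and the angle formula is only an assumed simplification (asin of the normalized cross product). The point is that a ratio computed in float can land slightly outside [-1, 1] through rounding, so it is clamped before being passed to std::asin.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical stand-in for a 2-D integer vector; not Tesseract's TPOINT.
struct Vec2 {
  int x;
  int y;
  // Squared length; the name length2 makes clear that no sqrt is taken.
  int length2() const { return x * x + y * y; }
};

// asin with the argument clamped into its valid domain [-1, 1].
double clamped_asin(double v) {
  return std::asin(std::max(-1.0, std::min(1.0, v)));
}

// Angle in whole degrees from the normalized cross product (range [-90, 90]).
int angle_degrees(const Vec2& a, const Vec2& b) {
  const double kPi = 3.14159265358979323846;
  const float length = std::sqrt(static_cast<float>(a.length2()) *
                                 static_cast<float>(b.length2()));
  if (length == 0.0f) {
    return 0;  // degenerate vector, no meaningful angle
  }
  const float cross = static_cast<float>(a.x * b.y - a.y * b.x);
  // Rounding in cross / length may give e.g. -1.00000012; the clamp absorbs it.
  return static_cast<int>(std::lround(clamped_asin(cross / length) / kPi * 180.0));
}
```

With the clamp in place, `clamped_asin(-1.00000012)` returns the same value as `std::asin(-1.0)` instead of NaN.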
@stweil
Contributor

stweil commented May 16, 2024

@ChristianOsta, if you want, you can try and review pull request #4243, which fixes the issue.

@stweil
Contributor

stweil commented May 16, 2024

"Notably, this issue does not occur under Windows."

FP exceptions are enabled conditionally in main(). Therefore this exception is not thrown on macOS (with clang compiler) and on Windows (compiler without HAVE_FEENABLEEXCEPT).
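As a rough sketch of what "enabled conditionally" can look like: `feenableexcept` is a GNU extension, so code that uses it has to be guarded. The function name `try_enable_fp_traps`, the `__GLIBC__` guard (standing in for the build-time HAVE_FEENABLEEXCEPT check) and the exact exception mask are assumptions made for this example, not a copy of what main() does.

```cpp
#include <cfenv>
#include <fenv.h>  // feenableexcept is declared here on glibc (GNU extension)

// Tries to turn FP exception flags into SIGFPE traps.
// Returns true if trapping was enabled, false where it is unavailable.
bool try_enable_fp_traps() {
#if defined(__GLIBC__)
  // Assumed mask; feenableexcept returns -1 on failure.
  return feenableexcept(FE_DIVBYZERO | FE_OVERFLOW | FE_INVALID) != -1;
#else
  // No feenableexcept on this platform: an out-of-range asin just yields NaN.
  return false;
#endif
}
```

Where the function returns false, the same out-of-range asinf call silently produces NaN, which would explain why the crash is only seen on some platforms.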

amitdo pushed a commit that referenced this issue May 17, 2024
@amitdo
Collaborator

amitdo commented May 17, 2024

The fix was pushed to the main branch.

@stweil stweil closed this as completed May 17, 2024