WordFontAttributes does not detect bold #294

Open
IKetchup opened this issue Jan 25, 2022 · 1 comment

IKetchup commented Jan 25, 2022

I am working on a project where I would like to find bold text in an image (a medical report).

For this I'm using WordFontAttributes through the following function:

from tesserocr import PyTessBaseAPI, RIL, iterate_level


def get_words_info(image_path, tessdata_path):
    """
    Take the path to an image and the path to tessdata, and return a list of
    dicts with info about each word.
    """
    # A single API instance, closed automatically by the context manager.
    with PyTessBaseAPI(path=tessdata_path) as api:
        api.SetImageFile(image_path)
        api.Recognize()
        result_iter = api.GetIterator()
        level = RIL.WORD

        result = []

        for r in iterate_level(result_iter, level):
            element = r.GetUTF8Text(level)
            # Fall back to an empty dict in case no attributes are returned.
            word_attributes = r.WordFontAttributes() or {}
            # BoundingBox returns (left, top, right, bottom) in pixels.
            bounding_box = r.BoundingBox(level)
            print(bounding_box)

            if element:
                word_attributes['word'] = element
                word_attributes['position'] = bounding_box

            result.append(word_attributes)

        return result

I tried this bold detection on several images in JPG format. For some, the bold text is detected nicely, but for others, like the picture below, the bold text is not detected (the bold boolean is False in the result):
repport_28

Here is the example for the doctor's name, in bold at the bottom right:

  'bold': False,
  'italic': False,
  'underlined': False,
  'monospace': False,
  'serif': True,
  'smallcaps': False,
  'pointsize': 9,
  'font_id': 283,
  'word': 'BERTHELEN',
  'position': (1696, 2518, 1979, 2549)},
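To compare images, it may help to split the returned list by the reported bold flag. The following is a minimal sketch, assuming the result is a list of dicts shaped like the entry above; the helper name `split_by_bold` and the second sample entry are hypothetical and not part of tesserocr or the original report.

```python
def split_by_bold(words):
    """Partition word dicts into (bold, not_bold) lists of word strings."""
    bold, not_bold = [], []
    for attrs in words:
        # Skip entries that carry no attributes or no recognised text.
        if not attrs or "word" not in attrs:
            continue
        target = bold if attrs.get("bold") else not_bold
        target.append(attrs["word"])
    return bold, not_bold


# The first entry mirrors the output above; the second is a made-up
# bold entry, included only to exercise both branches.
sample = [
    {"bold": False, "pointsize": 9, "font_id": 283, "word": "BERTHELEN"},
    {"bold": True, "pointsize": 9, "font_id": 283, "word": "Dr"},
]

bold_words, plain_words = split_by_bold(sample)
print("bold:", bold_words)
print("not bold:", plain_words)
```

Running this on the output of get_words_info for several images shows at a glance which words were flagged, which makes inconsistent cases easier to spot.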

Does anyone know why the bold detection is not consistent?

Thanks

@arnavmehta7

Hey, can you tell us which version you are using?
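One way to gather the version details is sketched below. It assumes tesserocr exposes `__version__` and `tesseract_version()`, which it does to my knowledge. The helper name `describe_versions` is hypothetical, and the function returns None values when tesserocr cannot be imported.

```python
def describe_versions():
    """Return version strings for tesserocr and the Tesseract library it uses."""
    try:
        import tesserocr
    except ImportError:
        # tesserocr is not installed (or failed to load) in this environment.
        return {"tesserocr": None, "tesseract": None}
    return {
        "tesserocr": tesserocr.__version__,
        # Multi-line string that also lists the linked image libraries.
        "tesseract": tesserocr.tesseract_version(),
    }


versions = describe_versions()
for name, value in versions.items():
    print(f"{name}: {value}")
```

The Tesseract version matters here because, as far as I understand, font attributes come from the legacy recognition engine, so results can differ between engine modes and releases.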
