Extracting table data? #1

munikarmanish · 2019-01-02T04:32:47Z

Right now, it only seems to perform OCR, i.e., convert an image to raw text. Is there any table-specific extraction performed? Basically, I'm researching good algorithms for extracting tabular data from scanned documents.

Thanks in advance. :)

cseas · 2019-01-11T04:58:45Z

Hi, @munikarmanish!

You're correct. The OCR currently only works for pre-processed images.

While it does extract data from PDFs with tables, it currently performs a horizontal scan and doesn't perform any table-based classification on the text yet. I'm still trying to figure out how to make that work.

A make-do solution could be to classify the text after extraction based on the length of the columns, but that will only work if every column has a fixed number of words, which is not the case in most scenarios.
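To make the limitation concrete, here is a minimal sketch of that make-do idea, assuming each OCR'd line is one table row and that the number of words per column is known in advance. The function name `split_row` and the `words_per_column` parameter are made up for illustration and are not part of this project.

```python
from typing import List, Sequence


def split_row(line: str, words_per_column: Sequence[int]) -> List[str]:
    """Split one OCR'd text line into cells using fixed word counts.

    `words_per_column` is an assumption supplied by the caller, e.g.
    [1, 2, 1] means: 1 word, then 2 words, then 1 word.
    """
    words = line.split()
    if len(words) != sum(words_per_column):
        # This is the failure mode described above: the row does not
        # have the fixed number of words the scheme depends on.
        raise ValueError(
            f"expected {sum(words_per_column)} words, got {len(words)}: {line!r}"
        )
    cells = []
    start = 0
    for count in words_per_column:
        cells.append(" ".join(words[start:start + count]))
        start += count
    return cells


print(split_row("Widget Blue Large 4", [1, 2, 1]))
```

A row such as `"Widget Dark Blue Large 4"` has one extra word and raises `ValueError`, which is why this only works when every column really has a fixed word count.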

aribornstein · 2019-02-12T09:11:21Z

The way to do this is to use code to do table detection (column and row) and then perform the OCR within the table. It's a really hard problem, though.
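As a rough illustration of "detect rows and columns, then OCR inside each cell", here is a minimal sketch using projection profiles: rows and columns are taken to be the bands of the image that contain ink, separated by fully blank gaps. This is only one simple technique, chosen for illustration, and it is not necessarily what was meant above. It assumes an already binarized image given as nested lists (1 = ink, 0 = background), and the `ocr` callback is a placeholder for whatever OCR engine is used. All names here are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Image = Sequence[Sequence[int]]  # assumed binarized: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, bottom, left, right), half-open


def ink_runs(profile: Sequence[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where the profile is non-zero."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def detect_cells(image: Image) -> List[List[Box]]:
    """Split the image into a grid of cell boxes via projection profiles."""
    row_profile = [sum(row) for row in image]        # ink per pixel row
    col_profile = [sum(col) for col in zip(*image)]  # ink per pixel column
    rows = ink_runs(row_profile)
    cols = ink_runs(col_profile)
    return [[(t, b, l, r) for (l, r) in cols] for (t, b) in rows]


def extract_table(image: Image, ocr: Callable[[List[List[int]]], object]) -> list:
    """Run the caller-supplied `ocr` on each detected cell, row by row."""
    table = []
    for row in detect_cells(image):
        table.append(
            [ocr([list(line[l:r]) for line in image[t:b]]) for (t, b, l, r) in row]
        )
    return table
```

On real scans this breaks down quickly: any blank gap, including the space between two words in one cell, is treated as a column boundary, and ruled lines, skew, or merged cells are not handled. That is part of why the problem is hard.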

jaysinghr · 2019-06-27T05:40:04Z

Hi @munikarmanish, did you find anything regarding the research you mentioned above?

munikarmanish · 2019-07-03T04:26:34Z

> Hi @munikarmanish, did you find anything regarding the research you mentioned above?

Yes, I've found a few interesting approaches:

SAIVENKATARAJU · 2021-11-10T12:53:47Z

I am also facing the above issue. Has anyone found a good solution after 2 years?

cseas added the "enhancement" (New feature or request) and "help wanted" (Extra attention is needed) labels on Jan 11, 2019
