Division by zero without using table_regions #480

Open
cmartinotti opened this issue Feb 21, 2022 · 1 comment

cmartinotti commented Feb 21, 2022

Hello, I'm trying to extract tables with the default parameters in stream mode. I run:

tables_cam = camelot.read_pdf(filepath='pdfs_files/fulltext.pdf', pages="10", flavor='stream')

It returns:

ZeroDivisionError                         Traceback (most recent call last)
/tmp/ipykernel_7338/1281916851.py in <module>
----> 1 tables_cam=camelot.read_pdf(filepath='pdfs_files/fulltext.pdf',
      2                             pages="9,10",
      3                             flavor='stream',
      4                             edge_tol=500
      5                            )

~/anaconda3/envs/test/lib/python3.8/site-packages/camelot/io.py in read_pdf(filepath, pages, password, flavor, suppress_stdout, layout_kwargs, **kwargs)
    111         p = PDFHandler(filepath, pages=pages, password=password)
    112         kwargs = remove_extra(kwargs, flavor=flavor)
--> 113         tables = p.parse(
    114             flavor=flavor,
    115             suppress_stdout=suppress_stdout,

~/anaconda3/envs/test/lib/python3.8/site-packages/camelot/handlers.py in parse(self, flavor, suppress_stdout, layout_kwargs, **kwargs)
    174             parser = Lattice(**kwargs) if flavor == "lattice" else Stream(**kwargs)
    175             for p in pages:
--> 176                 t = parser.extract_tables(
    177                     p, suppress_stdout=suppress_stdout, layout_kwargs=layout_kwargs
    178                 )

~/anaconda3/envs/test/lib/python3.8/site-packages/camelot/parsers/stream.py in extract_tables(self, filename, suppress_stdout, layout_kwargs)
    461             sorted(self.table_bbox.keys(), key=lambda x: x[1], reverse=True)
    462         ):
--> 463             cols, rows = self._generate_columns_and_rows(table_idx, tk)
    464             table = self._generate_table(table_idx, cols, rows)
    465             table._bbox = tk

~/anaconda3/envs/test/lib/python3.8/site-packages/camelot/parsers/stream.py in _generate_columns_and_rows(self, table_idx, tk)
    323         # select elements which lie within table_bbox
    324         t_bbox = {}
--> 325         t_bbox["horizontal"] = text_in_bbox(tk, self.horizontal_text)
    326         t_bbox["vertical"] = text_in_bbox(tk, self.vertical_text)
    327 

~/anaconda3/envs/test/lib/python3.8/site-packages/camelot/utils.py in text_in_bbox(bbox, text)
    374             if bbox_intersect(ba, bb):
    375                 # if the intersection is larger than 80% of ba's size, we keep the longest
--> 376                 if (bbox_intersection_area(ba, bb) / bbox_area(ba)) > 0.8:
    377                     if bbox_longer(bb, ba):
    378                         rest.discard(ba)

ZeroDivisionError: float division by zero

I would expect it to report that no tables were found (as it normally does) rather than crash with a division by zero. How can I prevent this?
PDF_FILE: fulltext.pdf
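Until the division in text_in_bbox is guarded upstream, one possible workaround is to read the pages one at a time and skip any page that raises. The sketch below is only an illustration, not part of camelot: the helper name read_pages_skipping_failures is hypothetical, the reader is passed in as an argument (camelot.read_pdf in practice), and it assumes the object returned by the reader can be iterated over to yield tables.

```python
def read_pages_skipping_failures(read_fn, filepath, pages, **kwargs):
    """Call read_fn once per page and collect tables from pages that parse.

    read_fn is expected to behave like camelot.read_pdf (assumption):
    it takes filepath=..., pages="<n>" and returns an iterable of tables.
    Pages that raise ZeroDivisionError are recorded and skipped.
    """
    tables = []
    failed_pages = []
    for page in pages:
        try:
            result = read_fn(filepath=filepath, pages=str(page), **kwargs)
        except ZeroDivisionError:
            # The crash reported in this issue: treat the page as "no tables".
            failed_pages.append(page)
            continue
        tables.extend(result)
    return tables, failed_pages


# Possible usage (requires camelot to be installed):
# import camelot
# tables, failed = read_pages_skipping_failures(
#     camelot.read_pdf, "pdfs_files/fulltext.pdf", [9, 10], flavor="stream"
# )
```

This hides the error rather than fixing it, so a page that really contains a table may still be dropped; the failed page numbers are returned so they can be inspected separately.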

PS: If I convert a PDF page to an image, locate the table area on the image, and then pass the corresponding area to camelot to extract the tables, is the conversion from image coordinates to PDF coordinates simply pos_image * size_pdf / size_img?
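A minimal sketch of that conversion, under some assumptions: the image covers the whole, unrotated page; image pixels have their origin at the top-left with y growing downwards; and PDF space has its origin at the bottom-left, which is how the camelot documentation describes table_areas ("x1,y1,x2,y2" with (x1, y1) the top-left and (x2, y2) the bottom-right corner). Under those assumptions the scaling pos_image * size_pdf / size_img is right for x, but y also has to be flipped. The function names here are hypothetical.

```python
def image_box_to_pdf_area(box, image_size, pdf_size):
    """Convert a pixel box to PDF coordinates.

    box        -- (left, top, right, bottom) in image pixels, origin top-left
    image_size -- (width, height) of the rendered page image in pixels
    pdf_size   -- (width, height) of the PDF page in points
    Returns (x1, y1, x2, y2) with (x1, y1) top-left and (x2, y2)
    bottom-right in PDF space (origin bottom-left).
    """
    img_w, img_h = image_size
    pdf_w, pdf_h = pdf_size
    scale_x = pdf_w / img_w
    scale_y = pdf_h / img_h
    left, top, right, bottom = box
    x1 = left * scale_x
    x2 = right * scale_x
    # Flip the vertical axis: image y grows downwards, PDF y grows upwards.
    y1 = pdf_h - top * scale_y
    y2 = pdf_h - bottom * scale_y
    return x1, y1, x2, y2


def to_table_area_string(area):
    """Format the area as the comma-separated string form "x1,y1,x2,y2"."""
    return ",".join(f"{value:.1f}" for value in area)


# Example: a 612x792 pt page rendered at twice that size in pixels.
area = image_box_to_pdf_area((100, 200, 500, 600), (1224, 1584), (612, 792))
print(to_table_area_string(area))
```

If the page is rotated or cropped, or the image shows only part of the page, an extra offset or rotation step would be needed before scaling.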

@ashleych
Hello @cmartinotti
Any luck with this? I am stuck on the same issue, but with an Arabic-language PDF.
