How to extract figures from a PDF? #63

Myfootnotsmelly · 2023-12-19T08:43:12Z

After setup, I tried:
1. doc.figures
2. json.dump

But the results only show each figure's bounding-box position and metadata. How can I get the figure images themselves from the PDF?


kyleclo · 2024-03-13T22:29:31Z

Hey @Myfootnotsmelly, sorry, it looks like a bug was introduced; I'm addressing it in this pull request: #73

kyleclo · 2024-03-18T17:28:28Z

Hi, please take a look at my response to Issue #70.

Yes, figures are represented by bounding boxes.

If you want the image crop of the figures, here's how you'd do it:

# get the image of the page that contains the figure, and its dimensions
# (page_id is the index of that page in doc.images)
page_image = doc.images[page_id]
page_w, page_h = page_image.pilimage.size

# get the bounding box of the first figure
figure_box = doc.figures[0].boxes[0]

# convert the box to absolute coordinates on the page image
figure_box_xy = figure_box.to_absolute(page_width=page_w, page_height=page_h).xy_coordinates

# crop the figure out of the page image using PIL
figure_image = page_image.pilimage.crop(figure_box_xy)
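
For intuition about the conversion step above, here is a minimal standalone sketch of the arithmetic involved. It assumes the figure box is stored as relative left/top/width/height fractions of the page; `relative_box_to_xy` is a hypothetical helper written for illustration, not part of the library, and the library's own conversion may differ in details such as rounding.

```python
from typing import Tuple


def relative_box_to_xy(
    left: float,
    top: float,
    width: float,
    height: float,
    page_w: int,
    page_h: int,
) -> Tuple[int, int, int, int]:
    """Convert a relative (left, top, width, height) box into absolute
    (x1, y1, x2, y2) pixel coordinates on a page of size page_w x page_h.

    Hypothetical helper: assumes all four box values are fractions in [0, 1].
    """
    x1 = round(left * page_w)
    y1 = round(top * page_h)
    x2 = round((left + width) * page_w)
    y2 = round((top + height) * page_h)
    return (x1, y1, x2, y2)


# Example: a box starting 10% from the left and 25% from the top,
# covering half the page width and a quarter of its height,
# on a 1000 x 2000 pixel page image.
box_xy = relative_box_to_xy(0.1, 0.25, 0.5, 0.25, page_w=1000, page_h=2000)
print(box_xy)  # (100, 500, 600, 1000)
```

A tuple in this (left, upper, right, lower) order is what PIL's `Image.crop` expects, which is why the converted coordinates can be passed to it directly.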

kyleclo self-assigned this Mar 13, 2024

kyleclo mentioned this issue Mar 20, 2024

How to get the page number of each figure? #75

Open
