Page.apply_redactions() removes more text than expected in the pdf document. #3433

dameyerdave · 2024-05-02T12:50:50Z

Description of the bug

As soon as I apply the redactions, all the text and graphics are lost from the PDF.

Source:

Annotated:

After apply_redactions():

How to reproduce the bug

This is the code I wrote that leads to this:

import fitz  # PyMuPDF

doc = fitz.open("./Receipt.pdf")
for page in doc:
    # some_text_array: the strings to redact (defined elsewhere)
    for text in some_text_array:
        for area in page.search_for(text, quads=True):
            redaction = page.add_redact_annot(
                area,
                fill=(0, 0, 0),
            )
            redaction.update()

    # here it happens
    page.apply_redactions(0, 0, 0)

doc.save("./redacted.pdf")
doc.close()

PyMuPDF version

1.24.2

Operating system

MacOS

Python version

3.10

JorjMcKie · 2024-05-02T14:28:17Z

Please provide all mandatory information - in this case, the reproducing file is missing.

dameyerdave · 2024-05-03T06:54:31Z

I'm sorry for that. These are the files:

JorjMcKie · 2024-05-03T10:01:15Z

Thanks for the examples.
Sorry, I cannot find a problem. I made a redaction to remove "David Meyer" and it simply worked!

for r in page.search_for("david meyer"):
    page.add_redact_annot(r)

# output: 'Redact' annotation on page 0 of original.pdf
page.apply_redactions(0,0,0)
# output: True
doc.ez_save("x-1.24.2.pdf")

In the meantime, I also redacted other parts of the page (the text "October 19, 2023"), and they also worked without complaints.

aleem75321 · 2024-05-05T20:30:19Z

Hi @JorjMcKie, I have faced the same issue when applying redactions: they remove images that should not be removed, or they change text.
test.pdf
test2.pdf

I have attached both PDFs to reproduce the issue.

Code:-

import fitz
from pathlib import Path


file_path=Path(r"test_pages/test.pdf")

doc=fitz.open(file_path)
page=doc[0]


blocks=page.get_text("rawdict",flags=fitz.TEXTFLAGS_TEXT,sort=True)["blocks"]  
# Set colour for output PDF
Red = fitz.pdfcolor["red"]

for b in  blocks:
    for l in b["lines"]:  
        for s in l["spans"]:
            for c in s["chars"]:

                if s["size"]>15 and s['color']==2236191: 
                    if c['c']== "ं":
                        try:
                            font = fitz.Font(fontname=s['font'],fontfile=f"{s['font']}.ttf")  # this must be known somehow - or simply try some other font
                        except Exception as e:
                            print(str(e))
                            continue  # no usable font: skip this character instead of hitting an undefined `font`
                        redact_box = fitz.Rect(c["bbox"]) 
                        origin_text = fitz.Point(c["origin"]) 
                        redact_box.y1 = redact_box.y1-s['size'] 
                        page.add_redact_annot(redact_box) 
                        # Apply the redaction right away (note: this runs once per matching character)
                        page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE,graphics=fitz.PDF_REDACT_LINE_ART_NONE)
                        # Create a TextWriter to write on the page in the chosen colour
                        tw = fitz.TextWriter(page.rect,color=Red)  
                        # Re-insert the same text in a different colour
                        tw.append((origin_text.x,origin_text.y), text=c['c'],fontsize=s['size'],font=font)
                        tw.write_text(page) 

# Save the output file for future use
out_fpath="OUT/"+file_path.stem+".pdf"
doc.save(out_fpath,garbage=3, deflate=True)
doc.close()

PyMuPDF version
1.24.2

Operating system
windows

Python version
3.11.4

JorjMcKie · 2024-05-06T07:55:25Z

@aleem75321 please submit this as a different issue - this is too confusing in this context.
When you do, please save the PDF when you have inserted all redactions - before applying them. I need to confirm where your code has put them - without the need to understand your code.
Then attach this PDF to confirm that bad things happen on applying redactions.
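
A minimal sketch of the workflow requested above, assuming the strings to redact are already known: insert all redaction annotations, save that intermediate PDF, and only then apply the redactions. The helper names mark_redactions and save_marked_then_apply and all file names are hypothetical and not part of PyMuPDF; only search_for, add_redact_annot, apply_redactions, save and close are library calls.

```python
def mark_redactions(page, needles):
    """Add one redact annotation per search hit; return how many were added."""
    count = 0
    for needle in needles:
        for rect in page.search_for(needle):
            page.add_redact_annot(rect)
            count += 1
    return count


def save_marked_then_apply(src, marked_out, final_out, needles):
    """Save once with redactions only marked, then once with them applied."""
    import fitz  # PyMuPDF; imported here so mark_redactions stays dependency-free

    doc = fitz.open(src)
    for page in doc:
        mark_redactions(page, needles)
    # Redactions are still plain annotations here: nothing has been removed yet.
    doc.save(marked_out)

    for page in doc:
        page.apply_redactions()
    doc.save(final_out)
    doc.close()


# Hypothetical usage:
# save_marked_then_apply("test.pdf", "test-marked.pdf", "test-applied.pdf", ["some text"])
```

Attaching the intermediate file (test-marked.pdf in the usage line) shows where the annotations ended up without anyone having to read the code that placed them.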

aleem75321 · 2024-05-06T09:13:06Z

I have submitted a separate issue; please see the link below.

Facing Issues after applying redactions they delete some Images or Icons #3439

dameyerdave · 2024-05-06T15:05:41Z

I reduced the application to the bare minimum and still encounter the same issue. I tried it on a Mac (M3), on Ubuntu Linux (Intel), and in a Docker container with platform: linux/amd64, all without success.

import fitz

doc = fitz.open("./original.pdf")
for page in doc:
    for r in page.search_for("David Meyer"):
        page.add_redact_annot(r)

    page.apply_redactions(0, 0)
doc.ez_save("redacted.pdf")

With the following files:

original.pdf
redacted.pdf

I don't know what to try now... If you have another good idea, please let me know...

JorjMcKie · 2024-05-06T15:27:51Z

@dameyerdave we (a colleague of mine and I) have tried on all 3 platforms now Mac, Linux, Win with fitz.version=('1.24.2', '1.24.1', '20240417000001') and are getting the correct, flawless result.
🤷‍♂️
That is: no black rectangle, and "David Meyer" removed completely.
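
One way to check that expectation programmatically, as a rough sketch: compare the page's words before and after applying redactions and list any word that disappeared although it does not overlap a redaction rectangle. The function names overlaps, unexpected_losses and check_pdf are hypothetical; the word tuples are assumed to have the (x0, y0, x1, y1, text, ...) shape returned by page.get_text("words"). Matching is by word text only, so this is a heuristic and not a proof.

```python
from collections import Counter


def overlaps(a, b):
    """True if rectangles a and b, given as (x0, y0, x1, y1), share some area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def unexpected_losses(words_before, words_after, redact_rects):
    """Return texts of words that vanished without overlapping any redaction."""
    remaining = Counter(w[4] for w in words_after)
    lost = []
    for w in words_before:
        if any(overlaps(w[:4], r) for r in redact_rects):
            continue  # this word was meant to be removed
        if remaining[w[4]] > 0:
            remaining[w[4]] -= 1  # still present after redaction
        else:
            lost.append(w[4])
    return lost


def check_pdf(path, needle):
    """Redact `needle` on every page; map page number to unexpectedly lost words."""
    import fitz  # PyMuPDF

    doc = fitz.open(path)
    report = {}
    for page in doc:
        rects = [tuple(r) for r in page.search_for(needle)]
        before = page.get_text("words")
        for r in rects:
            page.add_redact_annot(r)
        page.apply_redactions()
        report[page.number] = unexpected_losses(before, page.get_text("words"), rects)
    doc.close()
    return report


# Hypothetical usage: an empty list per page means nothing extra was removed.
# print(check_pdf("original.pdf", "David Meyer"))
```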

JorjMcKie · 2024-05-06T15:30:44Z

My only advice is to re-install 1.24.2.
There has been a redaction issue previously. I will try with 1 or 2 previous versions.

JorjMcKie · 2024-05-06T15:36:09Z

No such luck:
At least on Windows, all versions back to 1.23.26 work correctly.
So your best option is probably to re-install the latest version.

luchux · 2024-05-06T21:15:29Z

We are facing exactly the same problem as everybody else reporting the bug in this thread.
Our version in the env is
Name: PyMuPDF
Version: 1.24.0

I tried removing the images=0 argument from apply_redactions() and also tried every possible combination of values for that parameter.
I also tried removing the garbage and deflate options when saving.

Exactly the same error as other people:

Original PDF before redaction

After apply_redactions() on the text "Origin"

We would love to know if you are aware of this bug, and if there is a stable version that works properly without this bug. Thanks a lot!

luchux · 2024-05-06T21:59:49Z

Another example.
I have now tested 3 versions: 1.24.0 and 1.24.2 are failing.

1.23.26: working well! Redaction works.

Original before redaction:

After text redacted 1.24.0 and 1.24.2:

after text redacted with 1.23.26 (working!)
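
Since the reports differ by version, it can help to state exactly which PyMuPDF build an environment has installed. The sketch below uses only the standard library; the REPORTS table just restates what commenters in this thread said and is not an official compatibility list, and the names REPORTS, installed_pymupdf and thread_status are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Restates reports made in this thread only; not a verified compatibility table.
REPORTS = {
    "1.23.26": "reported working (luchux; JorjMcKie on Windows)",
    "1.24.0": "reported failing (luchux)",
    "1.24.2": "mixed reports: failing (dameyerdave, luchux), working (JorjMcKie)",
}


def installed_pymupdf():
    """Return the installed PyMuPDF version string, or None if it is absent."""
    try:
        return version("PyMuPDF")
    except PackageNotFoundError:
        return None


def thread_status(installed):
    """Describe what this thread says about the given version string."""
    if installed is None:
        return "not installed"
    return REPORTS.get(installed, "not mentioned in this thread")


current = installed_pymupdf()
print(current, "->", thread_status(current))
```

Including this one-line output in a report, together with the example PDF, removes any doubt about which version produced the result.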

JorjMcKie · 2024-05-07T08:21:32Z

@luchux - "A picture is worth a thousand words."

Certainly true. But a thousand pictures are not worth a million words!
Please add an example file, and no more pictures, if you want us to confirm that yours is another duplicate of #3376.

Please also note that the problem in this post is not yet reproducible, so it is unclear whether it is a bug at all.

dameyerdave changed the title ~~Page.apply_redactions() removes all the text in the pdf document.~~ Page.apply_redactions() removes more text than expected in the pdf document. May 2, 2024

JorjMcKie added the "example required" and "Waiting for information" labels May 2, 2024

JorjMcKie removed the "example required" and "Waiting for information" labels May 3, 2024

JorjMcKie added the "duplicate", "fix developed", "release schedule to be determined" and "Waiting for information" labels and removed the "duplicate", "fix developed" and "release schedule to be determined" labels May 3, 2024
