extract image: index out of range #804

Open
arkoudacode opened this issue Feb 14, 2024 · 3 comments

@arkoudacode

I am using pdfcpu version 0.6.0, which is the latest release as far as I know.

OS: Ubuntu 20.04

The issue occurs with PDFs whose image encoding appears to share certain characteristics; it is not tied to a specific PDF writer.

Description

I encountered a runtime panic caused by an "index out of range" error in the renderIndexedRGBToPNG function while extracting images from certain PDF files with pdfcpu. The panic indicates an access past the end of a slice.

Steps to Reproduce

  1. Use pdfcpu to extract images from a PDF that contains indexed color space images.
  2. The panic occurs during the extraction process, specifically within the renderIndexedRGBToPNG function.

Expected Behavior

The function should handle the image data gracefully without causing a runtime panic, regardless of the specific content structure within the PDF.

Actual Behavior

A runtime panic occurs with the message: panic: runtime error: index out of range [31500] with length 31500.
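
An index equal to the slice length means the loop asked for one element more than the data holds. The root cause in my PDFs is not confirmed, but one plausible case is image content that is shorter than the width, height and bits per component imply. A minimal sketch of a diagnostic check, assuming each row is padded to a whole byte as PDF image rows are; expectedLen and checkContent are hypothetical helper names, not part of pdfcpu:

```go
package main

import (
	"errors"
	"fmt"
)

// expectedLen returns how many bytes a packed image of w x h samples at
// bpc bits per component should occupy, with every row padded to a whole byte.
// Hypothetical helper for illustration only.
func expectedLen(w, h, bpc int) (int, error) {
	if w <= 0 || h <= 0 {
		return 0, errors.New("width and height must be positive")
	}
	switch bpc {
	case 1, 2, 4, 8:
		// supported sample sizes for this sketch
	default:
		return 0, fmt.Errorf("unsupported bits per component: %d", bpc)
	}
	bytesPerRow := (w*bpc + 7) / 8 // round up to the next whole byte
	return bytesPerRow * h, nil
}

// checkContent reports a descriptive error when content is shorter than
// the image dimensions require.
func checkContent(content []byte, w, h, bpc int) error {
	want, err := expectedLen(w, h, bpc)
	if err != nil {
		return err
	}
	if len(content) < want {
		return fmt.Errorf("image content too short: have %d bytes, want %d", len(content), want)
	}
	return nil
}

func main() {
	// Dimensions chosen only for illustration: 5 samples of 4 bits need
	// 3 bytes per row, so 3 rows need 9 bytes.
	fmt.Println(checkContent(make([]byte, 8), 5, 3, 4))
}
```

Running a check like this before decoding would show whether the stream is truncated or whether the dimensions are being misread.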

Proposed Solution

I've modified the renderIndexedRGBToPNG function to include bounds checking for both the image content array and the lookup table. This adjustment successfully resolved the panic issue in my testing. Here's the modified function:

func renderIndexedRGBToPNG(im *PDFImage, resourceName string, lookup []byte) (io.Reader, string, error) {
    b := im.sd.Content

    if len(b) == 0 || len(lookup) == 0 {
        return nil, "", errors.New("image content or lookup table is empty")
    }

    img := image.NewNRGBA(image.Rect(0, 0, im.w, im.h))

    i := 0
    for y := 0; y < im.h; y++ {
        for x := 0; x < im.w; {
            if i >= len(b) {
                return nil, "", fmt.Errorf("index out of bounds: i=%d, len(b)=%d", i, len(b))
            }
            p := b[i]
            for j := 0; j < 8/im.bpc; j++ {
                if x >= im.w {
                    // Prevents writing pixels beyond the image width
                    break
                }
                ind := p >> (8 - uint8(im.bpc))
                l := 3 * int(ind)
                if l+2 >= len(lookup) {
                    return nil, "", fmt.Errorf("lookup index out of bounds: l=%d, len(lookup)=%d", l, len(lookup))
                }
                alpha := uint8(255)
                if im.softMask != nil && y*im.w+x < len(im.softMask) {
                    alpha = im.softMask[y*im.w+x]
                }
                img.Set(x, y, color.NRGBA{R: lookup[l], G: lookup[l+1], B: lookup[l+2], A: alpha})
                p <<= uint8(im.bpc)
                x++
            }
            i++
        }
    }

    var buf bytes.Buffer
    if err := png.Encode(&buf, img); err != nil {
        return nil, "", err
    }

    return &buf, "png", nil
}
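
For anyone reading the inner loop above: each content byte holds 8/bpc palette indices, most significant bits first, and the loop peels them off with a right shift followed by a left shift. A small standalone sketch of just that step; unpackIndices is a hypothetical name used for illustration and does not exist in pdfcpu:

```go
package main

import "fmt"

// unpackIndices splits one byte into 8/bpc palette indices, taking the
// most significant bits first. It returns nil for sample sizes other
// than 1, 2, 4 or 8 bits. Hypothetical helper for illustration only.
func unpackIndices(p byte, bpc int) []byte {
	if bpc != 1 && bpc != 2 && bpc != 4 && bpc != 8 {
		return nil
	}
	n := 8 / bpc
	out := make([]byte, 0, n)
	for j := 0; j < n; j++ {
		out = append(out, p>>(8-uint(bpc))) // top bpc bits are the next index
		p <<= uint(bpc)                     // discard the bits just consumed
	}
	return out
}

func main() {
	fmt.Println(unpackIndices(0b11100100, 2)) // [3 2 1 0]
	fmt.Println(unpackIndices(0xA5, 4))       // [10 5]
}
```

Each index is then multiplied by 3 to find its RGB triple in the lookup table, which is why the modified function checks l+2 against len(lookup).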
@hhrutter
Collaborator

hhrutter commented Feb 16, 2024

Thanks for reporting this.
Could you share a PDF for verification and analysis?

PS: Always ensure you are using the latest commit.

@hhrutter changed the title from "Index out of range panic in renderIndexedRGBToPNG with certain PDFs" to "extract image: index out of range" on Feb 25, 2024
@arkoudacode
Author

Unfortunately I cannot share the PDF. If I come across another PDF that is not sensitive, I will update this issue.

@hhrutter
Collaborator

Understood!
It's just that I need a way to test that particular scenario.
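
Without the original file, one possible route is a synthetic fixture: build a tiny indexed image in memory, drop its last byte, and assert that decoding returns an error instead of panicking. The sketch below is not pdfcpu code. indexedImage, decode and errTruncated are hypothetical stand-ins that mirror the loop from the proposed function, and truncated content is only an assumed trigger for the reported panic.

```go
package main

import (
	"errors"
	"fmt"
	"image"
	"image/color"
)

// indexedImage is a hypothetical stand-in for pdfcpu's internal image type;
// it carries only the fields this sketch needs.
type indexedImage struct {
	w, h, bpc int
	content   []byte // packed palette indices, rows padded to whole bytes
	lookup    []byte // RGB palette, 3 bytes per entry
}

var errTruncated = errors.New("image content shorter than dimensions require")

// decode mirrors the bounds-checked loop from the proposed fix, without
// soft mask handling or PNG encoding.
func decode(im indexedImage) (*image.NRGBA, error) {
	if im.bpc != 1 && im.bpc != 2 && im.bpc != 4 && im.bpc != 8 {
		return nil, fmt.Errorf("unsupported bits per component: %d", im.bpc)
	}
	img := image.NewNRGBA(image.Rect(0, 0, im.w, im.h))
	i := 0
	for y := 0; y < im.h; y++ {
		for x := 0; x < im.w; i++ {
			if i >= len(im.content) {
				return nil, errTruncated
			}
			p := im.content[i]
			for j := 0; j < 8/im.bpc && x < im.w; j++ {
				l := 3 * int(p>>(8-uint(im.bpc)))
				if l+2 >= len(im.lookup) {
					return nil, fmt.Errorf("palette index %d outside lookup table", l/3)
				}
				img.SetNRGBA(x, y, color.NRGBA{R: im.lookup[l], G: im.lookup[l+1], B: im.lookup[l+2], A: 255})
				p <<= uint(im.bpc)
				x++
			}
		}
	}
	return img, nil
}

func main() {
	palette := []byte{0, 0, 0, 255, 255, 255} // two entries: black, white
	full := []byte{0x01, 0x10, 0x11, 0x00}    // 4x2 pixels at 4 bits each

	_, err := decode(indexedImage{w: 4, h: 2, bpc: 4, content: full, lookup: palette})
	fmt.Println("full:", err)

	// Drop the last byte to simulate a stream shorter than its dimensions imply.
	_, err = decode(indexedImage{w: 4, h: 2, bpc: 4, content: full[:3], lookup: palette})
	fmt.Println("truncated:", err)
}
```

The same idea could be turned into a table-driven test over several widths and bit depths, so the scenario is covered even if no shareable PDF turns up.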
