
Transparency is lost when extracting images from PDFs #775

Open
tcm151 opened this issue Feb 6, 2024 · 6 comments
Labels
document-reading (Related to reading documents), no-document (Indicates no sample document attached to issue)

Comments

@tcm151

tcm151 commented Feb 6, 2024

I am trying to pull out all of the images from a PDF that I have. It is a Pathfinder Adventure Module that I purchased a while ago. Everything works great; the only thing is that the images lose their transparency when I try to save them, and they end up looking awful with a black, blocky background.

```csharp
#pragma warning disable CA1416 // System.Drawing is only supported on Windows
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;
using UglyToad.PdfPig;

var filePath = args[0];
var destinationFolder = args[1];

using var file = File.OpenRead(filePath);
using var pdf = PdfDocument.Open(file);

// Look up the PNG *encoder* (GetImageDecoders would return the decoders instead).
var pngEncoder = ImageCodecInfo.GetImageEncoders()
	.First(enc => enc.FormatID == ImageFormat.Png.Guid);

Console.WriteLine("Exporting images...");
foreach (var page in pdf.GetPages())
{
	var images = page.GetImages().ToArray();
	for (int i = 0; i < images.Length; i++)
	{
		// Fall back to the raw stream bytes (e.g. JPEG data) when PdfPig cannot produce a PNG.
		var bytes = images[i].TryGetPng(out var pngBytes) ? pngBytes : images[i].RawBytes.ToArray();
		using var stream = new MemoryStream(bytes);
		using var image = Image.FromStream(stream, false, false);
		image.Save(Path.Combine(destinationFolder, $"{page.Number}-{i}.png"), pngEncoder, null);
	}
}

Console.WriteLine($"Extracted images from {filePath}");
#pragma warning restore CA1416
```
@EliotJones
Member

Are you able to share the document at all? In general I don't think images in PDF have a transparency layer, but I haven't looked at the spec recently and don't recall.
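For reference, the PDF specification does not store alpha as an extra channel inside an image's samples. Instead, an image XObject can point to a separate soft-mask image through its /SMask entry, or carry a /Mask entry that is either a stencil-mask image or an array of colour-key ranges. The sketch below illustrates only the simplest case, colour-key masking, assuming 8-bit DeviceRGB samples that have already been decoded; `ColorKeyMask` and `ToRgba` are hypothetical helpers, not part of PdfPig.

```csharp
using System;

public static class ColorKeyMask
{
    // Hypothetical helper: converts decoded 8-bit DeviceRGB samples (3 bytes per pixel)
    // into RGBA using a colour-key /Mask array of the form [minR maxR minG maxG minB maxB].
    // A pixel whose every component lies inside its [min, max] range is masked out,
    // i.e. not painted, so it becomes fully transparent here.
    public static byte[] ToRgba(byte[] rgb, int[] maskRanges)
    {
        const int Components = 3;
        if (maskRanges.Length != 2 * Components)
            throw new ArgumentException("Expected one [min max] pair per RGB component.", nameof(maskRanges));
        if (rgb.Length % Components != 0)
            throw new ArgumentException("RGB data is not a whole number of pixels.", nameof(rgb));

        int pixelCount = rgb.Length / Components;
        var rgba = new byte[pixelCount * 4];
        for (int p = 0; p < pixelCount; p++)
        {
            bool masked = true;
            for (int c = 0; c < Components; c++)
            {
                byte value = rgb[p * Components + c];
                rgba[p * 4 + c] = value;
                // Ranges are compared against the raw samples, before any /Decode mapping.
                if (value < maskRanges[2 * c] || value > maskRanges[2 * c + 1])
                    masked = false;
            }
            rgba[p * 4 + 3] = masked ? (byte)0 : (byte)255;
        }
        return rgba;
    }
}
```

For example, a mask of `[250 255 250 255 250 255]` would turn near-white pixels transparent while leaving every other pixel opaque.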

EliotJones added the document-reading and no-document labels on Feb 18, 2024
@tcm151
Author

tcm151 commented Feb 18, 2024

I have used other tools online for the exact same process, and they work and include the transparency properly. I'd like to be able to use my own tool, which this package works perfectly for, except for the transparency.

My only concern with uploading the PDF here is that it is a paid product which is watermarked with my account information. I will temporarily upload it here, but I will need to remove it after a day or so.

EDIT: I can't seem to upload the PDF from my phone right now, I can try again on Monday when I return home.

@tcm151
Author

tcm151 commented Feb 20, 2024

@BobLd
Collaborator

BobLd commented Feb 20, 2024

One possible area to look into is
https://github.com/UglyToad/PdfPig/blob/c25368e5ab7c3add2bd771d940a31dc2e87f3d34/src/UglyToad.PdfPig/Images/Png/PngFromPdfImageFactory.cs#L31C17-L31C101
where the hasAlphaChannel value passed to PngBuilder.Create is always false.

Another possible area to look into is "soft masks":
[screenshot]

I doubt I will have time to look into that soon, though.
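To make the soft-mask idea concrete: when an image XObject has an /SMask entry, the soft mask is a separate DeviceGray image whose samples give the opacity of each pixel of the base image. The following is a minimal sketch of combining the two into RGBA bytes that a PNG encoder could then write with an alpha channel. It assumes both images have already been decoded to 8 bits per component, share the same Width and Height, use the default /Decode array, and that the soft mask has no /Matte entry. `SoftMaskCompositor.Combine` is a hypothetical helper, not an existing PdfPig API.

```csharp
using System;

public static class SoftMaskCompositor
{
    // Hypothetical helper: interleaves decoded base-image samples (8-bit DeviceRGB,
    // 3 bytes per pixel) with decoded /SMask samples (8-bit DeviceGray, 1 byte per pixel),
    // using each soft-mask sample directly as that pixel's alpha.
    // Assumes equal dimensions and no /Matte entry; a /Matte entry would mean the colours
    // were pre-blended with a matte colour and would need to be un-blended first.
    public static byte[] Combine(byte[] rgb, byte[] softMask, int width, int height)
    {
        int pixelCount = checked(width * height);
        if (rgb.Length != pixelCount * 3)
            throw new ArgumentException("Expected 3 bytes per pixel of RGB data.", nameof(rgb));
        if (softMask.Length != pixelCount)
            throw new ArgumentException("Expected 1 byte per pixel of soft-mask data; resample first if the sizes differ.", nameof(softMask));

        var rgba = new byte[pixelCount * 4];
        for (int p = 0; p < pixelCount; p++)
        {
            rgba[p * 4] = rgb[p * 3];
            rgba[p * 4 + 1] = rgb[p * 3 + 1];
            rgba[p * 4 + 2] = rgb[p * 3 + 2];
            rgba[p * 4 + 3] = softMask[p]; // 0 = fully transparent, 255 = fully opaque
        }
        return rgba;
    }
}
```

Under these assumptions, flipping PngBuilder.Create's hasAlphaChannel flag on its own would only declare an alpha channel; the alpha values themselves have to come from the separate soft-mask image.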

@tcm151
Author

tcm151 commented Feb 20, 2024

I've cloned the project and built it with PngBuilder.Create's hasAlphaChannel set to true, but that does not appear to change the resulting images.

@BobLd
Collaborator

BobLd commented Feb 21, 2024

@tcm151 Okay, thanks for checking. Sad it's not an easy fix.

It must be related to soft masks then, which is trickier since I don't think they're fully implemented yet.
