
Order of attachments #822

Open
vsenko opened this issue Mar 5, 2024 · 5 comments

vsenko commented Mar 5, 2024

It looks like v0.7.0 sorts attachments alphanumerically by file name (ID). Since attached files technically do have an order in a PDF, this is confusing: text in a PDF might refer to attached files not only by name but also by their position.

Code to reproduce:

```go
package main

import (
	"bytes"
	"fmt"
	"strings"

	pdfcpuapi "github.com/pdfcpu/pdfcpu/pkg/api"
	"github.com/pdfcpu/pdfcpu/pkg/pdfcpu"
	pdfcpucreate "github.com/pdfcpu/pdfcpu/pkg/pdfcpu/create"
	pdfcpumodel "github.com/pdfcpu/pdfcpu/pkg/pdfcpu/model"
	pdfcputypes "github.com/pdfcpu/pdfcpu/pkg/pdfcpu/types"
)

func main() {
	ctx, err := pdfcpu.CreateContextWithXRefTable(pdfcpumodel.NewDefaultConfiguration(), pdfcputypes.PaperSize["A4"])
	if err != nil {
		panic(err)
	}

	// Create a single page so the document is valid.
	template := `{"pages": { "1": { "content": { "text": [{ "value": "page 1", "anchor": "left", "font": { "name": "Helvetica", "size": 12 } }] } } } }`
	err = pdfcpucreate.FromJSON(ctx, strings.NewReader(template))
	if err != nil {
		panic(err)
	}

	// Add attachments in a deliberately non-alphabetical order: a, 1, z, d.
	err = ctx.AddAttachment(pdfcpumodel.Attachment{Reader: strings.NewReader("a"), ID: "a", Desc: "a"}, false)
	if err != nil {
		panic(err)
	}

	err = ctx.AddAttachment(pdfcpumodel.Attachment{Reader: strings.NewReader("1"), ID: "1", Desc: "1"}, false)
	if err != nil {
		panic(err)
	}

	err = ctx.AddAttachment(pdfcpumodel.Attachment{Reader: strings.NewReader("z"), ID: "z", Desc: "z"}, false)
	if err != nil {
		panic(err)
	}

	err = ctx.AddAttachment(pdfcpumodel.Attachment{Reader: strings.NewReader("d"), ID: "d", Desc: "d"}, false)
	if err != nil {
		panic(err)
	}

	// Write the PDF to memory and read the attachments back.
	var b bytes.Buffer
	err = pdfcpuapi.WriteContext(ctx, &b)
	if err != nil {
		panic(err)
	}

	attachments, err := pdfcpuapi.ExtractAttachmentsRaw(bytes.NewReader(b.Bytes()), "", nil, nil)
	if err != nil {
		panic(err)
	}

	for _, a := range attachments {
		fmt.Println(a.ID)
	}
}
```

Expected output:

```
a
1
z
d
```

Actual output:

```
1
a
d
z
```

hhrutter (Collaborator) commented Mar 6, 2024

That's because attachments are stored in a PDF EmbeddedFiles name tree.

hhrutter changed the title from "It would be really handy not to rearrange the order of the attachments" to "Order of attachments" on Mar 6, 2024.

vsenko (Author) commented Mar 6, 2024

As far as I understand, the elements of EmbeddedFiles have an order. So it would be convenient if attached files were placed there in the same order in which they were added.

hhrutter (Collaborator) commented Mar 6, 2024

Name tree keys are sorted in lexical order.
I am not inclined to hack the order into the key.

vsenko (Author) commented Mar 6, 2024

After some research, I now understand that Attachment.ID is not the file name but the string that identifies the embedded file in EmbeddedFiles. What had been confusing me is that Attachment.FileName gets lost when attaching. Here is code to illustrate it:

```go
package main

import (
	"bytes"
	"fmt"
	"strings"

	pdfcpuapi "github.com/pdfcpu/pdfcpu/pkg/api"
	"github.com/pdfcpu/pdfcpu/pkg/pdfcpu"
	pdfcpucreate "github.com/pdfcpu/pdfcpu/pkg/pdfcpu/create"
	pdfcpumodel "github.com/pdfcpu/pdfcpu/pkg/pdfcpu/model"
	pdfcputypes "github.com/pdfcpu/pdfcpu/pkg/pdfcpu/types"
)

func main() {
	ctx, err := pdfcpu.CreateContextWithXRefTable(pdfcpumodel.NewDefaultConfiguration(), pdfcputypes.PaperSize["A4"])
	if err != nil {
		panic(err)
	}

	// Create a single page so the document is valid.
	template := `{"pages": { "1": { "content": { "text": [{ "value": "page 1", "anchor": "left", "font": { "name": "Helvetica", "size": 12 } }] } } } }`
	err = pdfcpucreate.FromJSON(ctx, strings.NewReader(template))
	if err != nil {
		panic(err)
	}

	// Each attachment has a distinct ID and FileName.
	attachments := []pdfcpumodel.Attachment{
		{Reader: strings.NewReader("afile"), ID: "id1", FileName: "a.txt", Desc: "a-decs"},
		{Reader: strings.NewReader("1file"), ID: "id2", FileName: "1.txt", Desc: "1-decs"},
		{Reader: strings.NewReader("zfile"), ID: "id3", FileName: "z.txt", Desc: "z-decs"},
		{Reader: strings.NewReader("dfile"), ID: "id4", FileName: "d.txt", Desc: "d-decs"},
	}

	for _, a := range attachments {
		err = ctx.AddAttachment(a, false)
		if err != nil {
			panic(err)
		}
	}

	// Write the PDF to memory and read the attachments back.
	var b bytes.Buffer
	err = pdfcpuapi.WriteContext(ctx, &b)
	if err != nil {
		panic(err)
	}

	extracted, err := pdfcpuapi.ExtractAttachmentsRaw(bytes.NewReader(b.Bytes()), "", nil, nil)
	if err != nil {
		panic(err)
	}

	for _, a := range extracted {
		fmt.Println(a.ID, a.FileName, a.Desc)
	}
}
```

Expected output is:

```
id1 a.txt a-decs
id2 1.txt 1-decs
id3 z.txt z-decs
id4 d.txt d-decs
```

But the actual output is:

```
id1 id1 a-decs
id2 id2 1-decs
id3 id3 z-decs
id4 id4 d-decs
```

In fact, if you inspect the generated PDF, /F and /UF contain id1, not the file name. This happens here: https://github.com/pdfcpu/pdfcpu/blob/master/pkg/pdfcpu/model/attach.go#L122

```go
return xRefTable.NewFileSpecDict(a.ID, a.ID, a.Desc, *sd)
```

a.ID is passed as both /F and /UF.

So this issue is actually not about the order of attachments, but about losing attachment file names.
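One possible direction, sketched here as a standalone program rather than as pdfcpu's actual fix: choose the /F and /UF values from FileName when it is set and fall back to ID otherwise. The `Attachment` type below is a local stand-in that mirrors only the ID, FileName and Desc fields, and `fileSpecNames` is a hypothetical helper. Assuming the first two arguments of `NewFileSpecDict` map to /F and /UF as described above, its results could be passed in place of the two `a.ID` arguments.

```go
package main

import "fmt"

// Attachment is a local stand-in for the relevant fields of pdfcpu's
// model.Attachment; it is not the library type.
type Attachment struct {
	ID       string
	FileName string
	Desc     string
}

// fileSpecNames is a hypothetical helper returning the values to write as
// /F and /UF: the file name when one is set, otherwise the ID (the current behavior).
func fileSpecNames(a Attachment) (f, uf string) {
	name := a.FileName
	if name == "" {
		name = a.ID
	}
	return name, name
}

func main() {
	for _, a := range []Attachment{
		{ID: "id1", FileName: "a.txt", Desc: "a-decs"},
		{ID: "id2", Desc: "no file name set"},
	} {
		f, uf := fileSpecNames(a)
		fmt.Println(a.ID, f, uf)
	}
	// prints:
	// id1 a.txt a.txt
	// id2 id2 id2
}
```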

hhrutter (Collaborator) commented Mar 7, 2024

Thanks, I'll take a look.
