Order of attachments #822

vsenko · 2024-03-05T13:47:54Z

It looks like v0.7.0 sorts attachments alphanumerically by file name (ID). Since attached files technically have an order in the PDF, this is confusing: for example, the PDF text could reference attached files not only by name but also by their position.

Code to reproduce:

package main

import (
	"bytes"
	"fmt"
	"strings"

	pdfcpuapi "github.com/pdfcpu/pdfcpu/pkg/api"
	"github.com/pdfcpu/pdfcpu/pkg/pdfcpu"
	pdfcpucreate "github.com/pdfcpu/pdfcpu/pkg/pdfcpu/create"
	pdfcpumodel "github.com/pdfcpu/pdfcpu/pkg/pdfcpu/model"
	pdfcputypes "github.com/pdfcpu/pdfcpu/pkg/pdfcpu/types"
)

func main() {
	ctx, err := pdfcpu.CreateContextWithXRefTable(pdfcpumodel.NewDefaultConfiguration(), pdfcputypes.PaperSize["A4"])
	if err != nil {
		panic(err)
	}

	template := `{"pages": { "1": { "content": { "text": [{ "value": "page 1", "anchor": "left", "font": { "name": "Helvetica", "size": 12 } }] } } } }`
	err = pdfcpucreate.FromJSON(ctx, strings.NewReader(template))
	if err != nil {
		panic(err)
	}

	err = ctx.AddAttachment(pdfcpumodel.Attachment{Reader: strings.NewReader("a"), ID: "a", Desc: "a"}, false)
	if err != nil {
		panic(err)
	}

	err = ctx.AddAttachment(pdfcpumodel.Attachment{Reader: strings.NewReader("1"), ID: "1", Desc: "1"}, false)
	if err != nil {
		panic(err)
	}

	err = ctx.AddAttachment(pdfcpumodel.Attachment{Reader: strings.NewReader("z"), ID: "z", Desc: "z"}, false)
	if err != nil {
		panic(err)
	}

	err = ctx.AddAttachment(pdfcpumodel.Attachment{Reader: strings.NewReader("d"), ID: "d", Desc: "d"}, false)
	if err != nil {
		panic(err)
	}

	var b bytes.Buffer
	err = pdfcpuapi.WriteContext(ctx, &b)
	if err != nil {
		panic(err)
	}

	attachments, err := pdfcpuapi.ExtractAttachmentsRaw(bytes.NewReader(b.Bytes()), "", nil, nil)
	if err != nil {
		panic(err)
	}

	for _, a := range attachments {
		fmt.Println(a.ID)
	}
}

Expected output:

a
1
z
d

Actual output:

1
a
d
z

hhrutter · 2024-03-06T09:35:48Z

That's because attachments are stored in a PDF EmbeddedFiles nametree.

vsenko · 2024-03-06T10:59:39Z

As far as I understand, the elements of EmbeddedFiles have an order, so it would be convenient if attached files were placed there in the order in which they were added.

hhrutter · 2024-03-06T11:09:48Z

Nametree keys are sorted in lexical order.
I am not inclined to hacking the order into the key.

vsenko · 2024-03-06T14:04:48Z

After some research I now understand that Attachment.ID is not the file name but the string that identifies the embedded file in EmbeddedFiles. What confused me is that Attachment.FileName gets lost when attaching. Here is code that illustrates it:

package main

import (
	"bytes"
	"fmt"
	"strings"

	pdfcpuapi "github.com/pdfcpu/pdfcpu/pkg/api"
	"github.com/pdfcpu/pdfcpu/pkg/pdfcpu"
	pdfcpucreate "github.com/pdfcpu/pdfcpu/pkg/pdfcpu/create"
	pdfcpumodel "github.com/pdfcpu/pdfcpu/pkg/pdfcpu/model"
	pdfcputypes "github.com/pdfcpu/pdfcpu/pkg/pdfcpu/types"
)

func main() {
	ctx, err := pdfcpu.CreateContextWithXRefTable(pdfcpumodel.NewDefaultConfiguration(), pdfcputypes.PaperSize["A4"])
	if err != nil {
		panic(err)
	}

	template := `{"pages": { "1": { "content": { "text": [{ "value": "page 1", "anchor": "left", "font": { "name": "Helvetica", "size": 12 } }] } } } }`
	err = pdfcpucreate.FromJSON(ctx, strings.NewReader(template))
	if err != nil {
		panic(err)
	}

	attachments := []pdfcpumodel.Attachment{
		{Reader: strings.NewReader("afile"), ID: "id1", FileName: "a.txt", Desc: "a-decs"},
		{Reader: strings.NewReader("1file"), ID: "id2", FileName: "1.txt", Desc: "1-decs"},
		{Reader: strings.NewReader("zfile"), ID: "id3", FileName: "z.txt", Desc: "z-decs"},
		{Reader: strings.NewReader("dfile"), ID: "id4", FileName: "d.txt", Desc: "d-decs"},
	}

	for _, a := range attachments {
		err = ctx.AddAttachment(a, false)
		if err != nil {
			panic(err)
		}
	}

	var b bytes.Buffer
	err = pdfcpuapi.WriteContext(ctx, &b)
	if err != nil {
		panic(err)
	}

	attachements, err := pdfcpuapi.ExtractAttachmentsRaw(bytes.NewReader(b.Bytes()), "", nil, nil)
	if err != nil {
		panic(err)
	}

	for _, a := range attachements {
		fmt.Println(a.ID, a.FileName, a.Desc)
	}
}

Expected output is:

id1 a.txt a-decs
id2 1.txt 1-decs
id3 z.txt z-decs
id4 d.txt d-decs

But the actual one is:

id1 id1 a-decs
id2 id2 1-decs
id3 id3 z-decs
id4 id4 d-decs

And actually if you analyze the constructed PDF, /F and /UF contain id1, not the file name. It happens here: https://github.com/pdfcpu/pdfcpu/blob/master/pkg/pdfcpu/model/attach.go#L122

return xRefTable.NewFileSpecDict(a.ID, a.ID, a.Desc, *sd)

a.ID gets passed as /F and /UF.

So this issue is actually not about the order of attachments, but about losing the attachments' file names.
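
As a rough sketch of the behaviour I would expect, the value written to /F and /UF could fall back to the ID only when no file name was given. The `attachment` type and `fileSpecName` helper below are hypothetical stand-ins made up for illustration; they are not pdfcpu's types and not necessarily how a fix would look.

```go
package main

import "fmt"

// attachment is a hypothetical stand-in for the fields of
// pdfcpu's model.Attachment that are used in this thread.
type attachment struct {
	ID, FileName, Desc string
}

// fileSpecName picks the value to store in /F and /UF: the file name when
// one was supplied, otherwise the ID. This fallback rule is an assumption.
func fileSpecName(a attachment) string {
	if a.FileName != "" {
		return a.FileName
	}
	return a.ID
}

func main() {
	for _, a := range []attachment{
		{ID: "id1", FileName: "a.txt", Desc: "a-decs"},
		{ID: "id2", Desc: "no file name"},
	} {
		fmt.Println(a.ID, fileSpecName(a), a.Desc)
	}
}
```

With a rule like this, the ID stays the name tree key while the file name survives extraction.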

hhrutter · 2024-03-07T15:23:21Z

Thanks, I'll take a look.

vsenko added the feature request label Mar 5, 2024

vsenko assigned hhrutter Mar 5, 2024

hhrutter changed the title ~~It would be really handy not to rearrange the order of the attachments~~ Order of attachments Mar 6, 2024
