Reading XMP data from MP3 files? #345

Open
Numpsy opened this issue Sep 10, 2023 · 6 comments

@Numpsy

Numpsy commented Sep 10, 2023

Hi,

Is there presently any means of reading XMP data out of MP3 files with metadata-extractor?

I believe that the XMP data is stored inside the ID3 data, and I see the comment at

// Eventually we should extract this data properly.

about reading ID3 data, so maybe it's not presently possible.

@drewnoakes
Owner

I've never heard of XMP within ID3. Do you have a reference or sample?

@Numpsy
Author

Numpsy commented Oct 9, 2023

It's documented in the Adobe XMP SDK specs at https://github.com/adobe/XMP-Toolkit-SDK/blob/main/docs/XMPSpecificationPart3.pdf, section 1.2.5 (the Adobe C++ SDK can read and write it).

I've been having a go at writing a minimal set of reading code for my own purposes (I could do with some managed reading code for cross-platform use rather than juggling C++ code). I may or may not have time to try adding it here at some point.

@drewnoakes
Owner

Thanks, that's really helpful.

1.2.5 MP3

MPEG-1 Audio Layer 3, more commonly referred to as MP3, is a popular audio encoding format. MPEG stands for Moving Picture Experts Group. The formal standard is ISO/IEC IS 11172-3, but this only covers the raw audio aspects. The metadata in MP3 files uses the ID3v1 or ID3v2 format. When used with XMP, this must be ID3v1, ID3v2, ID3v2.3 or ID3v2.4. The ID3v2.3 and ID3v2.4 formats are almost identical. The most notable difference is that ID3v2.4 allows text values to be UTF-8, in addition to ISO 8859-1 (Latin-1) or UTF-16. The entire ID3 portion of the MP3 file is called the ID3 "tag" (rather confusingly, given other media file and metadata terminology). The individual metadata items are called ID3 "frames".

1.2.5.1 Placement of XMP

The XMP is placed within the ID3 as a "PRIV" frame with an Owner identifier of "XMP". The content of the XMP PRIV frame is the XMP packet, encoded as UTF-8.
MP3 files can contain native metadata; see detail of reconciliation with XMP in 2.3.3, “Native metadata in MP3”.
Specifications can be found at:

If you wanted to add this to the library, I'd happily support any PR to do so.
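Per the placement rule quoted above (1.2.5.1), the XMP lives in the body of a PRIV frame whose owner identifier is "XMP". In ID3v2 a PRIV body is a null-terminated owner string followed by opaque private data, so extracting the packet from one frame body is small. The Python below is a hypothetical sketch (`xmp_from_priv_body` is an illustrative name, not a metadata-extractor API); it assumes the caller has already located the frame body, and it treats the owner string as ISO-8859-1, which is what ID3v2 uses for strings that carry no encoding byte.

```python
from typing import Optional

# Owner identifier required by XMP Specification Part 3, 1.2.5.1
XMP_OWNER = b"XMP"


def xmp_from_priv_body(body: bytes) -> Optional[str]:
    """Return the XMP packet from a PRIV frame body, or None if it isn't XMP.

    Hypothetical sketch. A PRIV body is: owner identifier (ISO-8859-1,
    terminated by 0x00) followed by the private data. For owner "XMP"
    the private data is the XMP packet encoded as UTF-8.
    """
    owner, terminator, payload = body.partition(b"\x00")
    if not terminator or owner != XMP_OWNER:
        return None  # no owner terminator, or another application's PRIV data
    return payload.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8


print(xmp_from_priv_body(b"XMP\x00<x:xmpmeta/>"))    # prints <x:xmpmeta/>
print(xmp_from_priv_body(b"OtherOwner\x00\x01\x02"))  # prints None
```

Comparing the raw owner bytes with `b"XMP"` avoids decoding at all; because the expected owner is plain ASCII, this gives the same answer as decoding the owner as ISO-8859-1 first.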

@Numpsy
Author

Numpsy commented Feb 28, 2024

A question for the record, in case I get more time to try it: would something that just gets XMP out of ID3v2 tags work, or would it need to read more extensive data? (I'm sure there'd be scope to extend it in the future, though.)

@drewnoakes
Owner

Ideally we'd add an understanding of ID3 so that we can correctly pull XMP from within. Otherwise we're reduced to scanning for content that looks like XMP, which can be fraught with bugs.
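A first step toward understanding ID3 is reading the 10-byte ID3v2 header at the start of the file: the "ID3" marker, major version and revision bytes, a flags byte, and a 4-byte "syncsafe" size that uses 7 bits per byte, so no size byte has its high bit set (this keeps the size from resembling an MPEG frame sync). The Python below is a hypothetical sketch, not metadata-extractor code, and all names in it are illustrative. It only looks for a tag at the very start of the data; ID3v2.4 also allows a tag appended at the end with a footer, which is not handled here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Id3v2Header:
    major: int     # 3 for ID3v2.3, 4 for ID3v2.4
    revision: int
    flags: int
    tag_size: int  # bytes after the 10-byte header (excludes header and any footer)

    @property
    def unsynchronised(self) -> bool:
        return bool(self.flags & 0x80)  # flag bit 7

    @property
    def has_extended_header(self) -> bool:
        return bool(self.flags & 0x40)  # flag bit 6


def syncsafe_to_int(raw: bytes) -> int:
    """Decode a 4-byte syncsafe integer: 7 significant bits per byte, high bit 0."""
    if len(raw) != 4 or any(b & 0x80 for b in raw):
        raise ValueError(f"invalid syncsafe integer: {raw!r}")
    return (raw[0] << 21) | (raw[1] << 14) | (raw[2] << 7) | raw[3]


def parse_id3v2_header(data: bytes) -> Optional[Id3v2Header]:
    """Return the ID3v2 header at the start of `data`, or None if there isn't one."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    if major == 0xFF or revision == 0xFF:
        return None  # the ID3v2 spec never uses 0xFF in the version bytes
    return Id3v2Header(major, revision, flags, syncsafe_to_int(data[6:10]))


header = parse_id3v2_header(b"ID3\x04\x00\x00\x00\x00\x02\x01" + b"\x00" * 257)
print(header)  # prints Id3v2Header(major=4, revision=0, flags=0, tag_size=257)
```

The flags are exposed because they change how the rest of the tag must be read: unsynchronisation and an extended header both affect where the frames start and how their bytes are stored.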

@Numpsy
Author

Numpsy commented Mar 18, 2024

At a really basic level, you can walk through the frames in the ID3 tag, ignoring everything except a PRIV frame with the owner "XMP", and stop as soon as one is found (perhaps with some extra validation of the tag length and overall contents). Support for more frame types could be added later if needed.
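The frame walk described above might look roughly like the following hypothetical Python sketch (the function name and the simplifications are assumptions, not metadata-extractor's API). It reads the tag header, then steps over 10-byte frame headers, using syncsafe frame sizes for ID3v2.4 and plain big-endian sizes for ID3v2.3, and returns the first PRIV frame whose owner is "XMP". To stay short it gives up on tag-level unsynchronisation or an extended header, skips frames with any format flags set (compression, encryption and so on), and stops at padding or at a frame that would run past the end of the tag.

```python
import struct
from typing import Optional


def _syncsafe(raw: bytes) -> int:
    # 4 bytes, 7 significant bits each (the high bit of every byte is 0)
    return (raw[0] << 21) | (raw[1] << 14) | (raw[2] << 7) | raw[3]


def find_xmp_in_id3v2(data: bytes) -> Optional[str]:
    """Return the XMP packet from the first PRIV frame owned by "XMP", else None.

    Hypothetical sketch. Assumes `data` starts with an ID3v2.3/2.4 tag that uses
    neither tag-level unsynchronisation nor an extended header.
    """
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, tag_flags = data[3], data[5]
    if major not in (3, 4) or tag_flags & 0xC0:
        return None  # unsupported version, or unsync/extended header present
    end = min(10 + _syncsafe(data[6:10]), len(data))

    pos = 10
    while pos + 10 <= end:
        frame_id = data[pos:pos + 4]
        if frame_id[0] == 0:
            break  # reached padding: no more frames
        raw_size = data[pos + 4:pos + 8]
        # ID3v2.4 frame sizes are syncsafe; ID3v2.3 sizes are plain 32-bit big-endian.
        size = _syncsafe(raw_size) if major == 4 else struct.unpack(">I", raw_size)[0]
        format_flags = data[pos + 9]  # second flag byte: compression, encryption, ...
        body_start, body_end = pos + 10, pos + 10 + size
        if body_end > end:
            break  # frame claims to run past the tag; stop rather than guess
        if frame_id == b"PRIV" and format_flags == 0:
            owner, terminator, payload = data[body_start:body_end].partition(b"\x00")
            if terminator and owner == b"XMP":
                return payload.decode("utf-8")
        pos = body_end  # skip any frame type we don't understand


    return None


def frame_v23(frame_id: bytes, body: bytes) -> bytes:
    # Illustrative helper: ID3v2.3 frame header (ID, big-endian size, no flags) + body
    return frame_id + struct.pack(">I", len(body)) + b"\x00\x00" + body


frames = frame_v23(b"TIT2", b"\x00Title") + frame_v23(b"PRIV", b"XMP\x00<x:xmpmeta/>")
padding = b"\x00" * 10
size = len(frames) + len(padding)  # 52, small enough that its syncsafe form is one byte
tag = b"ID3\x03\x00\x00" + bytes([0, 0, 0, size]) + frames + padding
print(find_xmp_in_id3v2(tag + b"\xff\xfb"))  # prints <x:xmpmeta/>
```

Skipping unknown frames by their declared size is what lets this ignore everything except PRIV; the bounds check on each frame is the kind of extra validation of the tag length mentioned above.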

"...which can be fraught with bugs."

Yes, a problem with the packet-scanning approach is that embedded images and the like might contain XMP of their own, and it'd be more work to deal with that.
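To make the packet-scanning pitfall concrete: a scan for the first `<?xpacket begin` marker anywhere in the tag returns whichever packet comes first, and that may belong to an embedded picture rather than to the audio file. The Python below is a deliberately naive, hypothetical scanner (shown as the approach to avoid, not a recommendation). It runs on made-up bytes in which an APIC (attached picture) frame carrying its own XMP comes before the PRIV/XMP frame; the frame bytes are simplified placeholders, not valid ID3 frames.

```python
from typing import Optional

BEGIN = b"<?xpacket begin"
END = b"<?xpacket end"


def naive_scan_for_xmp(data: bytes) -> Optional[bytes]:
    """Return the first XMP packet found anywhere in `data` (the fragile approach)."""
    start = data.find(BEGIN)
    if start < 0:
        return None
    end_marker = data.find(END, start)
    if end_marker < 0:
        return None
    close = data.find(b"?>", end_marker)  # end of the closing processing instruction
    return data[start:close + 2] if close >= 0 else None


image_packet = b'<?xpacket begin="\xef\xbb\xbf"?><x:xmpmeta>image</x:xmpmeta><?xpacket end="r"?>'
audio_packet = b'<?xpacket begin="\xef\xbb\xbf"?><x:xmpmeta>audio</x:xmpmeta><?xpacket end="w"?>'
# Made-up tag contents: an embedded JPEG with its own XMP, then the file's PRIV/XMP frame.
tag_bytes = b"APIC...\xff\xd8" + image_packet + b"\xff\xd9" + b"PRIV...XMP\x00" + audio_packet
print(naive_scan_for_xmp(tag_bytes) == image_packet)  # prints True: the picture's packet, not the file's
```

Walking the frames and checking the PRIV owner avoids this, because bytes inside an APIC body are skipped by the frame size instead of being searched.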
