Strange Property Names if mixing exiftool and Windows Explorer for http://ns.microsoft.com/photo/1.0/ XMP tags #271

Open
mkvonarx opened this issue Sep 21, 2020 · 6 comments


@mkvonarx

Summary: I get inconsistent property names in the http://ns.microsoft.com/photo/1.0/ XMP namespace when reading a mix of JPEGs whose XMP metadata was changed by exiftool and by Windows Explorer. Sometimes metadata-extractor prints "MicrosoftPhoto:LastKeywordXMP" and sometimes "MicrosoftPhoto_1_:LastKeywordXMP" (ditto for "MicrosoftPhoto:Rating" and probably others).

This seems to be caused by exiftool and Windows Explorer using different namespace URIs: exiftool writes "http://ns.microsoft.com/photo/1.0" (without the final /), while Windows Explorer writes "http://ns.microsoft.com/photo/1.0/". It seems that metadata-extractor tries to keep XMP property names unique when it encounters both namespaces with the same property. I could not reproduce this issue when loading the metadata of a single JPEG file; it only happens after loading at least 3 files.

Output of

foreach (var property in xmpDirectory.XmpMeta.Properties) {
	Console.WriteLine($"path={property.Path} => value={property.Value}, namespace={property.Namespace}");
}

Correct:

path=MicrosoftPhoto:Rating => value=75, namespace=http://ns.microsoft.com/photo/1.0
path=MicrosoftPhoto:LastKeywordXMP => value=, namespace=http://ns.microsoft.com/photo/1.0
path=MicrosoftPhoto:LastKeywordXMP[1] => value=WinExplorerTag1, namespace=
path=MicrosoftPhoto:LastKeywordXMP[2] => value=WinExplorerTag2, namespace=

Not so correct:

path=MicrosoftPhoto_1_:Rating => value=25, namespace=http://ns.microsoft.com/photo/1.0/
path=MicrosoftPhoto_1_:LastKeywordXMP => value=, namespace=http://ns.microsoft.com/photo/1.0/
path=MicrosoftPhoto_1_:LastKeywordXMP[1] => value=WinExplorerTag1, namespace=
path=MicrosoftPhoto_1_:LastKeywordXMP[2] => value=WinExplorerTag2, namespace=

(The mix-up can also happen the other way round, depending on the order in which the files were loaded.)

I changed the photos either directly in the details pane of the Windows 10 File Explorer, or with exiftool using the following command: exiftool.exe -XMP-microsoft:RatingPercent=25 -XMP-microsoft:LastKeywordXMP+=exiftool-MSKeyword1 -XMP-microsoft:LastKeywordXMP+=exiftool-MSKeyword2 photo-exiftool.jpg

I don't know who's at fault here (exiftool, Windows Explorer, metadata-extractor). In the end, it doesn't matter. A generic metadata reader should be quite flexible and also work with slightly incorrect/unusual metadata when reading photo metadata, as many writing tools don't closely follow the (not so well defined) standards.

Do you think this can be fixed in metadata-extractor, so that the property names are always the same regardless of the slightly different XML namespaces?

(I'm actually not sure about URL equality, i.e. whether two URLs that differ only in the trailing slash should be considered equal or not.)
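
On the equality question: XML namespace names are compared as plain strings, character by character, so the two URIs above count as different namespaces. The sketch below only illustrates that comparison. The class and method names are made up for this example, and the lenient variant is a hypothetical helper, not something the XMP libraries provide.

```csharp
using System;

// Illustrative helper, not part of metadata-extractor or XmpCore.
public static class NamespaceComparison
{
    public const string WithoutSlash = "http://ns.microsoft.com/photo/1.0";
    public const string WithSlash = "http://ns.microsoft.com/photo/1.0/";

    // Strict comparison: namespace names match only if the strings are identical.
    public static bool SameNamespace(string a, string b) =>
        string.Equals(a, b, StringComparison.Ordinal);

    // Hypothetical lenient comparison: trailing slashes are ignored on both sides.
    public static bool SameIgnoringTrailingSlashes(string a, string b) =>
        string.Equals(a.TrimEnd('/'), b.TrimEnd('/'), StringComparison.Ordinal);
}
```

With these definitions, SameNamespace(WithoutSlash, WithSlash) is false, while SameIgnoringTrailingSlashes(WithoutSlash, WithSlash) is true.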

@drewnoakes
Owner

It's always easier to diagnose these kinds of problems if you include a sample file.

If you open the file in a hex editor, do you see MicrosoftPhoto or MicrosoftPhoto_1_?

@mkvonarx
Author

I've tested this with three jpeg files. The XML in the files is as follows:

  • Test-exiftool.jpg (changed only by exiftool):
<rdf:Description rdf:about=''
  xmlns:MicrosoftPhoto='http://ns.microsoft.com/photo/1.0'>
  <MicrosoftPhoto:LastKeywordXMP>
   <rdf:Bag>
    <rdf:li>exiftool-MSKeyword1</rdf:li>
    <rdf:li>exiftool-MSKeyword2</rdf:li>
   </rdf:Bag>
  </MicrosoftPhoto:LastKeywordXMP>
  <MicrosoftPhoto:Rating>25</MicrosoftPhoto:Rating>
 </rdf:Description>
  • Test-mixed.jpg (changed first by Windows Explorer, then by exiftool):
<rdf:Description rdf:about='uuid:faf5bdd5-ba3d-11da-ad31-d33d75182f1b'
  xmlns:MicrosoftPhoto='http://ns.microsoft.com/photo/1.0'>
  <MicrosoftPhoto:LastKeywordXMP>
   <rdf:Bag>
    <rdf:li>WinExplorerTag1</rdf:li>
    <rdf:li>WinExplorerTag2</rdf:li>
   </rdf:Bag>
  </MicrosoftPhoto:LastKeywordXMP>
  <MicrosoftPhoto:Rating>75</MicrosoftPhoto:Rating>
 </rdf:Description>
  • Test-WindowsExplorer.jpg (changed only by Windows Explorer):
<rdf:Description rdf:about="uuid:faf5bdd5-ba3d-11da-ad31-d33d75182f1b" xmlns:MicrosoftPhoto="http://ns.microsoft.com/photo/1.0/">
	<MicrosoftPhoto:Rating>25</MicrosoftPhoto:Rating>
	<MicrosoftPhoto:LastKeywordXMP>
		<rdf:Bag xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
			<rdf:li>WinExplorerTag1</rdf:li>
			<rdf:li>WinExplorerTag2</rdf:li>
		</rdf:Bag>
	</MicrosoftPhoto:LastKeywordXMP>
</rdf:Description>

So there's nothing like MicrosoftPhoto_1_ in the jpeg files themselves. The only difference is whether the namespace has a final / or not, depending on the tool.

One interesting new finding:

  • If I open only one of those jpegs in my program, then exit and restart the program (process), metadata-extractor displays correct values for all three jpegs.
  • If I open these three jpegs one after another in the same process, the third one (Test-WindowsExplorer.jpg) shows the strange path=MicrosoftPhoto_1_.
  • So this seems to be something internal to metadata-extractor. It looks as if, once metadata-extractor has encountered the namespace http://ns.microsoft.com/photo/1.0 and mapped it to path=MicrosoftPhoto, it caches this information; when it then encounters the namespace http://ns.microsoft.com/photo/1.0/ in another file, it assigns a new path=MicrosoftPhoto_1_ to the namespace with the final /.

mkvonarx added a commit to mkvonarx/MetadataExtractorIssue271 that referenced this issue Dec 11, 2020
@mkvonarx
Author

I've created a repo with an example that reproduces this:

  • https://github.com/mkvonarx/MetadataExtractorIssue271
  • just run run_tests.bat and watch the output; you should see the strange path values (you'll need to have the .NET 5 SDK installed to run my example; it also works and creates the same outputs with .NET Core 3.1)
  • I only included Test-exiftool.jpg and Test-WindowsExplorer.jpg, as two jpegs are enough to cause the issue

@paperboyo

paperboyo commented Dec 12, 2020

Could this be at all related to drewnoakes/metadata-extractor#435? (This is a shot in the dark, so ignore it if unhelpful.)

@mkvonarx
Author

mkvonarx commented Dec 12, 2020

@paperboyo Thanks for pointing me to the other issue. That helped me understand the problem a lot better.

  1. Yes, when calling XmpCore.XmpMetaFactory.Reset(); each time before ImageMetadataReader.ReadMetadata(...), the issue described above no longer shows up. But because the documentation of Reset() says "should be used only for testing", I don't think that's a valid option (for me).
  2. When printing all values of XmpCore.XmpMetaFactory.SchemaRegistry.Namespaces after reading a JPEG File, I see that after having read 2 JPEG files in the same process with different MicrosoftPhoto namespace URLs (with and without final /), the SchemaRegistry contains the following two entries (depending on the order of the file loading):
  3. Out of curiosity, I called XmpCore.XmpMetaFactory.SchemaRegistry.RegisterNamespace() myself twice, once with and once without the final / in the namespace. And indeed, RegisterNamespace() returns "MicrosoftPhoto_1_:" the second time. Digging into the source code, which lives in xmp-core-dotnet and not in metadata-extractor-dotnet, I found that XmpSchemaRegistry.RegisterNamespace() contains code that adds the _1_ suffix to the suggested prefix if that prefix is already in use: https://github.com/drewnoakes/xmp-core-dotnet/blob/master/XmpCore/Impl/XmpSchemaRegistry.cs#L79
  4. To summarize, the situation looks like this to me:
      a. Windows Explorer and exiftool don't use the same namespace. The "fault" seems to be in the exiftool implementation, as I think those XMP namespaces must end with a /.
      b. metadata-extractor-dotnet registers the namespaces it sees via XmpSchemaRegistry.RegisterNamespace() and gets two different prefixes for the two namespaces that differ only in the final /, because XmpSchemaRegistry wants all namespaces and all prefixes to be unique and to map to each other 1:1.
  5. I see the following possible solutions.
      a. exiftool or Windows Explorer is fixed. I think this is very unlikely in both cases... There is a discussion about this difference on the exiftool forums from 2016 (https://exiftool.org/forum/index.php?topic=7324.msg37119), where Phil Harvey (author of exiftool) says that Microsoft originally used a URL without the slash and later switched to one with a slash. exiftool supports reading both URLs, but it will continue writing the URL without the slash because that was the original definition.
      b. The XmpSchemaRegistry in xmp-core-dotnet is enhanced to allow multiple namespaces behind the same prefix. I think this solution is unlikely and probably not a good idea, as XmpSchemaRegistry seems to be designed for a 1:1 mapping and all potential clients of that class will expect the current behavior.
      c. metadata-extractor-dotnet adds some special-case handling when loading XMP namespaces for http://ns.microsoft.com/photo/1.0 and internally always adds the final / if it sees the namespace without it. This would nicely fix my use case, and it would mimic the solution used by exiftool. The potential for breaking changes / side effects is probably rather small.
      d. I change my code to always look for all 4 possible namespace/prefix combinations for all MicrosoftPhoto XMP tags.
      • Maybe xmp-core-dotnet could add some nicer accessors to its XMP data than IXmpMeta.GetProperty(string schemaNs, string propName) ;-) which would allow querying for just the name ("Rating" in my case) without having to provide the namespace and the prefix.
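
To make the behavior from point 3 easier to picture, here is a deliberately simplified model of a registry that keeps namespaces and prefixes in a 1:1 mapping. It is a sketch under assumptions, not the XmpSchemaRegistry code: the class name ToyPrefixRegistry is invented, and it leaves out the trailing ":" that the real RegisterNamespace() returns.

```csharp
using System.Collections.Generic;

// Toy model only: mimics the "append _N_ when the prefix is taken" idea.
public sealed class ToyPrefixRegistry
{
    private readonly Dictionary<string, string> _prefixByNamespace =
        new Dictionary<string, string>();
    private readonly HashSet<string> _usedPrefixes = new HashSet<string>();

    public string Register(string namespaceUri, string suggestedPrefix)
    {
        // A namespace that is already known keeps the prefix it got first.
        if (_prefixByNamespace.TryGetValue(namespaceUri, out var existing))
            return existing;

        // If another namespace owns the suggested prefix, try
        // prefix_1_, prefix_2_, ... until an unused one is found.
        var prefix = suggestedPrefix;
        for (var i = 1; _usedPrefixes.Contains(prefix); i++)
            prefix = $"{suggestedPrefix}_{i}_";

        _prefixByNamespace[namespaceUri] = prefix;
        _usedPrefixes.Add(prefix);
        return prefix;
    }
}
```

Registering "http://ns.microsoft.com/photo/1.0" and then "http://ns.microsoft.com/photo/1.0/" with the same suggested prefix "MicrosoftPhoto" gives "MicrosoftPhoto" for the first and "MicrosoftPhoto_1_" for the second in this model, and the result flips if the order is reversed. Because the state lives for the lifetime of the registry, this matches the observation that the prefix depends on the order in which the files are loaded within one process.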

@kwhopper
Collaborator

I've looked through the XmpMeta class to determine whether it needs the SchemaRegistry singleton... and for certain operations it does depend on its existence, including even the GetProperty calls you make now. This is true even after XmpMeta object creation through XmpReader parsing is completed.

Some thoughts on your list of possible solutions:

a) The specs all say namespaces must be unique, even down to capitalization and whitespace. In essence, Phil Harvey is saying they didn't change it; they actually made a new one. It might be worth bringing it up again to see if he has new thoughts on the issue.

b) I agree with you. This is not a good idea since it goes against spec.

c) metadata-extractor does not have this capability. All XMP parsing is offloaded to XmpMetaFactory in XmpCore, so we would have to change it there. Even then, it wouldn't match the Java implementation, for reasons I'll add at the end.

d) I think this is your best bet - look for all combinations. The schema/namespace is a must-have in XMP to separate same-named properties.
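
One way option d) could look in client code is sketched below. It works on plain (Path, Value, Namespace) tuples so that it stays self-contained; the class and method names are hypothetical. It accepts both spellings of the Microsoft Photo namespace and matches on the part of the path after the prefix, so it does not matter whether the prefix came out as MicrosoftPhoto or MicrosoftPhoto_1_.

```csharp
using System;
using System.Collections.Generic;

// Illustrative client-side helper, not part of metadata-extractor or XmpCore.
public static class MicrosoftPhotoLookup
{
    private const string BaseNamespace = "http://ns.microsoft.com/photo/1.0";

    // True for the namespace with or without trailing slashes.
    public static bool IsMicrosoftPhotoNamespace(string ns) =>
        ns != null &&
        string.Equals(ns.TrimEnd('/'), BaseNamespace, StringComparison.Ordinal);

    // Returns the value of the first matching property, or null if none matches.
    public static string FindValue(
        IEnumerable<(string Path, string Value, string Namespace)> properties,
        string localName)
    {
        foreach (var p in properties)
        {
            if (p.Path == null || !IsMicrosoftPhotoNamespace(p.Namespace))
                continue;

            // Drop whatever prefix the registry assigned and compare the rest.
            var colon = p.Path.IndexOf(':');
            var name = colon >= 0 ? p.Path.Substring(colon + 1) : p.Path;
            if (string.Equals(name, localName, StringComparison.Ordinal))
                return p.Value;
        }
        return null;
    }
}
```

Assuming the Path, Value and Namespace members printed in the original report, the tuples could be produced from xmpDirectory.XmpMeta.Properties with a Select projection. Array items such as LastKeywordXMP[1] are reported with an empty namespace in that output, so they would need separate handling.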

This leaves us in a bind for this issue and I don't see a good way out. Keep in mind that xmp-core-dotnet is a C# port of XmpCore for Java from Adobe. If we come up with a 'fix' for the C# version (or want to make any substantial additions as you suggest), it cannot be changed in the Java version without help from Adobe.

However, if @drewnoakes sees some other path, we can make XmpCore dotnet changes independent of Adobe. That will lead to some output drift on the Java side.
