New API
This pull request explores an improved API for the library.
Discussion about the API should happen here, and this description will be updated to reflect what's agreed upon.
Desired qualities & features
Easier to update properties
Currently, adding a property requires editing four disparate locations in code:
We could lose the concept of descriptor classes altogether.
A new API should co-locate all this in one place. For example:
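One possible shape for such a co-located definition is sketched below, in Java for illustration (a C# version would look much the same). Every type and member name here (TagDefinition, ExampleExifTags, describe) is hypothetical and not part of the current library; only the Exif Orientation tag id 0x0112 and the meaning of its values 1, 3, 6 and 8 come from the Exif specification.

```java
import java.util.function.Function;

// Hypothetical type: everything that defines a property lives in one object.
final class TagDefinition<T> {
    final int id;
    final String name;
    final Class<T> valueType;
    private final Function<T, String> describer;

    TagDefinition(int id, String name, Class<T> valueType, Function<T, String> describer) {
        this.id = id;
        this.name = name;
        this.valueType = valueType;
        this.describer = describer;
    }

    // Formats a raw value for display; throws ClassCastException on a type mismatch.
    String describe(Object rawValue) {
        return describer.apply(valueType.cast(rawValue));
    }
}

// Hypothetical holder class: identifier, name, type and description sit side by side.
final class ExampleExifTags {
    // Exif Orientation (tag 0x0112); only a few values are described in this sketch.
    static final TagDefinition<Integer> ORIENTATION = new TagDefinition<>(
            0x0112, "Orientation", Integer.class, ExampleExifTags::describeOrientation);

    private static String describeOrientation(Integer value) {
        switch (value) {
            case 1: return "Normal";
            case 3: return "Rotated 180";
            case 6: return "Rotated 90 CW";
            case 8: return "Rotated 270 CW";
            default: return "Unknown (" + value + ")";
        }
    }
}
```

With something like this, adding a property would mean adding one field, and the description logic would no longer need a separate descriptor class.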
Property metadata (or, metadata about metadata)
We can model property metadata, such as:
Metadata varies depending upon the property type.
Support varied tag identifiers
Directory as a base class requires all tag identifiers to be integers. That's fine for Exif and other TIFF formats that use numeric identifiers for tags, but not all file formats have that. Instead we end up defining our own arbitrary integer mapping.
XMP in particular doesn't fit this mould, and we don't support it very well. XMP properties are identified by two parts: a namespace and a key. We should support composite keys such as this.
A likely implementation will have a directory base class that's generic on its key type.
We still want to easily enumerate all properties and print out their descriptions. Key types must have some common base type from which a key string can be obtained.
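A minimal sketch of that idea follows, in Java for illustration. The names TagKey, IntKey, XmpKey and KeyedDirectory are assumptions made up for this example, as is the choice to render an XMP key as namespace followed by name. The Exif Make tag id (0x010F) and the Dublin Core namespace URI used in the usage note are real values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical common base for all key types: every key can produce a key string.
interface TagKey {
    String keyString();
}

// Numeric key, as used by Exif and other TIFF-based formats.
final class IntKey implements TagKey {
    private final int id;
    IntKey(int id) { this.id = id; }
    @Override public String keyString() { return String.format("0x%04X", id); }
    @Override public boolean equals(Object o) { return o instanceof IntKey && ((IntKey) o).id == id; }
    @Override public int hashCode() { return Integer.hashCode(id); }
}

// Composite key for XMP: a namespace plus a property name.
final class XmpKey implements TagKey {
    private final String namespace;
    private final String name;
    XmpKey(String namespace, String name) {
        this.namespace = Objects.requireNonNull(namespace);
        this.name = Objects.requireNonNull(name);
    }
    @Override public String keyString() { return namespace + name; }
    @Override public boolean equals(Object o) {
        return o instanceof XmpKey
                && ((XmpKey) o).namespace.equals(namespace)
                && ((XmpKey) o).name.equals(name);
    }
    @Override public int hashCode() { return Objects.hash(namespace, name); }
}

// Directory base class that is generic on its key type.
class KeyedDirectory<K extends TagKey> {
    private final String name;
    private final Map<K, Object> values = new LinkedHashMap<>();

    KeyedDirectory(String name) { this.name = name; }

    String name() { return name; }
    void set(K key, Object value) { values.put(key, value); }
    Object get(K key) { return values.get(key); }

    // Enumerates all properties in insertion order without knowing the concrete key type.
    List<String> describeAll() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<K, Object> e : values.entrySet()) {
            lines.add(e.getKey().keyString() + " = " + e.getValue());
        }
        return lines;
    }
}
```

A KeyedDirectory<IntKey> could then hold Exif values under keys such as new IntKey(0x010F), while a KeyedDirectory<XmpKey> holds values under keys such as new XmpKey("http://purl.org/dc/elements/1.1/", "creator"). Both can be printed through describeAll() because the key string comes from the shared TagKey interface.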
Top level object
The .NET project does not have a Metadata class, instead using IEnumerable<Directory>. C# provides great operators (OfType<>, First, Select, SelectMany, ...), and allows extension methods on this interface.
Data driven approach
Rather than defining all properties in code, they should be loaded from a configuration file.
The configuration file would be reused between the Java and C# implementations. It should simplify the creation of implementations in other languages as well.
Implementations could provide partial support for the metadata types described in the data file, allowing gradual implementation. We could automatically generate documentation/tabulation of support across implementations.
ExifTool's Perl source code reads a lot more like data than code. This data file should be equally declarative.
Logical vs. physical properties
As discussed in drewnoakes/metadata-extractor#10, it'd be valuable to allow simpler access to logical values that may have multiple possible physical locations.
Examples of such logical properties: Timestamp, Orientation, CameraMake, CameraModel, Aperture, Exposure, Flash, FocalLength, ISO, WhiteBalance, ImageSize, GeoLocation, Altitude, Heading, ThumbnailSize, LensModel, DriveMode, ExposureMode, ExposureProgram, Rating, Subject, Label, Copyright, Author, Comment, ImageCount.
Some could be sourced from many locations (Timestamp, ISO, Flash, ...), while others are combinations of multiple tags (ImageSize, GeoLocation, ThumbnailSize, ...).
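Both cases can be sketched as a logical property that tries a list of sources in order, shown here in Java for illustration. This is only an assumption about how it might work: LogicalProperty, LogicalProperties and stringAt are hypothetical names, and the flat map with "Directory/Tag" string keys is a stand-in for whatever the real directory model turns out to be.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical: a logical property resolves to the first physical source that yields a value.
final class LogicalProperty<T> {
    final String name;
    private final List<Function<Map<String, Object>, Optional<T>>> sources;

    @SafeVarargs
    LogicalProperty(String name, Function<Map<String, Object>, Optional<T>>... sources) {
        this.name = name;
        this.sources = Arrays.asList(sources);
    }

    // Tries each source in priority order; returns empty if none of them applies.
    Optional<T> resolve(Map<String, Object> physicalValues) {
        for (Function<Map<String, Object>, Optional<T>> source : sources) {
            Optional<T> value = source.apply(physicalValues);
            if (value.isPresent()) {
                return value;
            }
        }
        return Optional.empty();
    }
}

final class LogicalProperties {
    // Helper: reads one physical value as a string, if it is present.
    static Function<Map<String, Object>, Optional<String>> stringAt(String key) {
        return values -> Optional.ofNullable(values.get(key)).map(Object::toString);
    }

    // One logical value with several possible physical locations (keys are illustrative).
    static final LogicalProperty<String> TIMESTAMP = new LogicalProperty<String>(
            "Timestamp",
            stringAt("Exif SubIFD/DateTimeOriginal"),
            stringAt("Exif IFD0/DateTime"));

    // One logical value combined from two physical tags (keys are illustrative).
    static final LogicalProperty<int[]> IMAGE_SIZE = new LogicalProperty<int[]>(
            "ImageSize",
            values -> {
                Object width = values.get("JPEG/ImageWidth");
                Object height = values.get("JPEG/ImageHeight");
                if (width instanceof Integer && height instanceof Integer) {
                    return Optional.of(new int[] {(Integer) width, (Integer) height});
                }
                return Optional.empty();
            });
}
```

The ordering of sources encodes which physical location wins when several are present, which is a policy decision the data file could also carry.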
Efficient storage
Some formats use fixed length records. For these, a directory could store the single byte[] and use IndexedProperty methods to read/write values at runtime.
Context
Some kind of object that configures how metadata extraction is performed. It could cache the parsed data file (see above), specify filtering options (see below), and hold configuration such as threshold limits on byte[] sizes and which metadata types to extract. If we ever need runtime code generation, it could cache those resources too.
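As a rough sketch only, such an object might be an immutable value built once and passed to every reader. The Java below is illustrative: ExtractionContext, its builder, and identifying metadata types by plain strings are all assumptions, and caching of the parsed data file is left out.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical immutable configuration object handed to every reader.
final class ExtractionContext {
    private final int maxByteArrayLength;
    private final Set<String> enabledTypes; // an empty set means "extract everything"

    private ExtractionContext(Builder builder) {
        this.maxByteArrayLength = builder.maxByteArrayLength;
        this.enabledTypes = Collections.unmodifiableSet(new HashSet<>(builder.enabledTypes));
    }

    // Filtering hook: readers ask before doing any IO or allocation for a metadata type.
    boolean shouldExtract(String metadataType) {
        return enabledTypes.isEmpty() || enabledTypes.contains(metadataType);
    }

    // Threshold hook: readers ask before allocating a byte[] of the given length.
    boolean allowsByteArrayOfLength(int length) {
        return length <= maxByteArrayLength;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private int maxByteArrayLength = Integer.MAX_VALUE;
        private final Set<String> enabledTypes = new HashSet<>();

        Builder maxByteArrayLength(int value) {
            if (value < 0) {
                throw new IllegalArgumentException("maxByteArrayLength must not be negative");
            }
            this.maxByteArrayLength = value;
            return this;
        }

        // Restricts extraction to the named type; may be called repeatedly.
        Builder extract(String metadataType) {
            enabledTypes.add(metadataType);
            return this;
        }

        ExtractionContext build() {
            return new ExtractionContext(this);
        }
    }
}
```

A caller interested only in Exif data could write ExtractionContext.builder().extract("Exif").maxByteArrayLength(1024).build(), while the default context extracts every type with no size limit.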
Filtering
It seems useful to be able to limit the types of metadata returned during processing, to reduce heap allocation and reduce IO/CPU usage. There's a PR for this in the Java implementation, and some discussion there.
Serialisation
It might be good to support hooks for serialisation and deserialisation in arbitrary formats. There's a PR in the Java version that uses Java's object serialisation, but a more general approach should support XML, JSON, etc.
Future support for editing metadata
This is a much sought-after feature, but it's a big commitment as the cost per error is high, and it will require a great deal of engineering.
So while it's not a v-next goal explicitly, it will likely be the next significant milestone for the project, and we should at least give it some thought when it comes to this iteration of the API. We should consider trying to minimise future API churn.
Naming
The data model (directories, tags) dates back to when the project was called ExifExtractor. The terms come from the TIFF specification.