Integration of auxiliary data (metadata) in models

@mpelchat04 mpelchat04 released this 06 Nov 14:19

Overview

  • Add support for auxiliary data (metadata) injection into models as context information. Metadata includes any value contained in a metadata file (currently in yaml format). Each metadata key has to be linked using the meta_map key in the config (yaml) file. Detailed information on this feature will shortly be provided in the README.
  • Add support for auxiliary vector data injection into models. Auxiliary vector data can be provided by the user and are converted to a (log-scaled) distance map at training and inference time.
  • Add support for CoordConv at training and inference time.
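The CoordConv idea mentioned above can be sketched as appending two normalized coordinate channels to each input image; a minimal NumPy illustration follows (the function name is illustrative, not the repo's actual API):

```python
import numpy as np

def add_coord_channels(image):
    """Append two CoordConv channels (normalized y/x coordinates in [-1, 1])
    to a (C, H, W) image array. Minimal sketch of the CoordConv concept."""
    c, h, w = image.shape
    ys = np.linspace(-1.0, 1.0, h, dtype=image.dtype)
    xs = np.linspace(-1.0, 1.0, w, dtype=image.dtype)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")  # each of shape (H, W)
    return np.concatenate([image, yy[None], xx[None]], axis=0)

img = np.zeros((3, 4, 5), dtype=np.float32)
out = add_coord_channels(img)
print(out.shape)  # (5, 4, 5)
```

In the real implementation the coordinate channels would be concatenated inside the network's first convolution block, but the layer layout is the same.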

Detailed view

  • adjust scripts to fix runtime errors
  • add tqdm requirement
  • Fixup bg percent check in no-bg cases
  • Fixup non-bg/bg sample percent check
  • Fixup sample script utils import for consistency
  • Update sample class count check to remove extra data loop
  • Update sample dataset resize func w/ generic version
  • Add missing debug/scale to main test config
  • Remove hardcoded sample counts from main test cfg
  • Add missing loss/optim/ignoreidx vals to main test cfg
  • Move sample metadata fields to parallel hdf5 datasets

The previous implementation would overwrite the metadata attributes each time a new raster was parsed; this version allows multiple versions to exist in parallel. The metadata itself is tied to each sample using an index that
corresponds to the position of the metadata string in the separate dataset. This implementation also stores the entire raster YAML metadata dict as a single string that may be eval'd and reinstantiated as needed at runtime.
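The parallel-metadata scheme above can be sketched in plain Python (names are illustrative; the real code stores these as HDF5 datasets): each raster's full metadata dict is serialized once as a string, and each sample keeps only an index into that string dataset.

```python
import ast

metadata_strings = []    # parallel dataset of serialized metadata dicts
sample_meta_index = []   # per-sample index into metadata_strings

def register_raster(meta_dict):
    """Store one raster's entire metadata dict as a single string."""
    metadata_strings.append(repr(meta_dict))
    return len(metadata_strings) - 1

def add_sample(raster_idx):
    """Tie a new sample to its raster's metadata entry by index."""
    sample_meta_index.append(raster_idx)

def sample_metadata(sample_id):
    """Reinstantiate the dict at runtime (literal_eval is a safer eval)."""
    return ast.literal_eval(metadata_strings[sample_meta_index[sample_id]])

idx = register_raster({"sensor": "WV2", "bands": 4})
add_sample(idx)
add_sample(idx)
print(sample_metadata(1))  # {'sensor': 'WV2', 'bands': 4}
```

Because each raster's metadata string is appended rather than overwritten, parsing a new raster no longer clobbers the metadata of samples produced from earlier rasters.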

  • Remove top-level trn/val/tst split config
  • Update gitignore to add jetbrains solution dir
  • Remove useless class weight vector from test config
  • Update segmentation dataset interface to load metadata
  • Add metadata unpacking in segm dataset interface
  • Add gitignore entry for pycache dir
  • Fix parameter check to support zero-based values

The previous implementation did not allow zero values to actually be assigned to some non-null default hyperparameters. For example, when 'ignore_index' was set to '0' (a perfectly valid value), the check would skip it and the default value of '-100' would remain.
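The root cause is the classic Python truthiness pitfall: a zero value is falsy, so a naive `or`-style fallback discards it. A minimal sketch of the fix (the helper name is illustrative):

```python
def pick_param(params, key, default):
    """Return params[key], falling back to the default only when the key is
    missing or None; a value of 0 is kept. Sketch of the zero-based fix."""
    val = params.get(key)
    return default if val is None else val

cfg = {"ignore_index": 0}
buggy = cfg.get("ignore_index") or -100    # truthiness check: 0 is lost
fixed = pick_param(cfg, "ignore_index", -100)
print(buggy, fixed)  # -100 0
```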

  • Update hdf5 label map dtypes to int16
  • Add coordconv layers & utils in new module
  • Add metadata-enabled segm dataset parsing interface
  • Add util function for fetching vals in a dictionary
  • Update model_choice to allow enabling coordconv via config
  • Cleanup dataset creation util func w/ subset loop
  • Refactor image reader & vector rasterizer utilities

The current version of these functions is now more generic than before. The rasterization utility function (vector_to_raster) is now located in the 'utils' package and supports the burning of vectors into separate layers as well as in the same layer (which is the original behavior). The new multi-layer behavior is used in the updated 'image_reader_as_array' utility function to (optionally) append new layers to raw imagery.

The refactoring also allowed the cleanup of the 'assert_band_number' utility function, and simplification of the code in both the inference script ('inference.py') and the dataset preparation script ('image_to_samples.py').
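The separate-layer vs. same-layer burning behavior can be sketched as follows. Real code would rasterize geometries with a GIS library; here boolean masks stand in for rasterized vector shapes, and the function name is illustrative:

```python
import numpy as np

def burn_vectors(shapes, out_shape, separate=False):
    """Sketch of multi-layer vector burning. `shapes` is a list of
    (boolean mask, value) pairs standing in for rasterized geometries.
    separate=False burns everything into one layer (original behavior);
    separate=True returns one layer per shape (new behavior)."""
    if separate:
        layers = []
        for mask, value in shapes:
            layer = np.zeros(out_shape, dtype=np.int32)
            layer[mask] = value
            layers.append(layer)
        return np.stack(layers)          # (n_layers, H, W)
    out = np.zeros(out_shape, dtype=np.int32)
    for mask, value in shapes:
        out[mask] = value                # later shapes overwrite earlier ones
    return out                           # (H, W)
```

The stacked output is what lets 'image_reader_as_array' optionally append the burned layers onto the raw imagery as extra bands.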

  • Update meta-segm dataset parser to make map optional
  • Cleanup SegmDataset to clearly only handle zero-dontcare differently
  • Refactor 'create_dataloader' function in training script

The current version now inspects the parameter dictionary to see if a 'meta_map' is provided. If so, the segmentation dataset parser will be replaced by its upgraded version that can append extra (metadata) layers onto loaded tensors based on that predefined mapping.

The refactoring also now includes the 'get_num_samples' call directly into the 'create_dataloader' function.
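The dataset-selection logic described above can be sketched as follows; the class names echo the repo's style but the exact signatures and config layout are assumptions:

```python
class SegmentationDataset:
    """Base parser: loads image/label pairs only."""

class MetaSegmentationDataset(SegmentationDataset):
    """Meta-enabled parser: appends extra metadata layers per the meta_map."""
    def __init__(self, meta_map):
        self.meta_map = meta_map

def pick_dataset(params):
    """Inspect the parameter dict and swap in the meta-enabled parser
    whenever a 'meta_map' is provided (sketch of the refactored logic)."""
    meta_map = params.get("global", {}).get("meta_map")
    if meta_map:
        return MetaSegmentationDataset(meta_map)
    return SegmentationDataset()

plain = pick_dataset({"global": {}})
meta = pick_dataset({"global": {"meta_map": {"scale": "map_info/scale"}}})
```

The swap is transparent to the rest of the training loop: both parsers expose the same interface, and only the meta-enabled one concatenates extra layers onto the loaded tensors.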

  • Update create_dataloader util to force-fix dontcare val
  • Update read_csv to parse metadata config file with raster path

The current version now allows a metadata (yaml) file to be associated with each raster file that will be split into samples. The previous version only allowed a global metadata file to be parsed.
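A per-raster metadata column can be sketched as below. The column order (raster path, metadata yaml path, vector path, attribute name, dataset split) is an assumption for illustration; empty metadata cells become None, and extra trailing columns are tolerated as per the relaxed column-count check:

```python
import csv
import io

def read_rows(csv_text):
    """Sketch of per-raster metadata parsing: each row may carry its own
    metadata yaml path alongside the raster path (column layout assumed)."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        tif, meta, gpkg, attribute, split = row[:5]  # allow extra columns
        rows.append({"tif": tif, "meta": meta or None, "gpkg": gpkg,
                     "attribute_name": attribute, "dataset": split})
    return rows

sample = ("img1.tif,img1.yaml,labels.gpkg,class,trn\n"
          "img2.tif,,labels.gpkg,class,val\n")
rows = read_rows(sample)
print(rows[1]["meta"])  # None
```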

  • Cleanup package imports & add missing import to utils
  • Refactor meta-segm-dataset parser to expose meta layer append util
  • Move meta_map param from training cfg to global cfg
  • Add meta-layer support to inference.py
  • Move meta-layer concat to proper location in inference script
  • Update meta-enabled config for unet tests
  • Move meta-segm cfg to proper dir & add coordconv cfg
  • Update csv column count check to allow extras
  • Update i2s and inf band count checks to account for meta layers
  • Fixup missing meta field in csv parsing output dicts
  • Fixup band count in coordconv ex config
  • Fixup image reader util to avoid double copies
  • Cleanup vector rasterization utils & recursive key getter
  • Update aux distmap computing to make target ids optional & add log maps
  • Add canvec aux test config and cleanup aux params elsewhere
  • Add download links for external (non-private) files
  • Re-add previously deleted files from default gdl data dir
  • Update i2s/train/inf scripts to support single class segm
  • Fixup gpu stats display when gpu is disabled
  • Add missing empty metadata fields in test CSVs
  • Fixup improper device upload in classif inference
  • Update travis to use recent pkgs in conda-forge