Errors in input data and legacy conversion #17

Open
fedorov opened this issue Feb 22, 2020 · 5 comments
Labels
question Further information is requested

Comments

@fedorov
Member

fedorov commented Feb 22, 2020

While evaluating legacy SOP classes, @afshinmessiah and I ran into the (not unexpected) issues related to invalid input data. At least in some cases, those errors are rather trivial, such as a VR mismatch between SH and CS.
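
For illustration, a minimal pydicom sketch of this kind of trivial fix, assuming the offending element is known (the file name and the choice of tag are just illustrative):

```python
import copy

from pydicom import dcmread

# Work on an in-memory copy so the original file stays intact.
ds = copy.deepcopy(dcmread("legacy_slice.dcm"))

# Hypothetical instance of the SH/CS mismatch: Scan Options (0018,0022)
# is defined as CS in PS3.6, but was encoded with VR SH in the source.
elem = ds[0x0018, 0x0022]
if elem.VR == "SH":
    elem.VR = "CS"  # coerce to the VR required by the standard
```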

This prompted the discussion with @dclunie below. Even after that discussion, I personally think it would be more sensible and practical to try to fix issues as they come up in the process of legacy conversion:

  • this will be easier for users of the legacy conversion functionality to understand and use
  • it may be easier to correct only the errors that matter for legacy conversion specifically, rather than develop a tool that tries to fix all errors (e.g., not all attributes will be used for legacy conversion)
  • if we first patch the input datasets and then do the conversion, we would need to reference the ephemeral patched datasets if we want to do things right

@hackermd did you think about this?


From: David Clunie
Date: Thu, Feb 20, 2020 at 10:33 AM
Subject: Re: MF conversion and source dataset errors
To: Andrey Fedorov
Cc: Akbarzadeh, Afshin, Steve Pieper

Hi Andrey

I also copied Steve.

In short, probably option 2 (patch the source dataset, and then
do the conversion).

It depends a lot on exactly what the "errors" are, and what
you would do with any intermediate files.

E.g., if a value is invalid for its VR (e.g., contains a bad
character or is too long), and the data element is being copied
(either into the top-level data set of the new object or into a
functional group, e.g., per-frame unassigned), then the choice is
to "fix" it (remove the bad character, truncate the too-long
string) before copying.

Alternatively, if it is an optional attribute, one could just
drop it (not copy it); but that risks losing something useful.

I don't always bother fixing these when converting in bulk, and
just propagate the errors, since trying to find and fix each special
case may be more work than I can justify.

But if one can make a fix, it would be nice to.

There is also an update to the standard that allows saving the
original bad values; see CP 1766 and Nonconforming Modified
Attributes Sequence:

  ftp://medical.nema.org/medical/dicom/final/cp1766_ft_ExtendOriginalAttributesSequence.pdf

  http://dicom.nema.org/medical/dicom/current/output/chtml/part03/sect_C.12.html#sect_C.12.1.1.9.2
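
A sketch of recording such an original bad value with pydicom
(the keywords assume a data dictionary recent enough to include
the CP 1766 additions; the modifying-system name is hypothetical):

```python
from datetime import datetime

from pydicom.dataset import Dataset
from pydicom.sequence import Sequence

def record_nonconforming(ds: Dataset, original_elem) -> None:
    """Preserve a nonconforming original value per CP 1766."""
    item = Dataset()
    item.AttributeModificationDateTime = datetime.now().strftime("%Y%m%d%H%M%S")
    item.ModifyingSystem = "legacy-converter"  # hypothetical system name
    item.ReasonForTheAttributeModification = "COERCE"
    # The original element goes into the Nonconforming Modified
    # Attributes Sequence (rather than Modified Attributes Sequence)
    # because its value cannot be validly encoded.
    bad = Dataset()
    bad.add(original_elem)
    item.NonconformingModifiedAttributesSequence = Sequence([bad])
    if "OriginalAttributesSequence" not in ds:
        ds.OriginalAttributesSequence = Sequence([])
    ds.OriginalAttributesSequence.append(item)
```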

In terms of "when" to do the fix, if you are going to fix things,
I have done it both ways (and sometimes another way, which is
to propagate the errors into the multi-frame, and then fix
them up in yet another separate final cleanup step).

I assume that when you say "patch the source dataset", you mean
fix a temporary copy, not "return a fixed copy to TCIA to replace
their bad stuff".

In which case, either approach (or a combination of both) seems
fine to me, since any intermediate files don't need to be persisted.

In the past, when creating "true" enhanced MF samples for CT and
MR (for the NEMA sponsored project), I actually used my "antodc"
tool in dicom3tools to "fix" and "enhance" the single frame objects,
by converting stuff from private attributes to standard attributes
(even if they weren't in the single frame IOD), and then handled
the "merging" into multi-frame (and factoring out of shared stuff)
in dcmulti.

This worked well because I separated most of the modality-specific
knowledge from the generic single to multi-frame conversion, as well
as providing a useful tool for making single frame images "better",
when I didn't need to make multi-frame ones.

This was long before I added the MultiframeImageFactory to the
PixelMed toolkit, and I have propagated very little if any of the
modality-specific stuff to that tool so far.

When/if I revisit the question of trying to create modality-
specific legacy converted or true enhanced multi-frame images
in PixelMed, I will very likely use the two step process of
first fixing the single frame headers, and then merging them
into a multi-frame, since I find that division of labor more
elegant.

It would also allow me to provide other input sources (e.g.,
NIfTI files with a BIDS metadata file) to feed the same
DICOM enhanced multi-frame pipeline, though I have to admit
that I usually do that sort of thing with separate classes
with methods applied successively, rather than separate distinct
command line utilities.

BTW. For referential integrity updates (e.g., fixing all SRs
or SEGs that reference single frame images to reference the
new MF objects), I might make that yet another separate
step in the pipeline, especially if I could find other uses
for it.

David

PS. I have attached a copy of antodc.cc from dicom3tools ... I
haven't used this tool for more than a decade, but it may give
you some insight into more complicated fixes I sometimes used
to perform, and how I extracted standard information from private
data elements.

PPS. In case they are informative, I have attached archives of the
Makefiles that I used for the CT and MR NEMA project ... these will
not execute without my source images, various tools and out of band
information, but they may give some (outdated) insight into the
process of handcrafting examples versus producing an operational
converter.

On 2/19/20 11:18 PM, Andrey Fedorov wrote:

Hi David,

As Afshin is working on the MF conversion task, we wanted to ask
you a "fundamental" question.

As you know, TCIA datasets may often have errors. What should be the
strategy for addressing those? Should we:

  1. carry forward those errors into the MF representation, and just
    ignore them while validating the MF?
  2. patch the source dataset, and then do the conversion?
  3. patch the errors in the MF representation in the process of
    conversion, and keep the originals intact?

I would probably prefer option 3.

AF

@fedorov fedorov added the question Further information is requested label Feb 22, 2020
@hackermd
Collaborator

This is a tricky question. Generally, I would argue that the library has to assume passed data sets are standard compliant and I would be cautious about just "fixing" things magically. The problem with this approach is that users will keep thinking that there is no problem with the data sets they provided and will be surprised if things break elsewhere.

On the one hand, I think it would be reasonable to encourage the user to correct the input data set if there are any "errors" in the input single-frame data sets that raise an exception upon conversion. Note that this would not necessarily require the end user to change the original images (files), but the application that uses the highdicom library could apply these changes at Python runtime before passing the data sets to the constructor of the Legacy Converted Enhanced MR/CT/PET Image class. However, it may ultimately not be clear how attributes of the output (multi-frame legacy converted enhanced object) relate to attributes of the inputs (single-frame legacy objects).
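
For example, such runtime patching could look roughly like this sketch (the patch logic and file names are hypothetical, and the constructor parameters should be checked against the current highdicom API):

```python
from pydicom import dcmread
from pydicom.uid import generate_uid

from highdicom.legacy.sop import LegacyConvertedEnhancedMRImage

def patch(ds):
    """Hypothetical application-level fix-up, e.g. an SH/CS coercion."""
    for elem in ds:
        if elem.tag == 0x00180022 and elem.VR == "SH":
            elem.VR = "CS"

# Patch in-memory copies only; the original files are left untouched.
datasets = [dcmread(f) for f in ["sl1.dcm", "sl2.dcm", "sl3.dcm"]]
for ds in datasets:
    patch(ds)

mf_image = LegacyConvertedEnhancedMRImage(
    legacy_datasets=datasets,
    series_instance_uid=generate_uid(),
    series_number=100,
    sop_instance_uid=generate_uid(),
    instance_number=1,
)
mf_image.save_as("enhanced_mr.dcm")
```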

On the other hand, I see the opportunity for legacy conversion to further enhance data sets by addressing compliance issues and the Nonconforming Modified Attributes Sequence would be an elegant way to do this in a transparent manner.
However, if we decide to correct compliance issues internally, we should at least log an ERROR message to indicate to the user that there was a problem with the input data. The log output could serve users as a conversion report.
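
Such a conversion report could be as simple as a dedicated logger (the logger name and message are illustrative):

```python
import logging

# Route conversion problems to a per-run report file.
logger = logging.getLogger("legacy_conversion")  # illustrative name
logger.addHandler(logging.FileHandler("conversion_report.log"))

def report_coercion(ds, tag, old_vr, new_vr):
    """Log a fix so the report ties it back to the offending input."""
    logger.error(
        "Coerced VR of %s from %s to %s in instance %s",
        tag, old_vr, new_vr, ds.SOPInstanceUID,
    )
```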

Personally, I favor the second approach, even though that would mean that the library takes on more responsibility.

@dclunie

dclunie commented Feb 22, 2020 via email

@pieper
Member

pieper commented Feb 24, 2020

For reference, Slicer provides a DICOMPatcher module that handles a number of common issues. This is another set of code (like the plugins) that we could factor out into highdicom for more general use.

@fedorov
Member Author

fedorov commented Feb 24, 2020

Thanks for mentioning DICOMPatcher, Steve. This reminded me that we already had this discussion in the context of dcmqi development; here are the related issues. DCMTK has to deal with a similar situation when creating SEGs while carrying forward composite context from the referenced objects; we had quite a few discussions about this with @michaelonken:

I need to think about the discussions above!

@fedorov
Member Author

fedorov commented Feb 26, 2020

Another related component worth mentioning here is https://github.com/innolitics/dicom-standard, which can be used to identify errors that should be patched. @afshinmessiah is going to look into this, the Slicer DICOMPatcher, and @dclunie's tools, and we will think about how to proceed. @hackermd, maybe after that we should discuss whether that validator/patcher belongs in highdicom or should be a separate tool.
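
As an illustration of how that machine-readable copy of the standard could drive error detection, a sketch (the field names assume the attributes.json layout in that repository; multi-valued VR entries such as "US or SS" would need extra handling):

```python
import json

from pydicom import dcmread

# Build a tag -> expected-VR lookup from innolitics/dicom-standard.
with open("attributes.json") as f:
    expected_vr = {a["tag"]: a["valueRepresentation"] for a in json.load(f)}

ds = dcmread("legacy_slice.dcm")  # hypothetical input file
for elem in ds:
    tag = f"({elem.tag.group:04X},{elem.tag.element:04X})"
    if tag in expected_vr and elem.VR != expected_vr[tag]:
        print(f"{tag}: encoded as {elem.VR}, standard defines {expected_vr[tag]}")
```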
