HTTP Errors for Imaging Data Commons #141

Closed
psavery opened this issue Dec 18, 2023 · 7 comments · Fixed by #149
Comments

@psavery
Contributor

psavery commented Dec 18, 2023

The NCI's Imaging Data Commons is a big repository (>38k studies) for cancer research.

Their studies are available via DICOMweb. See here for an example of viewing one of them.

I tried to access that same example, but I get a couple of HTTP errors. Try out the following code:

from wsidicom import WsiDicom, WsiDicomWebClient

# For this one: https://viewer.imaging.datacommons.cancer.gov/slim/studies/2.25.227261840503961430496812955999336758586/series/1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0
url = 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb'
study_uid = '2.25.227261840503961430496812955999336758586'
series_uid = '1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0'

client = WsiDicomWebClient.create_client(url)

slide = WsiDicom.open_web(client, study_uid, series_uid)

It produces the following exception:

requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/2.25.227261840503961430496812955999336758586/instances?00080016=1.2.840.10008.5.1.4.1.1.77.1.6&0020000E=1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0&includefield=AvailableTransferSyntaxUID

I played around with that url with curl and found a couple of errors. Accessing that full URL via this command:

curl 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/2.25.227261840503961430496812955999336758586/instances?00080016=1.2.840.10008.5.1.4.1.1.77.1.6&0020000E=1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0&includefield=AvailableTransferSyntaxUID'

Produces:

[{
  "error": {
    "code": 400,
    "message": "invalid QIDO-RS query: unknown/unsupported QIDO attribute: AvailableTransferSyntaxUID",
    "status": "INVALID_ARGUMENT"
  }
}
]

So it is raising an exception because we asked for the AvailableTransferSyntaxUID. However, if I remove that part of the url:

curl 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/2.25.227261840503961430496812955999336758586/instances?00080016=1.2.840.10008.5.1.4.1.1.77.1.6&0020000E=1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0'

It raises another error:

[{
  "error": {
    "code": 400,
    "message": "generic::invalid_argument: SOPClassUID is not a supported instance or series level attribute",
    "status": "INVALID_ARGUMENT"
  }
}
]

So it is also complaining that we are asking for a SOPClassUID of WSI_SOP_CLASS_UID in the search filters.

If I remove that part of the url also:

curl 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/2.25.227261840503961430496812955999336758586/instances?0020000E=1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0'

It works fine.
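
For anyone who wants to repeat this experiment from Python, here is a minimal sketch that rebuilds the three QIDO-RS URLs used in the curl commands above. The helper name `instances_url` is made up for this illustration and is not part of wsidicom. The block only constructs and prints the URLs; it does not contact the server.

```python
from urllib.parse import urlencode

BASE_URL = (
    "https://proxy.imaging.datacommons.cancer.gov/current/"
    "viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb"
)
STUDY_UID = "2.25.227261840503961430496812955999336758586"
SERIES_UID = "1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0"
WSI_SOP_CLASS_UID = "1.2.840.10008.5.1.4.1.1.77.1.6"


def instances_url(base_url, study_uid, filters, include_fields=()):
    """Build a QIDO-RS "search for instances" URL for one study.

    Hypothetical helper for this illustration only.
    """
    params = list(filters.items())
    params += [("includefield", field) for field in include_fields]
    return f"{base_url}/studies/{study_uid}/instances?{urlencode(params)}"


# 1. The query from the traceback: both filters plus includefield (400).
full_query = instances_url(
    BASE_URL,
    STUDY_UID,
    {"00080016": WSI_SOP_CLASS_UID, "0020000E": SERIES_UID},
    ["AvailableTransferSyntaxUID"],
)

# 2. Same filters without includefield (still 400, because of SOPClassUID).
without_include_field = instances_url(
    BASE_URL,
    STUDY_UID,
    {"00080016": WSI_SOP_CLASS_UID, "0020000E": SERIES_UID},
)

# 3. Only the series filter (the variant that worked).
series_only = instances_url(BASE_URL, STUDY_UID, {"0020000E": SERIES_UID})

for url in (full_query, without_include_field, series_only):
    print(url)
```

Each printed URL can be passed to `curl` as in the commands above to compare the responses.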

It's kind of annoying that an error is being raised for AvailableTransferSyntaxUID. It would be nice if it just returned an empty field if it was not available.

However, it would be really nice if we could support interacting with this DICOMweb server.

Let me know what your thoughts are, @erikogabrielsson.

@erikogabrielsson
Collaborator

Hi @psavery,
Do you know which DICOM server they are running? It should be possible to use SOP Class UID as a matching attribute; see Table 10.6.1-5, Required Matching Attributes.

@psavery
Contributor Author

psavery commented Jan 12, 2024

I see that... hmm.

My reading from their forums seems to indicate they are using a "Google Cloud Healthcare DICOMWeb service". They put it behind a proxy to prevent full downloads. See here.

psavery added a commit to psavery/wsidicom that referenced this issue Jan 12, 2024
After I made these changes, I could view a dataset using our viewer.

For example:

```python
from wsidicom import WsiDicom, WsiDicomWebClient

url = 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb'
study_uid = '2.25.25644321580420796312527343668921514374'
series_uid = '1.3.6.1.4.1.5962.99.1.3205815762.381594633.1639588388306.2.0'

client = WsiDicomWebClient.create_client(url)
slide = WsiDicom.open_web(client, study_uid, series_uid)
```

I had to comment out the use of `AvailableTransferSyntaxUID` and `SOPClassUID`
because they were producing these errors, respectively:

```text
"invalid QIDO-RS query: unknown/unsupported QIDO attribute: AvailableTransferSyntaxUID"
"generic::invalid_argument: SOPClassUID is not a supported instance or series level attribute"
```

I had to comment out the annotation parts because I was getting this error:
```python
Traceback (most recent call last):
  File "/opt/large_image/sources/dicom/large_image_source_dicom/__init__.py", line 152, in __init__
    self._dicom = self._open_wsi_dicom(self._largeImagePath)
  File "/opt/large_image/sources/dicom/large_image_source_dicom/__init__.py", line 183, in _open_wsi_dicom
    return self._open_wsi_dicomweb(path)
  File "/opt/large_image/sources/dicom/large_image_source_dicom/__init__.py", line 209, in _open_wsi_dicomweb
    return wsidicom.WsiDicom.open_web(wsidicom_client, study_uid, series_uid)
  File "/opt/wsidicom/wsidicom/wsidicom.py", line 153, in open_web
    source = WsiDicomWebSource(
  File "/opt/wsidicom/wsidicom/web/wsidicom_web_source.py", line 146, in __init__
    AnnotationInstance.open_dataset(annotation_instance)
  File "/opt/wsidicom/wsidicom/graphical_annotations.py", line 1653, in open_dataset
    if dataset.AnnotationCoordinateType == "2D":
  File "/opt/venv/lib/python3.9/site-packages/pydicom/dataset.py", line 908, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Dataset' object has no attribute 'AnnotationCoordinateType'
```

See imi-bigpicture#141

Signed-off-by: Patrick Avery <[email protected]>
@psavery
Contributor Author

psavery commented Jan 12, 2024

I am able to view an example dataset using our viewer if I make the changes here (although I know those are not changes that would be merged).

psavery added a commit to psavery/wsidicom that referenced this issue Jan 30, 2024
This change adds support for Google Healthcare API DICOMweb servers, such
as the NCI's [Imaging Data Commons](https://datacommons.cancer.gov/repository/imaging-data-commons).

The problem: Google Healthcare API raises an error if `AvailableTransferSyntaxUID` is a
field, or if `SOPClassUID` is used as a search filter.

The `SOPClassUID` should definitely be allowed as an instance-level search
filter, as documented in [Table 10.6.1-5. Required Matching Attributes](https://dicom.nema.org/medical/dicom/current/output/chtml/part18/sect_10.6.html).
However, this has apparently been a long-standing problem of nearly four
years (see [here](GoogleCloudPlatform/healthcare-dicom-dicomweb-adapter#30 (comment))),
so it may not be fixed anytime soon. And even if it is fixed, the Imaging Data
Commons may not update their software anytime soon. It would be highly
advantageous to support such a large DICOMweb repository by working around
the issue.

The fix in this PR is as follows:

1. The two `search_for_instances()` calls are still performed identically as before, as long as there are no HTTP errors.
2. If there is an HTTP error with a 400 status_code, and a message is present matching the errors from Google Healthcare API, then the `search_for_instances()` arguments are patched to work for Google Healthcare API, as follows:
a) `AvailableTransferSyntaxUID` is simply removed, if present.
b) `SOPClassUID` is manually filtered, if present (meaning it is not supplied in the `search_filters`, but only instances with a matching `SOPClassUID` are returned).

These changes shouldn't have any impact on any situations except where an error
occurs from a Google Healthcare API server. And in that case, the function calls
are patched and then work properly.
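
The two steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: `BadRequest`, `search_with_fallback`, `sop_class_uid`, and `fake_google_search` are hypothetical names, the exception is a stand-in for whatever HTTP error the real client raises, and the manual filter assumes each search result includes `SOPClassUID` (tag `00080016`) in DICOM JSON form.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Substrings of the two 400 responses quoted in this issue.
GOOGLE_ERROR_MARKERS = (
    "unknown/unsupported QIDO attribute: AvailableTransferSyntaxUID",
    "SOPClassUID is not a supported instance or series level attribute",
)
SOP_CLASS_UID_TAG = "00080016"
WSI_SOP_CLASS_UID = "1.2.840.10008.5.1.4.1.1.77.1.6"


class BadRequest(Exception):
    """Hypothetical stand-in for an HTTP error carrying the response body."""

    def __init__(self, status_code: int, text: str) -> None:
        super().__init__(text)
        self.status_code = status_code
        self.text = text


def sop_class_uid(instance: Dict) -> Optional[str]:
    """Read SOPClassUID from one DICOM JSON search result, if present."""
    values = instance.get(SOP_CLASS_UID_TAG, {}).get("Value", [])
    return values[0] if values else None


def search_with_fallback(
    search: Callable[..., List[Dict]],
    fields: Optional[Sequence[str]] = None,
    search_filters: Optional[Dict[str, str]] = None,
) -> List[Dict]:
    fields = list(fields or [])
    filters = dict(search_filters or {})
    try:
        # Step 1: send the unmodified query first.
        return search(fields=fields, search_filters=filters)
    except BadRequest as error:
        known = any(marker in error.text for marker in GOOGLE_ERROR_MARKERS)
        if error.status_code != 400 or not known:
            raise  # Unrelated failure: do not hide it.
    # Step 2a: drop the unsupported include field.
    patched_fields = [f for f in fields if f != "AvailableTransferSyntaxUID"]
    # Step 2b: remove the SOPClassUID filter and apply it client-side instead.
    wanted = filters.pop("SOPClassUID", None)
    results = search(fields=patched_fields, search_filters=filters)
    if wanted is None:
        return results
    return [r for r in results if sop_class_uid(r) == wanted]


def fake_google_search(fields, search_filters):
    """Toy server that rejects the same two inputs as the responses above."""
    if "AvailableTransferSyntaxUID" in fields:
        raise BadRequest(400, "invalid QIDO-RS query: " + GOOGLE_ERROR_MARKERS[0])
    if "SOPClassUID" in search_filters:
        raise BadRequest(400, "generic::invalid_argument: " + GOOGLE_ERROR_MARKERS[1])
    return [
        {SOP_CLASS_UID_TAG: {"vr": "UI", "Value": [WSI_SOP_CLASS_UID]}},
        # An instance of some other SOP class, to show the manual filter.
        {SOP_CLASS_UID_TAG: {"vr": "UI", "Value": ["1.2.3.4"]}},
    ]


matches = search_with_fallback(
    fake_google_search,
    fields=["AvailableTransferSyntaxUID"],
    search_filters={"SOPClassUID": WSI_SOP_CLASS_UID, "SeriesInstanceUID": "1.2.3"},
)
print(len(matches))
```

Retrying only after a recognized 400 message keeps the behavior unchanged for servers that accept the original query.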

The following example works after this fix:

```python
from wsidicom import WsiDicom, WsiDicomWebClient

url = 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb'
study_uid = '2.25.227261840503961430496812955999336758586'
series_uid = '1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0'

client = WsiDicomWebClient.create_client(url)

slide = WsiDicom.open_web(client, study_uid, series_uid)
```

Fixes: imi-bigpicture#141

Signed-off-by: Patrick Avery <[email protected]>
psavery added a commit to psavery/wsidicom that referenced this issue Jan 30, 2024
psavery added a commit to psavery/wsidicom that referenced this issue Jan 31, 2024
erikogabrielsson pushed a commit that referenced this issue Feb 12, 2024
@fedorov

fedorov commented Feb 16, 2024

@psavery I am Andrey Fedorov, one of the leads of Imaging Data Commons.

First of all, thank you for using IDC; it is great to hear that you found it useful! Feedback from users like you and others in the wsidicom community is what we need, and it is very welcome as we work on developing this resource.

Second, let me use this opportunity to explain how IDC curates data. We make all of our data available in public storage buckets. IDC data is replicated across Google Storage and AWS S3 buckets, so you can choose where to download it from. You do not need to log in, and there is no egress charge. To help you navigate and search the files IDC is sharing, we provide a searchable metadata index available via the BigQuery SQL interface. The procedures for downloading data from IDC are described in this documentation page: https://learn.canceridc.dev/data/downloading-data.

More recently we have been working on the idc-index Python package to further simplify the process of searching and downloading IDC data: https://github.com/ImagingDataCommons/idc-index. We also have a 3D Slicer extension that provides an interactive interface for accessing IDC data from the desktop: https://github.com/ImagingDataCommons/SlicerIDCBrowser (I have yet to update our documentation pages with these new tools!)

In addition to having our data in public storage buckets, we also ingest it into a DICOM store provisioned via Google Healthcare API. That DICOM store is behind the proxy mentioned earlier in the thread, and the primary purpose of that DICOM store is to support visualization of IDC data using viewers integrated with IDC Portal - OHIF (for radiology) and Slim (for slide microscopy).

There are two main reasons why the DICOM store is not accessed directly and is behind the proxy. First, due to limitations of Google Healthcare API, it is impossible to have non-authenticated access to that store, and we want to allow IDC users to view images without login. The proxy routes the data without requiring user login. Second, unlike free egress from the storage buckets, egress of data out of the cloud via the DICOMweb interface has to be paid for from the IDC budget. To control the costs, we need to limit access to IDC data via DICOMweb. The proxy implements daily IP-based egress quotas.

It's a lot of content, but I wanted to try to give you that as a background before the following.

Catch and fix Google Healthcare API errors #149

Since you were accessing DICOM stores via the proxy, you should not extrapolate and assume that the limitations you are observing are germane to the Google Healthcare API and not the proxy. I encourage you to experiment with direct access to Google DICOM stores by setting up your own - it is very easy: https://cloud.google.com/healthcare-api/docs/how-tos/dicom - and I am happy to help you set it up.

access to IDC data via DICOMweb

We really want you to use IDC data! But unfortunately, due to the reasons above, IDC is currently not designed to enable DICOMweb access to its content. Download from storage buckets is the intended pathway for data access. I understand this is rather suboptimal for digital pathology workflows. We are looking for ways to enable unrestricted and unlimited access via DICOMweb, and there is some hope we will be able to do it, but it is difficult at this point to estimate when this would be completed.

Finally, I encourage anyone using IDC to reach out to us via IDC forum. We came across this discussion by a fortunate accident, but we are here to help and are very interested in user feedback. It is very important for me to know that there is interest in the community in DICOMweb access to IDC data.

@psavery I would be very interested to have a meeting with you to discuss the related topics! We have been working on several joint projects with Kitware with @aylward @jcfr @thewtex among others, and I would love to learn more about your use cases. Please reach out via andrey dot fedorov at gmail to coordinate the time!

Sorry for the long post, but I hope this helps!

@psavery
Contributor Author

psavery commented Feb 16, 2024

Hi @fedorov,

Nice to meet you!

This work was done in support of large_image, which includes a DICOMweb viewer.

I will defer this discussion to the project lead, @manthey.

Thank you!
Patrick

@psavery
Contributor Author

psavery commented Feb 27, 2024

By the way, for the record, I do believe #149 was fixing an issue specifically for Google Healthcare API (not something specific to the IDC proxy), because I saw the same issue mentioned in a few places on their GitHub repositories (one of which was 4 years ago here).

@fedorov

fedorov commented Mar 6, 2024

@psavery to be clear, I just wanted to suggest that if you want to investigate a suspected bug in Google Healthcare, it is advisable to do this by interacting directly with a GHC DICOM store, without a proxy in the middle.

Also, we discussed this with David Clunie @dclunie and here is his perspective on the actual issue. Would be good if you could comment on item 3!

  1. AvailableTransferSyntaxUID is an optional parameter that indicates what the server might be able to supply - there should be no expectation that it is supported (since it is relatively new) and no dependency on its value(s)
  2. TransferSyntaxUID is not an appropriate surrogate since it is part of the PS3.10 metainformation about a particular returned dataset, and not necessarily reflective of what the server has or might be able to transform it into. It might or might not be returned, and might or might not be what the caller wants to know.
  3. Why is wsidicom asking for this and what behavior depends on it?
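
To make points 1 and 2 concrete, here is a minimal sketch of a client that treats `AvailableTransferSyntaxUID` (tag `00083002` in DICOM JSON) as optional. The function names are hypothetical and this is not how wsidicom is implemented; it only illustrates "no dependency on its value(s)": when the attribute is missing, the client gets `None` and should fall back to ordinary content negotiation rather than guess from `TransferSyntaxUID`.

```python
from typing import Dict, List, Optional, Sequence

AVAILABLE_TRANSFER_SYNTAX_UID_TAG = "00083002"


def advertised_transfer_syntaxes(instance: Dict) -> Optional[List[str]]:
    """Return the advertised transfer syntaxes, or None if none were sent."""
    element = instance.get(AVAILABLE_TRANSFER_SYNTAX_UID_TAG)
    if not element or not element.get("Value"):
        return None
    return list(element["Value"])


def pick_transfer_syntax(instance: Dict, preferred: Sequence[str]) -> Optional[str]:
    """Pick the first preferred syntax the server advertises, if any.

    None means "no usable hint": the caller should request frames with its
    normal Accept header and let the server decide.
    """
    advertised = advertised_transfer_syntaxes(instance)
    if advertised is None:
        return None
    for uid in preferred:
        if uid in advertised:
            return uid
    return None


JPEG_BASELINE = "1.2.840.10008.1.2.4.50"
EXPLICIT_VR_LITTLE_ENDIAN = "1.2.840.10008.1.2.1"

with_hint = {
    AVAILABLE_TRANSFER_SYNTAX_UID_TAG: {
        "vr": "UI",
        "Value": [EXPLICIT_VR_LITTLE_ENDIAN, JPEG_BASELINE],
    }
}
without_hint = {}  # e.g. a server that does not support the attribute

print(pick_transfer_syntax(with_hint, [JPEG_BASELINE, EXPLICIT_VR_LITTLE_ENDIAN]))
print(pick_transfer_syntax(without_hint, [JPEG_BASELINE]))
```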

Finally, in part in response to your use case, we amended IDC proxy policy to now allow egress without restricting to IDC viewer only. The per-IP daily quota still applies. Please see the updated proxy policy here: https://learn.canceridc.dev/portal/proxy-policy. I hope this helps you and other users interested in using DICOMweb for accessing IDC data!
