HTTP Errors for Imaging Data Commons #141
Comments
Hi @psavery,
I see that... hmm. My reading from their forums seems to indicate they are using a "Google Cloud Healthcare DICOMWeb service". They put it behind a proxy to prevent full downloads. See here.
After I made these changes, I could view a dataset using our viewer. For example:

```python
from wsidicom import WsiDicom, WsiDicomWebClient

url = 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb'
study_uid = '2.25.25644321580420796312527343668921514374'
series_uid = '1.3.6.1.4.1.5962.99.1.3205815762.381594633.1639588388306.2.0'

client = WsiDicomWebClient.create_client(url)
slide = WsiDicom.open_web(client, study_uid, series_uid)
```

I had to comment out the use of `AvailableTransferSyntaxUID` and `SOPClassUID` because they were producing these errors, respectively:

```bash
"invalid QIDO-RS query: unknown/unsupported QIDO attribute: AvailableTransferSyntaxUID"
"generic::invalid_argument: SOPClassUID is not a supported instance or series level attribute"
```

I had to comment out the annotation parts because I was getting this error:

```python
Traceback (most recent call last):
  File "/opt/large_image/sources/dicom/large_image_source_dicom/__init__.py", line 152, in __init__
    self._dicom = self._open_wsi_dicom(self._largeImagePath)
  File "/opt/large_image/sources/dicom/large_image_source_dicom/__init__.py", line 183, in _open_wsi_dicom
    return self._open_wsi_dicomweb(path)
  File "/opt/large_image/sources/dicom/large_image_source_dicom/__init__.py", line 209, in _open_wsi_dicomweb
    return wsidicom.WsiDicom.open_web(wsidicom_client, study_uid, series_uid)
  File "/opt/wsidicom/wsidicom/wsidicom.py", line 153, in open_web
    source = WsiDicomWebSource(
  File "/opt/wsidicom/wsidicom/web/wsidicom_web_source.py", line 146, in __init__
    AnnotationInstance.open_dataset(annotation_instance)
  File "/opt/wsidicom/wsidicom/graphical_annotations.py", line 1653, in open_dataset
    if dataset.AnnotationCoordinateType == "2D":
  File "/opt/venv/lib/python3.9/site-packages/pydicom/dataset.py", line 908, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Dataset' object has no attribute 'AnnotationCoordinateType'
```

See imi-bigpicture#141

Signed-off-by: Patrick Avery <[email protected]>
I am able to view an example dataset using our viewer if I make the changes here (although I know those are not changes that would be merged).
This change adds support for Google Healthcare API DICOMweb servers, such as the NCI's [Imaging Data Commons](https://datacommons.cancer.gov/repository/imaging-data-commons).

The problem: Google Healthcare API raises an error if `AvailableTransferSyntaxUID` is a field, or if `SOPClassUID` is used as a search filter. The `SOPClassUID` should definitely be allowed as an instance-level search filter, as documented in [Table 10.6.1-5. Required Matching Attributes](https://dicom.nema.org/medical/dicom/current/output/chtml/part18/sect_10.6.html). However, this has apparently been a long-standing problem of nearly four years (see [here](GoogleCloudPlatform/healthcare-dicom-dicomweb-adapter#30 (comment))), so it may not be fixed anytime soon. And even if it is fixed, the Imaging Data Commons may not update their software anytime soon. It would be highly advantageous to support such a large DICOMweb repository by working around the issue.

The fix in this PR is as follows:

1. The two `search_for_instances()` calls are still performed identically as before, as long as there are no HTTP errors.
2. If there is an HTTP error with a 400 status_code, and a message is present matching the errors from Google Healthcare API, then the `search_for_instances()` arguments are patched to work for Google Healthcare API, as follows:
   a) `AvailableTransferSyntaxUID` is simply removed, if present.
   b) `SOPClassUID` is manually filtered, if present (meaning it is not supplied in the `search_filters`, but only instances with a matching `SOPClassUID` are returned).

These changes shouldn't have any impact on any situations except where an error occurs from a Google Healthcare API server. And in that case, the function calls are patched and then work properly.

The following example works after this fix:

```python
from wsidicom import WsiDicom, WsiDicomWebClient

url = 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb'
study_uid = '2.25.227261840503961430496812955999336758586'
series_uid = '1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0'

client = WsiDicomWebClient.create_client(url)
slide = WsiDicom.open_web(client, study_uid, series_uid)
```

Fixes: imi-bigpicture#141

Signed-off-by: Patrick Avery <[email protected]>
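The retry logic described in the commit message above can be sketched roughly as follows. This is not the code from the PR: `search_with_fallback`, `SearchHTTPError`, and `fake_search` are hypothetical names made up for illustration, the search callable is assumed to accept `fields` and `search_filters` keyword arguments, and results are assumed to use the DICOM JSON model (tag keys with a `Value` list). Unlike the description above, this simplified version applies both patches at once after the first matching error.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

# Error fragments quoted earlier in this thread.
GOOGLE_ERROR_FRAGMENTS = (
    "unknown/unsupported QIDO attribute: AvailableTransferSyntaxUID",
    "SOPClassUID is not a supported instance or series level attribute",
)
SOP_CLASS_UID_TAG = "00080016"  # DICOM tag (0008,0016), SOPClassUID
WSI_SOP_CLASS_UID = "1.2.840.10008.5.1.4.1.1.77.1.6"  # taken from the curl URL in this issue

Instance = Dict[str, Any]


class SearchHTTPError(Exception):
    """Hypothetical stand-in for the HTTP error a DICOMweb client raises."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(message)
        self.status_code = status_code
        self.message = message


def search_with_fallback(
    search: Callable[..., List[Instance]],
    fields: Optional[Sequence[str]] = None,
    search_filters: Optional[Dict[str, str]] = None,
) -> List[Instance]:
    """Run the search unchanged; patch the arguments only on a matching 400."""
    fields = list(fields or [])
    search_filters = dict(search_filters or {})
    try:
        return search(fields=fields, search_filters=search_filters)
    except SearchHTTPError as error:
        matches = any(f in error.message for f in GOOGLE_ERROR_FRAGMENTS)
        if error.status_code != 400 or not matches:
            raise  # unrelated error: behave exactly as before

    # a) drop the unsupported field, b) take SOPClassUID out of the query
    patched_fields = [f for f in fields if f != "AvailableTransferSyntaxUID"]
    wanted_sop_class = search_filters.pop("SOPClassUID", None)
    results = search(fields=patched_fields, search_filters=search_filters)
    if wanted_sop_class is None:
        return results
    # ... and filter on SOPClassUID manually instead.
    return [
        instance
        for instance in results
        if wanted_sop_class in instance.get(SOP_CLASS_UID_TAG, {}).get("Value", [])
    ]


def fake_search(fields: List[str], search_filters: Dict[str, str]) -> List[Instance]:
    """Fake server that mimics the two errors; the UIDs below are made up."""
    if "AvailableTransferSyntaxUID" in fields:
        raise SearchHTTPError(400, "invalid QIDO-RS query: " + GOOGLE_ERROR_FRAGMENTS[0])
    if "SOPClassUID" in search_filters:
        raise SearchHTTPError(400, "generic::invalid_argument: " + GOOGLE_ERROR_FRAGMENTS[1])
    return [
        {"00080016": {"vr": "UI", "Value": [WSI_SOP_CLASS_UID]},
         "00080018": {"vr": "UI", "Value": ["1.2.3"]}},
        {"00080016": {"vr": "UI", "Value": ["1.2.3.4"]},
         "00080018": {"vr": "UI", "Value": ["1.2.4"]}},
    ]


result = search_with_fallback(
    fake_search,
    fields=["AvailableTransferSyntaxUID"],
    search_filters={"SOPClassUID": WSI_SOP_CLASS_UID},
)
```

Passing the search function in as an argument keeps the sketch independent of any particular client library; in real code the exception type and the way its status code and message are read would depend on the client in use.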
@psavery I am Andrey Fedorov, one of the leads of Imaging Data Commons.

First of all, thank you for using IDC, it is great to hear that you found it useful! Feedback from users like you and others in the

Second, let me use this opportunity to explain how IDC curates data. We make all of our data available in public storage buckets. IDC data is replicated across Google Storage and AWS S3 buckets, so you can choose from where to download it. You do not need login, and there is no egress charge. In order to navigate and search the files IDC is sharing, we provide searchable metadata index available via BigQuery SQL interface. The procedures for downloading data from IDC are described in this documentation page: https://learn.canceridc.dev/data/downloading-data. More recently we have been working on

In addition to having our data in public storage buckets, we also ingest it into a DICOM store provisioned via Google Healthcare API. That DICOM store is behind the proxy mentioned earlier in the thread, and the primary purpose of that DICOM store is to support visualization of IDC data using viewers integrated with IDC Portal - OHIF (for radiology) and Slim (for slide microscopy).

There are two main reasons why the DICOM store is not accessed directly and is behind the proxy. First, due to limitations of Google Healthcare API, it is impossible to have non-authenticated access to that store, and we want to allow IDC users to view images without login. Proxy routes the data without requiring user login. Second, unlike free egress from the storage buckets, egress of data out of the cloud via DICOMweb interface needs to be paid from IDC budget and is not free. To control the costs we need to limit access to IDC data via DICOMweb. Proxy implements daily IP-based egress quotas.

It's a lot of content, but I wanted to try to give you that as a background before the following.
Since you were accessing DICOM stores via the proxy, you should not extrapolate and assume that the limitations you are observing are germane to the Google Healthcare API and not the proxy. I encourage you to experiment with direct access to Google DICOM stores by setting up your own - it is very easy: https://cloud.google.com/healthcare-api/docs/how-tos/dicom - and I am happy to help you set it up.
We really want you to use IDC data! But unfortunately, due to the reasons above, IDC is currently not designed to enable DICOMweb access to its content. Download from storage buckets is the intended pathway for data access. I understand this is rather suboptimal for digital pathology workflows. We are looking for ways to enable unrestricted and unlimited access via DICOMweb, and there is some hope we will be able to do it, but it is difficult at this point to estimate when this would be completed.

Finally, I encourage anyone using IDC to reach out to us via IDC forum. We came across this discussion by a fortunate accident, but we are here to help and are very interested in user feedback. It is very important for me to know that there is interest in the community in DICOMweb access to IDC data.

@psavery I would be very interested to have a meeting with you to discuss the related topics! We have been working on several joint projects with Kitware with @aylward @jcfr @thewtex among others, and I would love to learn more about your use cases. Please reach out via andrey dot fedorov at gmail to coordinate the time!

Sorry for the long post, but I hope this helps!
Hi @fedorov,

Nice to meet you! This work was done in support of large_image, which includes a DICOMweb viewer. I will defer this discussion to the project lead, @manthey.

Thank you!
@psavery to be clear, I just wanted to suggest that if you want to investigate a suspected bug in Google Healthcare, it is advisable to do this by directly interacting with a GHC DICOM store, without having proxy in the middle. Also, we discussed this with David Clunie @dclunie and here is his perspective on the actual issue. Would be good if you could comment on item 3!
Finally, in part in response to your use case, we amended IDC proxy policy to now allow egress without restricting to IDC viewer only. The per-IP daily quota still applies. Please see the updated proxy policy here: https://learn.canceridc.dev/portal/proxy-policy. I hope this helps you and other users interested in using DICOMweb for accessing IDC data!
The NCI's Imaging Data Commons is a big repository (>38k studies) for cancer research.
Their studies are available via DICOMweb. See here for an example of viewing one of them.
I tried to access that same example, but I get a couple of HTTP errors. Try out the following code:
It produces the following exception:
I played around with that URL with `curl` and found a couple of errors. Accessing that full URL via this command:

curl 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/2.25.227261840503961430496812955999336758586/instances?00080016=1.2.840.10008.5.1.4.1.1.77.1.6&0020000E=1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0&includefield=AvailableTransferSyntaxUID'
Produces:
So it is raising an exception because we asked for the `AvailableTransferSyntaxUID`. However, if I remove that part of the URL:

curl 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/2.25.227261840503961430496812955999336758586/instances?00080016=1.2.840.10008.5.1.4.1.1.77.1.6&0020000E=1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0'
It raises another error:
So it is also complaining that we are asking for a SOPClassUID of `WSI_SOP_CLASS_UID` in the search filters.

If I remove that part of the URL also:
curl 'https://proxy.imaging.datacommons.cancer.gov/current/viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb/studies/2.25.227261840503961430496812955999336758586/instances?0020000E=1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0'
It works fine.
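For anyone wanting to repeat these three requests from a script, here is a minimal sketch of how the query URLs are put together. `build_instances_url` is a hypothetical helper written for this illustration, not part of wsidicom or any DICOMweb client; it only builds the strings and makes no network request.

```python
from typing import Dict, Sequence
from urllib.parse import urlencode

BASE_URL = (
    "https://proxy.imaging.datacommons.cancer.gov/current/"
    "viewer-only-no-downloads-see-tinyurl-dot-com-slash-3j3d9jyp/dicomWeb"
)
STUDY_UID = "2.25.227261840503961430496812955999336758586"
SERIES_UID = "1.3.6.1.4.1.5962.99.1.1334438926.1589741711.1637717011470.2.0"
WSI_SOP_CLASS_UID = "1.2.840.10008.5.1.4.1.1.77.1.6"


def build_instances_url(
    base_url: str,
    study_uid: str,
    filters: Dict[str, str],
    include_fields: Sequence[str] = (),
) -> str:
    """Build a QIDO-RS instance search URL (hypothetical helper)."""
    # Filters are keyed by DICOM tag; each requested field becomes an
    # "includefield" query parameter.
    params = list(filters.items())
    params += [("includefield", field) for field in include_fields]
    url = f"{base_url}/studies/{study_uid}/instances"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


# 1. SOPClassUID (00080016) filter plus AvailableTransferSyntaxUID: first error
url_full = build_instances_url(
    BASE_URL,
    STUDY_UID,
    {"00080016": WSI_SOP_CLASS_UID, "0020000E": SERIES_UID},
    include_fields=["AvailableTransferSyntaxUID"],
)
# 2. Same query without the includefield: second error
url_no_field = build_instances_url(
    BASE_URL,
    STUDY_UID,
    {"00080016": WSI_SOP_CLASS_UID, "0020000E": SERIES_UID},
)
# 3. Only the SeriesInstanceUID (0020000E) filter: works
url_series_only = build_instances_url(BASE_URL, STUDY_UID, {"0020000E": SERIES_UID})
```

The three resulting strings correspond to the three `curl` commands above, so each can be passed to `curl` or an HTTP client to compare responses.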
It's kind of annoying that an error is being raised for `AvailableTransferSyntaxUID`. It would be nice if it just returned an empty field if it was not available.

However, it would be really nice if we could support interacting with this DICOMweb server.
Let me know what your thoughts are, @erikogabrielsson.