Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement FHIR Bulk Export client. #1781

Open
wants to merge 88 commits into
base: main
Choose a base branch
from
Open

Implement FHIR Bulk Export client. #1781

wants to merge 88 commits into from

Conversation

piotrszul
Copy link
Collaborator

Resolves #1776

Adding a basic progress tracking callback mechanism.
Adding support for handling of Too Many Requests (429) status.
…also support a timestamp (which is what we are getting from the test server).
Copy link

codecov bot commented Feb 19, 2024

Codecov Report

Attention: Patch coverage is 79.04564% with 202 lines in your changes are missing coverage. Please review.

Project coverage is 83.94%. Comparing base (76e21ab) to head (f7fd125).
Report is 13 commits behind head on main.

❗ Current head f7fd125 differs from pull request most recent head e626f38. Consider uploading reports for the commit e626f38 to get more accurate results

Files Patch % Lines
...siro/pathling/library/fs/HdfsFileStoreFactory.java 0.00% 28 Missing ⚠️
...ava/au/csiro/pathling/export/BulkExportClient.java 77.00% 20 Missing and 3 partials ⚠️
...u/csiro/pathling/export/ws/BulkExportTemplate.java 75.58% 14 Missing and 7 partials ⚠️
.../pathling/export/download/UrlDownloadTemplate.java 80.70% 7 Missing and 4 partials ⚠️
...iro/pathling/auth/SMARTTokenCredentialFactory.java 74.35% 8 Missing and 2 partials ⚠️
...ava/au/csiro/pathling/library/PathlingContext.java 0.00% 10 Missing ⚠️
...va/au/csiro/pathling/config/AuthConfiguration.java 50.00% 7 Missing and 1 partial ⚠️
...au/csiro/pathling/export/ws/BulkExportRequest.java 87.50% 6 Missing and 2 partials ⚠️
...siro/pathling/auth/AsymmetricClientAuthMethod.java 76.00% 5 Missing and 1 partial ⚠️
.../au/csiro/pathling/export/BulkExportException.java 81.81% 6 Missing ⚠️
... and 32 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1781      +/-   ##
============================================
- Coverage     84.67%   83.94%   -0.74%     
  Complexity      139      139              
============================================
  Files           341      384      +43     
  Lines          7868     8818     +950     
  Branches        528      592      +64     
============================================
+ Hits           6662     7402     +740     
- Misses          912     1067     +155     
- Partials        294      349      +55     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

gitguardian bot commented Mar 28, 2024

️✅ There are no secrets present in this pull request anymore.

If these secrets were true positive and are still valid, we highly recommend you to revoke them.
Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately.
Find here more information about risks.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Our GitHub checks need improvements? Share your feedbacks!

@johngrimes johngrimes added the new feature New feature or request label Apr 8, 2024
Copy link
Member

@johngrimes johngrimes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all of your work on this @piotrszul!

I've gone through it and come up with some feedback for you to take a look at.

<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove this dependency as the first step to separating this module from the main Pathling code base.

* Constants for authentication.
*/
@UtilityClass
public class AuthConst {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps these could be moved into SmartClientAuthMethodBase?

Or into the specific subclass where they are more specific, e.g. CLIENT_ASSERTION_TYPE_JWT_BEARER.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reafactored.

* Authentication method for one of the FHIR SMART client authentication profiles.
*/

public interface ClientAuthMethod {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this level of interface? Aren't all the authentication methods we support SMART client authentication methods?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored.

return jwkAlgorithm
.replace("RS", "RSA")
.replace("ES", "ECDSA")
.replace("HC", "HMAC");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a undocumented supported algorithm?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should actually be "HS" not "HC" as if HS256.
it's one of the symmetric signing algorithms supported by auth0.
Added for completness as it's supported by 'Algorithm' class but we do not use that since we only expose asymmetric algorithms.

auth_scope="system/*.read",
),
).export()
print(f"TransactionTime: {as_fhir_instant(result.transaction_time)}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the following error when running this script. This is probably because I didn't set the client secret environment variable, but perhaps it should handle this more gracefully.

:: loading settings :: url = jar:file:/opt/homebrew/Caskroom/miniconda/base/envs/pathling-dev/lib/python3.12/site-packages/pyspark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /Users/gri306/.ivy2/cache
The jars for the packages stored in: /Users/gri306/.ivy2/jars
au.csiro.pathling#library-runtime added as a dependency
io.delta#delta-core_2.12 added as a dependency
org.apache.hadoop#hadoop-aws added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-56feae64-188a-4257-9fe1-6edb5a66c4cd;1.0
        confs: [default]
        found au.csiro.pathling#library-runtime;6.6.0-SNAPSHOT in local-m2-cache
        found io.delta#delta-core_2.12;2.4.0 in local-m2-cache
        found io.delta#delta-storage;2.4.0 in local-m2-cache
        found org.antlr#antlr4-runtime;4.9.3 in local-m2-cache
        found org.apache.hadoop#hadoop-aws;3.3.4 in local-m2-cache
        found com.amazonaws#aws-java-sdk-bundle;1.12.262 in local-m2-cache
        found org.wildfly.openssl#wildfly-openssl;1.0.7.Final in local-m2-cache
:: resolution report :: resolve 172ms :: artifacts dl 7ms
        :: modules in use:
        au.csiro.pathling#library-runtime;6.6.0-SNAPSHOT from local-m2-cache in [default]
        com.amazonaws#aws-java-sdk-bundle;1.12.262 from local-m2-cache in [default]
        io.delta#delta-core_2.12;2.4.0 from local-m2-cache in [default]
        io.delta#delta-storage;2.4.0 from local-m2-cache in [default]
        org.antlr#antlr4-runtime;4.9.3 from local-m2-cache in [default]
        org.apache.hadoop#hadoop-aws;3.3.4 from local-m2-cache in [default]
        org.wildfly.openssl#wildfly-openssl;1.0.7.Final from local-m2-cache in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   7   |   0   |   0   |   0   ||   7   |   0   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-56feae64-188a-4257-9fe1-6edb5a66c4cd
        confs: [default]
        0 artifacts copied, 7 already retrieved (0kB/8ms)
24/04/11 11:15:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
24/04/11 11:15:15 DEBUG PathlingContext: Setting log level for 'au.csiro.pathling' to DEBUG
24/04/11 11:15:16 WARN SimpleFunctionRegistry: The function date_diff replaced a previously registered function.
Exporting from: https://aehrc-cdr.cc/fhir_r4 to /Users/gri306/Code/pathling/bulkclient/target/export-1712798116
24/04/11 11:15:16 DEBUG BulkExportClient: Creating FileStore of: au.csiro.pathling.library.fs.HdfsFileStoreFactory@36abae5e for outputDir: /Users/gri306/Code/pathling/bulkclient/target/export-1712798116
24/04/11 11:15:16 DEBUG BulkExportClient: Creating HttpClient with configuration: HttpClientConfiguration(socketTimeout=60000, maxConnectionsTotal=32, maxConnectionsPerRoute=16, retryEnabled=true, retryCount=2)
24/04/11 11:15:16 DEBUG SMARTTokenCredentialFactory: Retrieving SMART configuration for fhirEndpoint: https://aehrc-cdr.cc/fhir_r4
24/04/11 11:15:16 DEBUG SMARTTokenCredentialFactory: SMART configuration retrieved: SMARTDiscoveryResponse(tokenEndpoint=https://aehrc-cdr.cc/smartsec_r4/oauth/token, grantTypesSupported=null, tokenEndpointAuthMethodsSupported=null, tokenEndpointAuthSigningAlgValuesSupported=null, capabilities=[launch-ehr, launch-standalone, client-public, client-confidential-symmetric, client-confidential-asymmetric, sso-openid-connect, context-ehr-patient, context-standalone-patient, permission-offline, permission-patient])
Traceback (most recent call last):
  File "/Users/gri306/Code/pathling/bulkclient/examples/bulk_export_smilecdr.py", line 66, in <module>
    ).export()
      ^^^^^^^^
  File "/Users/gri306/Code/pathling/lib/python/pathling/bulkexport.py", line 157, in export
    return BulkExportResult.from_java(self._jclient.export())
                                      ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/pathling-dev/lib/python3.12/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/pathling-dev/lib/python3.12/site-packages/pyspark/errors/exceptions/captured.py", line 169, in deco
    return f(*a, **kw)
           ^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/pathling-dev/lib/python3.12/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o103.export.
: java.lang.NullPointerException
        at java.base/java.util.Objects.requireNonNull(Objects.java:221)
        at au.csiro.pathling.auth.ClientAuthMethod.create(ClientAuthMethod.java:123)
        at au.csiro.pathling.auth.SMARTTokenCredentialFactory.createSMARTAuthMethod(SMARTTokenCredentialFactory.java:123)
        at au.csiro.pathling.auth.SMARTTokenCredentialFactory.createAuthMethod(SMARTTokenCredentialFactory.java:105)
        at au.csiro.pathling.auth.SMARTTokenCredentialFactory.createCredentials(SMARTTokenCredentialFactory.java:91)
        at au.csiro.pathling.export.BulkExportClient.createHttpClient(BulkExportClient.java:365)
        at au.csiro.pathling.export.BulkExportClient.export(BulkExportClient.java:250)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:829)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should be resolved by JSR-380 validation of config.
I was having some issues with the making it work (in this project - some dependency issues).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added JRS-380 validation for the client configuration and it's subconfig beans.

*/
@Nonnull
@Builder.Default
List<Reference> patient = Collections.emptyList();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the full list of parameters that I can use? Is it possible to pass parameters that are not in this class, e.g. includeAssociatedData?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the full list.
Actually whatever is defined in BulkExportClient but this is then mapped to the BulkExportRequest.
I think includeAssociatedData is the only one missing from the spec.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add this to make the list complete.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented.

executorServiceResource.getExecutorService());

final BulkExportResult result = doExport(fileStore, bulkExportTemplate, downloadTemplate);
log.info("Export successful: {}", result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we issue a Bulk Data Delete Request once the download has successfully completed (if the server supports it)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spec say that we MAY issue it to either cancel the request in progress or to cleanup.
So I guess it depends what we want to consider to be required for the first release of the bulk client.
I this this might be something to be added subsequent releases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add this to mitigate the problem of exported data accumulating in the source server, which has been a problem that we have encountered with HAPI JPA.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new feature New feature or request
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

Bulk FHIR client
2 participants