Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove tempStorageDirectory and rely on task dir instead #16416

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

adarshsanjeev
Copy link
Contributor

@adarshsanjeev adarshsanjeev commented May 8, 2024

Removes the temporary directory used by durable storage and export, from a user configured value. This PR instead reuses the temporary storage configured for the task.

Pending:

  • This PR adds the bindings to peons to inject the temporary directory location. However, these changes are still pending for indexers.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

}

@VisibleForTesting
protected RetryableS3OutputStream(
S3OutputConfig config,
ServerSideEncryptingAmazonS3 s3,
String s3Key,
boolean chunkValidation
boolean chunkValidation,

Check notice

Code scanning / CodeQL

Useless parameter Note

The parameter 'chunkValidation' is never used.
}

@Override
public StorageConnector get()
{
return new AzureStorageConnector(this, azureStorage);
final File tempDir = injector.getInstance(Key.get(File.class, Names.named(DataSourceTaskIdHolder.TMP_DIR_BINDING)));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for this change @adarshsanjeev 👍
Although, it might be worth checking if you can directly JacksonInject the File object - one way to try it could be to do :

@JacksonInject
@Named(DataSourceTaskIdHolder.TMP_DIR_BINDING)
File tempDir;

Also, could consider adding the JacksonInject in the constructor itself to keep all the relevant things there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lastly, could be worth having the temp dir instance setup for other processes as well, and also for indexers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried the above initially, but injecting the tempDir as a variable fails (maybe since it is a jackson inject, and that binding has not been added?).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it fail for processes which are not peons? if that's the case you can try also marking this Nullable
also, if you have the error available, please feel free to paste it here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above code fails for peons with the error
Guice configuration errors: 1) No implementation for java.io.File annotated with @com.google.inject.name.Named(value="druidTempDirectory") was bound. while locating java.io.File annotated with @com.google.inject.name.Named(value="druidTempDirectory") 1 error

@github-actions github-actions bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels May 21, 2024
@adarshsanjeev adarshsanjeev marked this pull request as ready for review May 21, 2024 05:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area - Batch Ingestion Area - Dependencies Area - Documentation Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants