HADOOP-19161. S3A: option "fs.s3a.performance.flags" to take list of performance flags #6789

Draft
wants to merge 1 commit into
base: trunk

steveloughran
Contributor

@steveloughran steveloughran commented May 2, 2024

HADOOP-19161

Initial design

  • no tests or docs
  • served up via StoreContext. Not sure about the merits of that: I think it is needed so it gets down to all AbstractStoreOperation instances, but should that be where the decision is made?
  • create performance is wired up.
  • as are path capabilities

For testing we need to make sure this is unset in all cost tests.
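A minimal sketch of what that test setup could look like, using a plain `Map` as a stand-in for a Hadoop `Configuration` so the example stays self-contained. The class and method names are hypothetical; the two option names come from this PR, and the per-bucket form `fs.s3a.bucket.<bucket>.<option>` follows the S3A per-bucket override convention.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: a Map stands in for a Hadoop Configuration. */
final class ClearFlagsSketch {
  static final String PERFORMANCE_FLAGS = "fs.s3a.performance.flags";
  static final String CREATE_PERFORMANCE = "fs.s3a.create.performance";
  private static final String S3A_PREFIX = "fs.s3a.";

  /** Remove the base option and its per-bucket override (hypothetical helper). */
  static void clearOption(Map<String, String> conf, String bucket, String option) {
    conf.remove(option);
    if (option.startsWith(S3A_PREFIX)) {
      String suffix = option.substring(S3A_PREFIX.length());
      conf.remove(S3A_PREFIX + "bucket." + bucket + "." + suffix);
    }
  }

  /** Build the configuration a cost test would run with. */
  static Map<String, String> createTestConfiguration(
      Map<String, String> base, String bucket, boolean createPerformance) {
    Map<String, String> conf = new HashMap<>(base);
    // Clear anything inherited from the developer's own settings first,
    // so the flag list cannot silently change the operation costs.
    clearOption(conf, bucket, PERFORMANCE_FLAGS);
    clearOption(conf, bucket, CREATE_PERFORMANCE);
    conf.put(CREATE_PERFORMANCE, Boolean.toString(createPerformance));
    return conf;
  }
}
```

The point of clearing both forms is that a tester's local settings may enable the flags per bucket, which would otherwise override whatever the test sets at the base level.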

Relates to #6543; the logic to set up that operation is here, so that PR would
just be the implementation.

The same goes for a delete optimisation where we'd skip the parent dir probe.
Rename could do the same for its source dir too.
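To make the delete idea concrete, here is a rough sketch against an in-memory stand-in for an object store. Everything in it (`DeleteProbeSketch`, `DemoStore`, `DemoFlag`) is hypothetical and is not the S3A code; it only illustrates how a flag could gate the parent directory probe that follows a delete.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: shows a flag gating the post-delete parent probe. */
final class DeleteProbeSketch {
  /** Hypothetical flags; the PR itself wires up create performance first. */
  enum DemoFlag { CREATE, DELETE }

  /** In-memory stand-in for an object store which counts directory probes. */
  static final class DemoStore {
    private final Set<String> keys = new TreeSet<>();
    private int probes;

    void put(String key) { keys.add(key); }
    void remove(String key) { keys.remove(key); }
    boolean hasKey(String key) { return keys.contains(key); }
    int probeCount() { return probes; }

    /** A "directory" exists if any key starts with its prefix. */
    boolean dirExists(String dirPrefix) {
      probes++;
      for (String key : keys) {
        if (key.startsWith(dirPrefix)) {
          return true;
        }
      }
      return false;
    }
  }

  /** Parent prefix of a key, e.g. "data/part-0000" gives "data/"; null at the root. */
  static String parentOf(String key) {
    int slash = key.lastIndexOf('/');
    return slash < 0 ? null : key.substring(0, slash + 1);
  }

  /** Delete a key; unless DELETE is set, probe the parent and restore a marker. */
  static void delete(DemoStore store, String key, EnumSet<DemoFlag> flags) {
    store.remove(key);
    if (flags.contains(DemoFlag.DELETE)) {
      return; // performance mode: skip the probe and any marker creation
    }
    String parent = parentOf(key);
    if (parent != null && !store.dirExists(parent)) {
      store.put(parent); // recreate an empty directory marker
    }
  }
}
```

The trade-off is the one raised later in this conversation: skipping the probe saves a request per delete, but the parent directory can then appear to vanish once its last child is removed.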

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@steveloughran
Contributor Author

Note the commented-out bit where we considered adding options like "hive" or "spark".

@HarshitGupta11 and I discussed this; for now let's go with a list of options and "*".

Comment on lines 151 to 154
/* case "hive":
case "impala":
case "spark":
case "distcp":
Contributor

Should we not let downstreamers decide what flags they want (after extensive testing)? And across different releases, they might need different flags to be turned on (in case of any regression)?

We can just recommend the flags (as already commented out here) but not set the flags for them. Thoughts?

Contributor Author

Harshit and I were discussing this. I think it's best to have that option list, as per-application settings could be too brittle to changes.

Comment on lines 76 to 78
public boolean isDelete() {
return delete;
}
Contributor

We also want to tackle this one as a separate task (after HADOOP-19072), correct?

Contributor Author

Yes. Harshit did an experiment where he turned off all attempts at creating parent dirs after delete. Fairly brittle, I think.

@steveloughran
Contributor Author

I have a better design for this. Changing this to draft.

Proposed: we have a Configuration.getEnumOptions(Enum x, boolean failIfUnknown) which returns an EnumSet of all values of the enum class whose valueOf() matches an entry in the CSV list (with some mapping such as case conversion, and mapping "-" and "." to "_").

This makes it trivial to reuse/process. The implementation would be outside the actual Configuration class to make it easy for AbfsConfiguration to use too.
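A minimal sketch of that proposal as a standalone helper, not the code in this PR. The signature is an assumption: it takes the CSV string and the enum class directly rather than reading from a `Configuration`. `EnumOptionsSketch` and `DemoFlag` are hypothetical names, and the `"*"` handling follows the "list of options and `*`" decision earlier in this conversation.

```java
import java.util.EnumSet;
import java.util.Locale;

/** Illustrative only: parse a CSV option list into an EnumSet. */
final class EnumOptionsSketch {
  /** Hypothetical flags used purely for the demonstration. */
  enum DemoFlag { CREATE, DELETE, SKIP_PARENT_PROBE }

  /**
   * Map each CSV entry to an enum value after trimming, upper-casing and
   * converting '-' and '.' to '_'. A "*" entry selects every value.
   * Unknown entries raise an exception if failIfUnknown is set,
   * and are skipped otherwise.
   */
  static <E extends Enum<E>> EnumSet<E> getEnumOptions(
      String csv, Class<E> enumClass, boolean failIfUnknown) {
    EnumSet<E> result = EnumSet.noneOf(enumClass);
    if (csv == null) {
      return result;
    }
    for (String raw : csv.split(",")) {
      String entry = raw.trim();
      if (entry.isEmpty()) {
        continue;
      }
      if ("*".equals(entry)) {
        result.addAll(EnumSet.allOf(enumClass));
        continue;
      }
      String name = entry.toUpperCase(Locale.ROOT)
          .replace('-', '_')
          .replace('.', '_');
      try {
        result.add(Enum.valueOf(enumClass, name));
      } catch (IllegalArgumentException e) {
        if (failIfUnknown) {
          throw new IllegalArgumentException(
              "Unknown option \"" + entry + "\" for "
                  + enumClass.getSimpleName(), e);
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(
        getEnumOptions("create, skip-parent.probe", DemoFlag.class, true));
  }
}
```

Because the helper depends only on the enum class and a string, any configuration wrapper can call it, which is the reuse argument made above for keeping it outside `Configuration`.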

@steveloughran
Contributor Author

Reason: Use hadoop-common provided Sets rather than Guava provided Sets
	in file: org/apache/hadoop/util/ConfigurationUtil.java
		org.apache.hadoop.thirdparty.com.google.common.collect.Sets 	(Line: 33, Matched by: org.apache.hadoop.thirdparty.com.google.common.collect.Sets)

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 48s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 2 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 53s Maven dependency ordering for branch
+1 💚 mvninstall 37m 7s trunk passed
+1 💚 compile 19m 10s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 17m 13s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 4m 45s trunk passed
+1 💚 mvnsite 2m 30s trunk passed
+1 💚 javadoc 1m 50s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 1m 35s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 3m 50s trunk passed
+1 💚 shadedclient 40m 12s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 31s Maven dependency ordering for patch
+1 💚 mvninstall 1m 27s the patch passed
+1 💚 compile 18m 26s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javac 18m 26s the patch passed
+1 💚 compile 17m 49s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 javac 17m 49s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 36s /results-checkstyle-root.txt root: The patch generated 8 new + 114 unchanged - 0 fixed = 122 total (was 114)
+1 💚 mvnsite 2m 30s the patch passed
-1 ❌ javadoc 1m 8s /results-javadoc-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1.txt hadoop-common-project_hadoop-common-jdkUbuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 generated 2 new + 0 unchanged - 0 fixed = 2 total (was 0)
+1 💚 javadoc 1m 34s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 4m 12s the patch passed
+1 💚 shadedclient 40m 36s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 21m 13s /patch-unit-hadoop-common-project_hadoop-common.txt hadoop-common in the patch passed.
-1 ❌ unit 3m 13s /patch-unit-hadoop-tools_hadoop-aws.txt hadoop-aws in the patch passed.
+1 💚 asflicense 1m 0s The patch does not generate ASF License warnings.
270m 14s
Reason Tests
Failed junit tests hadoop.fs.TestFilterFileSystem
hadoop.fs.TestHarFileSystem
hadoop.fs.s3a.commit.staging.TestStagingCommitter
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6789/5/artifact/out/Dockerfile
GITHUB PR #6789
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 11a43faccc32 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / a564aa4
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6789/5/testReport/
Max. process+thread count 3136 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6789/5/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@apache apache deleted a comment from hadoop-yetus May 29, 2024
…performance flags

* A new FlagSet class in hadoop common enables this
* and Configuration.getEnumSet() supports getting a set of enum values.
* served up via StoreContext. Not sure about the merits of that;
  I think it is needed so it gets down to all AbstractStoreOperation instances.
* create performance is wired up.
* tests which configure fs.s3a.create.performance clear fs.s3a.performance.flags
  in test setup.

Change-Id: I52e48d19c624e7c18f22b3130943ffe72fac501f
@steveloughran steveloughran force-pushed the s3/HADOOP-19161-performance-flags branch from a564aa4 to 82974c4 Compare May 29, 2024 12:51
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 37s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 1s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 10 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 41s Maven dependency ordering for branch
+1 💚 mvninstall 32m 53s trunk passed
+1 💚 compile 17m 53s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 compile 16m 24s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 checkstyle 5m 3s trunk passed
+1 💚 mvnsite 2m 33s trunk passed
+1 💚 javadoc 1m 48s trunk passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 1m 35s trunk passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 3m 56s trunk passed
+1 💚 shadedclient 36m 53s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 33s Maven dependency ordering for patch
+1 💚 mvninstall 1m 31s the patch passed
+1 💚 compile 17m 27s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javac 17m 27s the patch passed
+1 💚 compile 16m 23s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 javac 16m 23s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
-0 ⚠️ checkstyle 4m 31s /results-checkstyle-root.txt root: The patch generated 1 new + 114 unchanged - 0 fixed = 115 total (was 114)
+1 💚 mvnsite 2m 33s the patch passed
+1 💚 javadoc 1m 41s the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1
+1 💚 javadoc 1m 47s the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
+1 💚 spotbugs 4m 49s the patch passed
+1 💚 shadedclient 36m 52s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 21m 35s /patch-unit-hadoop-common-project_hadoop-common.txt hadoop-common in the patch passed.
+1 💚 unit 2m 55s hadoop-aws in the patch passed.
+1 💚 asflicense 0m 56s The patch does not generate ASF License warnings.
255m 9s
Reason Tests
Failed junit tests hadoop.fs.TestHarFileSystem
hadoop.fs.TestFilterFileSystem
Subsystem Report/Notes
Docker ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6789/6/artifact/out/Dockerfile
GITHUB PR #6789
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux a3d5f3ebdd16 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 82974c4
Default Java Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu220.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~20.04-b06
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6789/6/testReport/
Max. process+thread count 3151 (vs. ulimit of 5500)
modules C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6789/6/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.
