[SPARK-48231][BUILD] Remove unused CodeHaus Jackson dependencies #46521
base: master
cc @dongjoon-hyun and @wangyum
<groupId>org.codehaus.jackson</groupId>
<artifactId>jackson-core-asl</artifactId>
<version>${codehaus.jackson.version}</version>
<scope>${hive.jackson.scope}</scope>
Can we also remove <hive.jackson.scope>compile</hive.jackson.scope>?
Line 270 in 44f00cc
<hive.jackson.scope>compile</hive.jackson.scope>
Line 269 in 2df494f
<hive.jackson.scope>provided</hive.jackson.scope>
https://github.com/apache/spark/blob/master/assembly/pom.xml#L272-L277
If we identify issues with Hive 2.3.10 before the 4.0.0 release, we may need to revert this patch and fall back to the SPARK-47119 approach to mitigate the CodeHaus Jackson dependency vulnerabilities; see the comments at
#45201 (comment)
ok
+1, LGTM (Pending CIs)
I think the case is already covered by CI. When IsolatedClassLoader is enabled, the
Ya, I know that part, but do we have an end-to-end Hive UDF registration and invocation test case?
@dongjoon-hyun AFAIK, "Hive UDF execution" always uses the built-in Hive jars without IsolatedClassLoader, while "Hive UDF registration" will happen during
It sounds like we could have a corner case. That's the reason why we need an actual test case to cover it, isn't it?
For this PR, I believe we need verification against different HMS versions to make sure.
Hmm, let me clarify my view. In short, I think the current CI is sufficient. Spark uses Hive in two cases:
For case 1, the CI already covers it (initializing any older HMS client triggers built-in UDF registration). For case 2, there is no chance of invoking CodeHaus Jackson classes, since Hive 2.3.10 removed CodeHaus Jackson from its codebase entirely.
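One quick way to sanity-check the "no CodeHaus Jackson classes" claim on a given classpath is a small probe like the sketch below. It is only an illustration: `CodehausJacksonCheck` and `isLoadable` are hypothetical names, not part of Spark or Hive, and the probe only reports whether the classes are visible to the class loader it runs under. It does not replace an end-to-end test against a real HMS.

```java
public class CodehausJacksonCheck {

    // Returns true if the named class is visible to the given class loader.
    // The class is not initialized, so no static initializers run.
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = CodehausJacksonCheck.class.getClassLoader();
        String[] probes = {
            "org.codehaus.jackson.JsonFactory",            // from jackson-core-asl
            "org.codehaus.jackson.map.ObjectMapper",       // from jackson-mapper-asl
            "com.fasterxml.jackson.databind.ObjectMapper"  // Jackson 2.x
        };
        for (String name : probes) {
            System.out.println(name + " loadable: " + isLoadable(name, loader));
        }
    }
}
```

Run with the Spark jars on the classpath, the two `org.codehaus.jackson` probes would be expected to report `false` after this change, assuming no other dependency brings them back in.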
also cc @wangyum @yaooqinn @AngersZhuuuu @cloud-fan
That's a valid concern. Since Spark CI only covers the embedded HMS client case, let me test it with a real setup.
Thank you. Please attach the test results to the PR description.
Please hold on all Hive related dependency change until we recover Maven CIs.
What changes were proposed in this pull request?
The CodeHaus Jackson dependencies were pulled in by Hive. Hive 2.3.10 migrated to Jackson 2.x in apache/hive#4564, so they can now be removed from Spark.
Why are the changes needed?
Remove unused and vulnerable dependencies.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Pass GA.
Was this patch authored or co-authored using generative AI tooling?
No.