[#1239] Remote merge on the shuffle server side. #1660
base: master
Does the code added here come from Apache Spark? |
No, I developed it myself. |
Thanks for proposing this. I have a question: as I remember, Spark sorts the data using sortExec in the DataFrame API, which means the sort will not occur in the reader process. @advancedxy may have some good thoughts on this. |
The detailed design can be found in #1239. The initial motivation is to reduce the disk usage of spills, especially on K8s. With this in place, we can also speed up the whole shuffle. |
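For readers new to the idea, here is a minimal sketch of the kind of k-way merge a shuffle server could run over already-sorted segments, so that the reader receives one sorted stream and does not have to spill and merge locally. It is an illustration under stated assumptions, not the code in this PR: the class name `SortedSegmentMerger`, the `Cursor` helper, and the use of in-memory `Integer` keys are all hypothetical, and a real implementation would stream serialized records from blocks rather than lists.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; not a class from the Uniffle code base.
final class SortedSegmentMerger {

  // Tracks the current head record of one sorted segment.
  private static final class Cursor {
    final Iterator<Integer> rest;
    int head;

    Cursor(Iterator<Integer> it) {
      this.rest = it;
      this.head = it.next();
    }
  }

  // Merges segments that are each already sorted ascending into one sorted list.
  static List<Integer> merge(List<List<Integer>> segments) {
    // Min-heap ordered by the head record of each segment.
    PriorityQueue<Cursor> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a.head, b.head));
    for (List<Integer> segment : segments) {
      if (!segment.isEmpty()) {
        heap.add(new Cursor(segment.iterator()));
      }
    }

    List<Integer> merged = new ArrayList<>();
    while (!heap.isEmpty()) {
      // Emit the smallest head, then advance that segment and re-insert it.
      Cursor smallest = heap.poll();
      merged.add(smallest.head);
      if (smallest.rest.hasNext()) {
        smallest.head = smallest.rest.next();
        heap.add(smallest);
      }
    }
    return merged;
  }
}
```

Calling `SortedSegmentMerger.merge` on segments such as `[1, 4, 7]`, `[2, 5, 8]` and `[3, 6, 9]` yields a single ascending sequence. Only one head record per segment is held in the heap at a time, which is why a merge like this can run with bounded memory when the segments are streamed.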
Sorry @zuston, I missed your comment.
Yeah, these are two categories of APIs on the Spark side. For the DataFrame/SQL API, ordering is guaranteed by the [...] For this particular feature, I think MR/Tez would benefit substantially (to be frank, I developed a similar shuffle system ~6-7 years ago). The Spark integration is less exciting, though. |
But if we could introduce an extra plan that optimizes sortMergeJoin from sortExec + exchange to uniffleSortedExchangeExec, Spark would benefit from this feature. Is this possible? |
emmm, maybe in your own forked Spark. For open source Spark, it might be possible to inject a strategy rule via SparkSQLExtension. |
Yes, this is acceptable if there is a performance gain. And I think the extra strategy rule could also be hosted in Uniffle if it is valuable. WDYT? |
If it's possible by simply injecting a strategy rule, then I'm fine with that. However, if it requires internal changes to Spark, then I don't think it should be hosted/delegated in the Uniffle repo. |
@advancedxy @zuston |
Thanks for your work, @zhengchenyu. After briefly reviewing the design doc and the code, I have some questions about this feature.
Spark client
Shuffle-Server
|
The problem is that there's no sort/merge/combine in the ShuffleDependency for SQL/DataFrame workload, which should already dominate the analytics workloads. Hence, the remote merge will only benefit normal raw RDD workloads, which in my opinion is less exciting. |
@advancedxy
For Hive on Tez/MR, it makes sense. We know Hive also doesn't use the combine features of MR or Tez. But why does it make sense? |
Spark client Shuffle-Server |
Firstly, I think this is meaningful for Tez/MR. For Spark, sorted shuffle is still meaningful; Facebook did something similar in Cosco (its internal remote shuffle service). @jerqi could add some context on that.
I'm not sure the current design is viable in production, given how much random I/O it involves. I would prefer the most general and stable way to optimize this, if we have one. |
Draft PR for #1239.