Skip to content

Latest commit

 

History

History
176 lines (163 loc) · 4.24 KB

benchmark.md

File metadata and controls

176 lines (163 loc) · 4.24 KB

Environment

Software: Uniffle 0.2.0 Hadoop 2.8.5 Spark 2.4.6

Hardware: Machine 176 cores, 265G memory, 4T * 12 HDD, network bandwidth 10GB/s

Hadoop Yarn Cluster: 1 * ResourceManager + 6 * NodeManager, every machine 4T * 10 HDD

Uniffle Cluster: 1 * Coordinator + 6 * Shuffle Server, every machine 4T * 10 HDD

Configuration

Spark's configuration

spark.executor.instances 100
spark.executor.cores 4
spark.executor.memory 9g
spark.executor.memoryOverhead 1024
spark.shuffle.manager org.apache.spark.shuffle.RssShuffleManager
spark.rss.storage.type MEMORY_LOCALFILE

Shuffle Server's configuration

rss.storage.type MEMORY_LOCALFILE
rss.server.buffer.capacity 50g

TPC-DS

We used spark-sql-perf to generate 1TB data.

query name vanilla uniffle
query1 16 18
query10 30 35
query11 86 96
query12 14 17
query13 102 77
query14a 239 254
query14b 226 232
query15 44 48
query16 50 59
query17 83 97
query18 31 35
query19 15 17
query2 21 25
query20 15 16
query21 8 8
query22 21 22
query23a 288 366
query23b 366 422
query24a 181 198
query24b 167 187
query25 93 113
query26 15 15
query27 16 17
query28 38 41
query29 80 102
query3 9 11
query30 21 26
query31 30 40
query32 14 15
query33 26 30
query34 12 16
query35 34 39
query36 15 18
query37 16 20
query38 27 36
query39 15 19
query39a 16 20
query39b 14 19
query4 205 227
query40 38 38
query41 5 6
query42 9 10
query43 13 13
query44 20 22
query45 30 36
query46 16 18
query47 22 25
query48 25 24
query49 58 66
query5 56 59
query50 56 61
query51 23 28
query52 9 10
query53 12 13
query54 52 62
query55 9 10
query56 25 27
query57 20 22
query58 23 26
query59 22 22
query6 33 41
query60 25 28
query61 25 28
query62 10 11
query63 12 12
query64 176 185
query65 32 37
query66 23 24
query67 697 775
query68 17 19
query69 31 34
query7 17 17
query70 24 27
query71 23 24
query72 335 350
query73 12 14
query74 68 99
query75 58 67
query76 21 21
query77 35 37
query78 151 169
query79 16 16
query8 15 20
query80 146 163
query81 18 26
query82 28 31
query83 21 24
query84 16 19
query85 45 49
query86 14 17
query87 29 37
query88 29 29
query89 11 13
query9 37 37
query90 11 11
query91 17 21
query92 12 12
query93 86 86
query94 40 42
query95 94 94
query96 10 10
query97 29 34
query98 17 21
query99 13 12
total 5821 6494

Uniffle is a little 9% slower than vanilla Spark. Because the amount of shuffle is tiny.

Tera Sort

We generate 1TB data, we use the code of the repo

Uniffle performance

Overall Time: Overall Time Write Time: Write Time Read Time: Read Time

vanilla Spark performance

Overall Time: Overall Time Write Time: Write Time Read Time: Read Time Uniffle is 30%+ much faster than vanilla Spark when there is a large shuffle.