TriangleCountUndirected

1. Algorithm Introduction

TriangleCounting determines the number of triangles passing through each vertex of an undirected graph. The Local Clustering Coefficient (LCC) is usually computed in the same pass. The result has two or three columns: the vertex ID, the number of triangles, and (optionally) the LCC value.
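To make the two quantities concrete, here is a minimal single-machine sketch in Python. It is not the Angel implementation: the function name `triangle_count_and_lcc` is hypothetical, and returning an LCC of 0.0 for vertices with fewer than two neighbors is an assumed convention that the distributed algorithm may handle differently.

```python
from collections import defaultdict
from itertools import combinations


def triangle_count_and_lcc(edges):
    """Return {vertex: (num_triangles, lcc)} for an undirected, unweighted graph."""
    # Build a symmetric neighbor table; self-loops and duplicate edges are ignored.
    neighbors = defaultdict(set)
    for src, dst in edges:
        if src != dst:
            neighbors[src].add(dst)
            neighbors[dst].add(src)

    result = {}
    for v, nbrs in neighbors.items():
        # A triangle through v is a pair of v's neighbors that are themselves adjacent.
        triangles = sum(1 for a, b in combinations(nbrs, 2) if b in neighbors[a])
        degree = len(nbrs)
        # LCC = triangles / number of neighbor pairs (assumed 0.0 when degree < 2).
        possible = degree * (degree - 1) // 2
        lcc = triangles / possible if possible else 0.0
        result[v] = (triangles, lcc)
    return result


# One triangle (1, 2, 3) plus a pendant edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
counts = triangle_count_and_lcc(edges)
for v in sorted(counts):
    print(v, *counts[v])
```

In this example vertices 1, 2 and 3 each lie on one triangle, while vertex 4 lies on none. Vertex 3 has three neighbors but only one adjacent pair among them, so its LCC is 1/3.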

2. Parameters

IO Params

  • input: HDFS path of an undirected, unweighted graph; each row represents an edge in the form srcId | dstId
  • output: HDFS path for the output; each row holds a vertex, its number of triangles, and optionally its LCC value, in the form nodeId | numTriangles | lccValue (optional)
  • sep: the separator between srcId and dstId in the input file; one of tab, space, or comma
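The row formats above can be illustrated with a short Python sketch. The helper names `parse_edge` and `format_row` are hypothetical, vertex IDs are assumed to be integers, and writing the output with the same separator as the input is an assumption rather than something the parameter list states.

```python
# Map the documented sep values to the actual separator characters.
SEPARATORS = {"tab": "\t", "space": " ", "comma": ","}


def parse_edge(line, sep="tab"):
    """Parse one input row of the form srcId<sep>dstId into an (int, int) pair."""
    src, dst = line.strip().split(SEPARATORS[sep])[:2]
    return int(src), int(dst)


def format_row(node_id, num_triangles, lcc=None, sep="tab"):
    """Format one output row; the LCC column is emitted only when it was computed."""
    fields = [str(node_id), str(num_triangles)]
    if lcc is not None:
        fields.append(str(lcc))
    return SEPARATORS[sep].join(fields)


print(parse_edge("1\t2\n"))
print(parse_edge("3,4", sep="comma"))
print(repr(format_row(3, 1, 0.5)))
print(repr(format_row(4, 0)))
```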

Algo Params

  • partitionNum: number of RDD partitions
  • psPartitionNum: number of data partitions on the parameter server (PS)
  • batchSize: mini-batch size (in vertices) used when pushing the neighbor table to the PS
  • pullBatchSize: mini-batch size (in vertices) used when computing the triangles for each vertex
  • computeLCC: whether to also compute the LCC value
  • storageLevel: RDD persistence level; one of DISK_ONLY, MEMORY_ONLY, or MEMORY_AND_DISK
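As a rough illustration of what a mini-batch size means for batchSize and pullBatchSize, the Python sketch below splits a vertex list into fixed-size chunks. The function `mini_batches` is hypothetical and only shows the chunking idea; the actual push and pull calls against the parameter server are not shown and may be organized differently in Angel.

```python
def mini_batches(vertices, batch_size):
    """Yield consecutive slices of at most batch_size vertices."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(vertices), batch_size):
        yield vertices[start:start + batch_size]


vertices = list(range(7))
batches = list(mini_batches(vertices, 3))
print(batches)
```

A smaller batch size generally lowers the peak memory needed per request at the cost of more round trips; the values that work well depend on the graph and the cluster.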

3. Running

input=hdfs://my-hdfs/data
output=hdfs://my-hdfs/output

source ./spark-on-angel-env.sh
$SPARK_HOME/bin/spark-submit \
  --master yarn-cluster \
  --conf spark.ps.instances=1 \
  --conf spark.ps.cores=1 \
  --conf spark.ps.jars=$SONA_ANGEL_JARS \
  --conf spark.ps.memory=10g \
  --name "triangle count angel" \
  --jars $SONA_SPARK_JARS  \
  --driver-memory 5g \
  --num-executors 1 \
  --executor-cores 4 \
  --executor-memory 10g \
  --class org.apache.spark.angel.examples.graph.TriangleCountUndirectedExample \
  ../lib/spark-on-angel-examples-3.2.0.jar \
  input:$input output:$output sep:tab storageLevel:MEMORY_ONLY useBalancePartition:true \
  partitionNum:4 psPartitionNum:1 batchSize:3000 pullBatchSize:1000 computeLCC:false