A Practical Comparison of Local Graph Clustering Algorithms

  IJETT-book-cover  International Journal of Engineering Trends and Technology (IJETT)          
  
© 2019 by IJETT Journal
Volume-67 Issue-7
Year of Publication : 2019
Authors : Rashed Khalil Salem,Wafaa Tawfik Abdel Moneim,Mohamed Monir Hassan
DOI :  10.14445/22315381/IJETT-V67I7P213

Citation 

MLA Style: Rashed Khalil Salem,Wafaa Tawfik Abdel Moneim,Mohamed Monir Hassan"A Practical Comparison of Local Graph Clustering Algorithms" International Journal of Engineering Trends and Technology 67.7 (2019): 64-69.

APA Style:Rashed Khalil Salem,Wafaa Tawfik Abdel Moneim,Mohamed Monir Hassan(2019). "A Practical Comparison of Local Graph Clustering Algorithms"International Journal of Engineering Trends and Technology, 67(7), 64-69.

Abstract
Nowadays a large number of applications of graph clustering are available, with expanding the span of the graph the conventional methods of clustering is not appropriate to manipulate these graph because it is costly for computation. Local graph clustering algorithms solve this problem by working on a given vertex as input seed set without looking at the whole graph to find a good cluster. The conventional algorithms are slower than the local clustering algorithms. In this paper, we show a comparison between two of local graph clustering algorithms are HK-relax and SimpleLocal based on conductance and runtime. We display experiments on large-scale graphs and showing that SimpleLocal finds a good cluster with a small conductance that HK-relax but this take more runtime. We also show the seed set size effect on two algorithms as input parameter and find that large size of the seed set gives a good conductance than a small seed set size. In addition to display locality parameter influence on SimpleLocal as input, from the outcomes, we recognize that with decreasing the value of locality ? there is a good conductance of graph clustering.

Reference
[1] W. Fan, and A. Bifet, “Mining Big Data: Current Status, and Forecast to the Future,” SIGKDD Explorations Newsletter, vol. 14(2), pp. 1–5, 2013.
[2] U. Kang and C. Faloutsos, “Big graph mining: Algorithms and discoveries,” SIGKDD Explorations, vol. 14 (2), 2012.
[3] S. Aridhiand E. M. Nguifo, “Big graph mining: Frameworks and techniques,” Big Data Research, Vol. 6, pp. 1-10, 2016.
[4] S. E. Schaeffer, “Graph clustering,” Computer science review, vol. 1(1), pp. 27-64, 2007.
[5] C. C. Aggarwal, and H. Wang, “A survey of clustering algorithms for graph data,” Managing and mining graph data, vol. 40, 275-301, 2010.
[6] J. Ni, H. Fei, W. Fan, and X. Zhang, “Cross-network clustering and cluster ranking for medical diagnosis,” In: ICDE, 2017.
[7] J. Ni, M. Koyuturk, H. Tong, J. Haines, X. Rong, and X. Zhang, “Disease gene prioritization by integrating tissue-specific molecular networks using a robust multinetwork model,” BMC Bioinform, vol. 17(1), 453, 2016.
[8] D. Zhou, and C.J.C. Burges, “Spectral clustering and transductive learning with multiple views,” In: ICML,2007.
[9] A. Kumar, P. Rai, and H. Daume, “Co-regularized multi-view spectral clustering,” In: Advances in neural information processing systems, 2011.
[10] A. Kumar, and H. Daume, “A co-training approach for multi-view spectral clustering,” In: ICML. 2011.
[11] W. Cheng, X. Zhang, Z. Guo, W. Yubao, P.F. Sullivan, and W. Wang, “Flexible and robust co-regularized multi-domain graph clustering,” In: KDD, 2013.
[12] J. Ni, H. Tong, W. Fan, and X. Zhang, “Flexible and robust multi-network clustering,” In: KDD, 2015.
[13] K. Kloster, and D. F. Gleich, “Heat kernel based community detection,” Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2014.
[14] N. Veldt, D. F. Gleich, and M. W. Mahoney, “A simple and strongly-local flow-based method for cut improvement,” InternationalConference on Machine Learning (ICML), 2016.
[15] D. A. Spielman, and S.-H. Teng, “Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems,” Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, ACM, 2004.
[16] R. Andersen, F. Chung, and K. Lang, “Local graph partitioning using pagerank vectors,” Proceedingsof 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS`06), 2006.
[17] D. A. Spielman, and S.-H. Teng. “A local clustering algorithm for massive graphs and its application to nearly-linear time graph partitioning,” Society for Industrial and Applied Mathematics, vol. 42(1), pp. 1–26, 2013.
[18] K. Fountoulakis, X. Cheng, J. Shun, F. R. Khorasani, M. W. Mahoney, “Exploiting optimization for local graph clustering,” arXiv preprint arXiv:1602.01886, 2016.
[19] R. Andersen, and K. J. Lang.“An algorithm for improving graph partitions,” Proceedings of the nineteenth annual ACM-SIAM symposium on Discrete algorithms, Society for Industrial and Applied Mathematics, 2008.
[20] R. Andersen, and Y. Peres. “Finding sparse cuts locally using evolving sets,” Proceedings of the forty-first annual ACM symposium on Theory of computing, ACM, 2009.
[21] T. C. Kwok, and L. C. Lau, "Finding small sparse cuts by random walk," Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, Springer,pp.615-626, 2012.
[22] K. George, and V. Kumar, “A fast and high quality multilevel scheme for partitioning irregular graphs,” SIAM Journal on Scientific Computing, vol. 20(1), pp. 359–392, 1998.
[23] I. S. Dhillon, Y. Guan, and B. Kulis, "Kernel k-means: spectral clustering and normalized cuts." In SIGKDD’04, pp. 551–556. ACM, 2004
[24] S. v. Dongen, “Graph clustering by flow simulation,” PhD Thesis, 2000.
[25] F. Chung, “A local graph partitioning algorithm using heat kernel pagerank,” Internet Mathematics vol. 6(3), pp. 315-330, 2009.
[26] L. Orecchia, and Z. A. Zhu, “Flow-based algorithms for local graph clustering,” Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, pp. 1267–1286, 2014.
[27] H. Yin, A. R. Benson, J. Leskovec, and D. F. Gleich “Local Higher-Order Graph Clustering,” Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 555-564, 2017.
[28] Dataset link: https://sparse.tamu.edu/.

Keywords
Data Mining, Graph Mining, Dig Data, Graph Clustering, Local Graph Clustering.