A Survey of Frequent Subtrees and Subgraphs Mining Methods

hamed dinari, Hassan Naderi

Abstract


A graph is a basic data structure which, can be used to model complex structures and the relationships between them, such as XML documents, social networks, communication networks, chemical informatics, biology networks, and structure of web pages. Frequent subgraph pattern mining is one of the most important fields in graph mining. In light of  many applications for it, there are extensive researches in this area, such as analysis and processing of XML documents, documents clustering and classification, images and video indexing, graph indexing for graph querying, routing in computer networks, web links analysis, drugs design, and carcinogenesis. Several frequent pattern mining algorithms have been proposed in recent years and every day a new one is introduced. The fact that these algorithms use various methods on different datasets, patterns mining types, graph and tree representations, it is not easy to study them in terms of features and performance. This paper presents a brief report of an intensive investigation of actual frequent subgraphs and subtrees mining algorithms. The algorithms were also categorized based on different features.


Keywords


Graph Mining, Subgraph, Frequent Pattern, Graph indexing

Full Text:

PDF

References


A.Rajaraman, J.D.Ullman, Mining of Massive Datasets, 2nd ed. 2012.

J.Han, M.Kamber, Data Mining Concepts and Techniques. USA: Diane Cerra, 2006.

Kuramochi, Michihiro, and G.Karypis., " An efficient algorithm for discovering frequent subgraphs," in IEEE Transactions on Knowledge and Data Engineering, 2004 , pp. 1038-1051.

J.Huan, W.Wang, J. Prins, "Efficient Mining of Frequent Subgraph in the presence of isomorphism," in Third IEEE International Conference on Data Minign (ICDM), 2003.

(2013, Dec.) “Trust Network Datasets - TrustLet. [Online]. http://www.trustlet.org

L.YAN, J.WANG, "Extracting regular behaviors from social media networks," in Third International Conference on Multimedia Information Networking and Security, 2011.

Ivancsy,I. Renata, I.Vajk, "Clustering XML documents using frequent subtrees," Advances in Focused Retrieval, vol. 3, pp. 436-445, 2009.

J.Yuan, X.Li, L.Ma, "An Improved XML Document Clustering Using Path Features," in Fifth International Conference on Fuzzy Systems and knowledge Discovery, vol. 2, 2008.

Lee, Wenke, and Salvatore J. Stolfo , "A framework for constructing features and models for intrusion detection systems," in ACM transactions on Information and system security (TiSSEC), 2000, pp. 227-261.

Ko, C, "Logic induction of valid behavior specifications for intrusion detection ," in In IEEE Symposium on Security and Privacy (S&P), 2000, p. 142–155.

Yoshida, K. and Motoda,, " CLIP: Concept learning from inference patterns," in Artificial Intelligence, 1995, p. 63–92.

Wasserman, S., Faust, K., and Iacobucci, D, Social network analysis : Methods and applications. Cambridge university Press, 1994.

Berendt, B., Hotho, A., and Stumme, G., "semantic web mining," in In Conference International Semantic Web (ISWC), 2002, p. 264–278.

S.C.Manekar, M.Narnaware, "Indexing Frequent Subgraphs in Large graph Database using Parallelization," International Journal of Science and Research (IJSR), vol. 2 , no. 5, May 2013.

Peng, Tao, et al. , "A Graph Indexing Approach for Content-Based Recommendation System," in IEEE Second International Conference on Multimedia and Information Technology (MMIT), 2010, pp. 93-97.

S.Sakr, E.Pardede,., "Graph Data Management: Techniques and Applications," in Published in the United States of America by Information Science Reference, 2011.

Y.Xiaogang, T.Ye, P.Tao, C.Canfeng, M.Jian, " Semantic-

Based Graph Index for Mobile Photo Search," in Second International Workshop on Education Technology and Computer Science, 2010, pp. 193-197.

Yildirim, Hilmi, and Mohammed Javeed Zaki. , "Graph indexing for reachability queries," in 26th International Conference on Data Engineering Workshops (ICDEW)IEEE, 2010, pp. 321-324.

R.Ivancsy and I.Vajk, "Frequent Pattern Mining in Web Log Data," in Acta Polytechnica Hungarica, 2006, pp. 77-90.

G.XU, Y.zhang, L.li, Web mining and Social Networking. melbourn: Springer, 2010.

S.Ranu, A.K. Singh, "Indexing and mining topological patterns for drug," in ACM, Data mining and knowlodge discovery, Berlin, Germany, 2010.

(2013, Dec.) Drug Information Portal. [Online]. http://druginfo.nlm.nih.gov

(2013, Dec.) DrugBank. [Online]. http://www.drugbank.ca

Dehaspe,Toivonen, and King, R.D, "Finding frequent substructures in chemical compounds.," in In Proc. of the 4th ACM International Conference on Knowledge Discovery and Data Mining, 1998, pp. 30-36.

Kramer, S., De Raedt, L., and Helma, C. , "Molecular feature mining in HIV data," in In Proc. of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-01), 2001, p. 136–143.

Gonzalez, J., Holder, L.B. and Cook, "Application of graph-based concept learning to the predictive toxicology domain," in In Proc. of the Predictive Toxicology Challenge Workshop, 2001.

H.J.Patel, R.Prajapati, M.Panchal, M.Patel, "A Survey of Graph Pattern Mining Algorithm and Techniques," International Journal of Application or Innovation in Engineering & Management (IJAIEM), vol. 2, no. 1, Jan. 2013.

K.Lakshmi, T. Meyyappan, "FREQUENT SUBGRAPH MINING ALGORITHMS - A SURVEY AND FRAMEWORK FOR CLASSIFICATION," computer science and information technology, p. 189–202, 2012.

D.Kavitha, B.V.Manikyala Rao and V. Kishore Babu, "A Survey on Assorted Approaches to Graph Data Mining," in International Journal of Computer Applications, 2011, pp. 43-46.

C.C.Aggarwal,Wang, Haixun, Managing and Mining Graph Data. Springer, 2010.

B.Wackersreuther, Bianca, et al. , "Frequent subgraph discovery in dynamic networks," in ACM, Proceedings of the Eighth Workshop on Mining and Learning with Graphs, Washington DC USA, 2010, pp. 155-162.

Kuramochi, Michihiro, and G.Karypis, "Grew-a scalable frequent subgraph discovery algorithm," in Fourth IEEE International Conference on Data Mining (ICDM), 2004, pp. 439-442.

Huan, Jun, "Spin: mining maximal frequent subgraphs from graph databases," in Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 2004.

Borgwardt, Karsten M., H-P. Kriegel, and P.Wackersreuther, "Pattern mining in frequent dynamic subgraphs," in Sixth International Conference on Data Mining (ICDM), 2006, pp. 818-822.

Inokuchi, Akihiro, T.Washio, and H.Motoda, "An apriori-based algorithm for mining frequent substructures from graph data," in Principles of Data Mining and Knowledge Discovery, Springer Berlin Heidelberg, 2000, pp. 13-23.

Zou, Zhaonian, et alâ€, "Frequent subgraph pattern mining on uncertain graph data," in Proceedings of the 18th ACM conference on Information and knowledge management, 2009, pp. 583-592.

Ketkar, N.S, Lawrence B.Holder, and D.J.Cook, "Subdue: compression-based frequent pattern discovery in graph data," in ACM, " Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations, 2005, pp. 71-76.

A. Inokuchi, T. Washio, and H. Motoda, "Complete mining of frequent patterns from graphs: Mining graph data," in Machine Learning, 2003, pp. 321-354â€.

Kuramochi, Michihiro, and G.Karypis , " "Discovering frequent geometric subgraphs," in Information Systems, 2007, pp. 1101-1120.

Thomas, Lini T, Satyanarayana R. Valluri, and K.Karlapalem, "Margin: Maximal frequent subgraph mining," in IEEE Sixth International Conference on Data Mining (ICDM), 2006, pp. 1097-1101.

Yan, Xifeng, and J.Han, " gspan: Graph-based substructure pattern mining," in Proceedings International Conference on Data Mining.IEEE , 2002, pp. 721-724.

Yan, Xifeng, and Jiawei Han, "CloseGraph: mining closed frequent graph patterns," in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, 2003, pp. 286-295.

Nijssen, Siegfried, and J.N. Kok., "The gaston tool for frequent subgraph mining," in " Electronic Notes in Theoretical Computer Science , 2005, pp. 77-87â€.

Hsieh, Hsun-Ping, and Cheng-Te Li, "Mining temporal subgraph patterns in heterogeneous information networks," in IEEE Second International Conference on Social Computing (SocialCom), 2010, pp. 282-287.

Wörlein, Marc, et al, "A quantitative comparison of the subgraph miners MoFa, gSpan, FFSM, and Gaston," in Knowledge Discovery in Databases: PKDD , Springer Berlin Heidelberg, 2005, pp. 392-403.

S.J.Suryawanshi,S.M.Kamalapur, "Algorithms for Frequent Subgraph Mining," International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, no. 3, Mar. 2013.

Liu, Yong, Jianzhong Li, and Hong Gao, "JPMiner: mining frequent jump patterns from graph databases," in IEEE, Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 2009, pp. 114-118.

Reinhardt, Steve, and G.Karypisâ€, "A multi-level parallel implementation of a program for finding frequent patterns in a large sparse graph," in IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2007, pp. 1-8.

Schreiber, Falk, and H.Schwobbermeyer, "Frequency concepts and pattern detection for the analysis of motifs in networks," in Transactions on computational systems biology III, Springer Berlin Heidelberg, 2005, pp. 89-104.

Chent, Chen, et al., " gapprox: Mining frequent approximate patterns from a massive network," in Seventh IEEE International Conference on Data Mining (ICDM) , 2007, pp. 445-450.

Ke, Yiping, J.Cheng, and Jeffrey Xu Yu, "Efficient discovery of frequent correlated subgraph pairs," in Ninth IEEE International Conference on Data Mining (ICDM), 2009, pp. 239-248.

Zhang, Shijie, J.Yang, and Shirong Li, " "Ring: An integrated method for frequent representative subgraph mining," in Ninth IEEE International Conference on Data Mining (ICDM), 2009, pp. 1082-1087.

Fromont, Elisa, Céline Robardet, and A.Pradoâ€, "Constraint-based subspace clustering," in International conference on data mining, 2009, pp. 26-37.

Ranu, Sayan, and Ambuj K. Singh, "Graphsig: A scalable approach to mining significant subgraphs in large graph databases," in IEEE 25th International Conference on Data Engineering (ICDE), 2009, pp. 844-855.

R. Vijayalakshmi,R. Nadarajan, J.F.Roddick,M. Thilaga, "FP-GraphMiner { A Fast Frequent Pattern Mining Algorithm for Network Graphs}," Journal of Graph Algorithms and Applications, vol. 15, pp. 753-776, 2011.

Zhu, Feida, et al. , "gPrune: a constraint pushing framework for graph pattern mining," in Advances in Knowledge Discovery and Data Mining, Springer Berlin Heidelberg, 2007, pp. 388-400.

Yan, Xifeng, X. Zhou, and Jiawei Han, "Mining closed relational graphs with connectivity constraints," in Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, 2005, pp. 324-333.

Wu, Jia, and Ling Chen, "A fast frequent subgraph mining algorithm," in The 9th International Conference for Young Computer Scientists (ICYCSâ€), 2008, pp. 82-87.

Krishna, Varun, N. N. R. R. Suri, G. Athithan, " A comparative survey of algorithms for frequent subgraph discovery," Current Science(Bangalore), pp. 1980-1988, 2011.

K.Lakshmi, T. Meyyappan, "A COMPARATIVE STUDY OF FREQUENT SUBGRAPH MINING ALGORITHMS," International Journal of Information Technology Convergence and Services (IJITCS), vol. 2, no. 2, Apr. 2012.

C.Jiang, F.Coenen, M.Zito, "A Survey of Frequent Subgraph Mining Algorithms," The Knowledge Engineering Review, pp. 1-31, 2004.

M.Gholami, A.Salajegheh, "A Survey on Algorithms of Mining Frequent Subgraphs," International Journal of Engineering Inventions, vol. 1, no. 5, pp. 60-63, Sep. 2012.

V.Singh, D.Garg, "Survey of Finding Frequent Patterns in Graph Mining: Algorithms and Techniques," International Journal of Soft Computing and Engineering (IJSCE), vol. 1, no. 3, Jul. 2011.

Hussein, M.MA, T. H.Soliman, O.H. Karam, "GP-Growth: A New Algorithm for Mining Frequent Embedded Subtrees," 12th IEEE Symposium on Computers and Communications, no. ISCC, 2007.

Tatikonda, Shirish, S.Parthasarathy,T.Kurc, "TRIPS and TIDES: new algorithms for tree mining," in Proceedings of the 15th ACM international conference on Information and knowledge management, 2006.

Tung, Jiun-Hung, "MINT: Mining Frequent Rooted Induced Unordered Tree without Candidate Generation," in , 2006.

Chi, Yun, Y.Yang, and Richard R. Muntz. , "HybridTreeMiner: An efficient algorithm for mining frequent rooted trees and free trees using canonical forms," in Proceedings 16th International Conference on Scientific and Statistical Database Management, 2004.

T.Asai, H.Arimura, T.Uno, S.Nakano and K.Satoh, "Efficient tree mining using reverse search," in , 2008.

S.Hido, and H. Kawano, "AMIOT: Induced Ordered Tree Mining in Tree-structured Databases," in Proceedings of the Fifth IEEE International Conference on Data Mining (ICDM’05), 2005.

H.Tan, T.S. Dillon, F.Hadzic, E.Chang, and L.Feng, "IMB3-Miner: Mining Induced/Embedded Subtrees by Constraining the Level of Embedding," in Advances in Knowledge Discovery and Data Mining, Springer Berlin Heidelberg, 2006, p. 450–461.

M.J.Zaki, " Efficiently mining frequent trees in a forest," in In Proceedings of the 8th International Conference on Knowledge Discovery and Data Mining (ACM SIGKDD’02), 2002, pp. 71-80.

C.Wang, M.Hong, J.Pei, H.Zhou, W.Wang, "Efficient pattern-growth methods for frequent tree pattern mining," in Advances in Knowledge Discovery and Data Mining, Springer Berlin Heidelberg, 2004, pp. 441-451.

S.Nijssen and J.N.Kok, "Efficient Discovery of Frequent Unordered Trees," in Proc. First Intl Workshop on Mining Graphs Trees and Sequences, 2003, pp. 55-64.

T. Asai, H. Arimura, T.Uno and S. Nakano, "Discovering Frequent Substructures in Large Unordered Trees," in procceding sixth conference on Discovery Science, 2003, pp. 47-61.

Y.Chi, Y.Yang, and R. Muntz, "Canonical Forms for Labeled Trees and Their Applications in Frequent Subtree Mining," Knowledge and Information Systems, no. 8.2, pp. 203-234, May 2004.

Chi, Yun, et al., "Frequent subtree mining-an overview," in Fundamenta Informaticae, 2005, pp. 161-198.

Shasha, Dennis, J.Tsong-Li Wang and Sen Zhang. , ""Unordered tree mining with applications to phylogeny," in IEEE Proceedings 20th International Conference on Data Engineering, 2004, pp. 708-719.

M.J.Zaki, "Efficiently Mining Frequent Embedded Unordered Trees," in IOS Press, 2005, pp. 1-20.

Jimenez, Aida, F.Berzal,J.Cubero, ""Mining induced and embedded subtrees in ordered, unordered, and partially-ordered trees," in EEE Transactions on Knowledge and Data Engineering, Springer Berlin Heidelberg, 2008, pp. 111-120.

Jimenez, Aida,F. Berzal Juan-Carlos Cubero., "Mining Different Kinds of Trees: A Tree Mining Overview," in Data Mining, 2006.

B.Bringmann, "Matching in Frequent Tree Discovery," in Fourth IEEE International Conference on Data Mining, 2004.

Chi, Yun, et al. Mining. , "Cmtreeminer: Mining both closed and maximal frequent subtrees," in Advances in Knowledge Discovery and Data , Springer Berlin Heidelberg, 2004, pp. 63-73.

AliMohammadzadeh, Rahman, et al. , "Complete Discovery of Weighted Frequent Subtrees in Tree-Structured Datasets," International Journal of Computer Science and Network Security (IJCSNS ), vol. 6, no. 8, pp. 188-196, Aug. 2006.

J.HU, X.Y.LI, "Association Rules Mining Including Weak-Support Modes Using Novel Measures," WSEAS Transactions on Computers, vol. 8, no. 3, pp. 559-568, Mar. 2009.

Zhao, Peixiang, and J.X.Yu , "Mining closed frequent free trees in graph databases," in Advances in Databases: Concepts, Systems and Applications, Springer Berlin Heidelberg, 2007, pp. 91-102.

Zou, Lei, et al., "PrefixTreeESpan: A pattern growth algorithm for mining embedded subtrees," in Web Information Systems (WISE), Springer Berlin Heidelberg, 2006, pp. 499-505.

Kutty, Sangeetha, R.Nayak, Y.Li, "PCITMiner: prefix-based closed induced tree miner for finding closed induced frequent subtrees," in Proceedings of the sixth Australasian conference on Data mining and analytics, vol. 70, Australian Computer Society, 2007.

Zhao, Peixiang, and J.X.Yu, "Fast frequent free tree mining in graph databases," in springer World Wide Web, Hong kong, 2008, pp. 71-92.


Refbacks

  • There are currently no refbacks.


ISSN: 1694-2507 (Print)

ISSN: 1694-2108 (Online)