Present a way to Find Frequent Tree Patterns Using Inverted Index

Saeid Tajedi, Hasan Naderi

Abstract


In recent years, data mining in trees, or tree mining, has attracted much attention due to the explosive growth in the generation of tree databases. A tree database consists of either a single large tree or a number of relatively small trees. Among all patterns occurring in a tree database, frequent trees are of great importance. A frequent tree is one that occurs frequently in the tree database. Frequent subtrees are not only important in themselves but are also applicable to other data analysis and data mining tasks, such as similarity search in tree databases, tree clustering, classification, and bioinformatics. In this paper, after reviewing different methods of searching for frequent subtrees, we propose a new method based on an inverted index to find frequent tree patterns. The procedure has two phases: passive and active. In the passive phase, we find the subtrees in the dataset, convert them to strings, and store them in the inverted index. In the active phase, we derive the desired frequent subtrees directly from the inverted index. The proposed approach tries to take advantage of times when the CPU is idle, so that CPU utilization is at its highest in the evaluation results. Because frequent subtree mining in the active phase is performed on the inverted index rather than directly on the dataset, the desired frequent subtrees are found in the fastest possible time. Another feature of the proposed method is that, unlike previous methods, adding a tree to the dataset does not require repeating the previous steps. In other words, the method performs well on dynamic trees. In addition, the proposed method is capable of interacting with the user.
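
The two phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which kind of subtree is enumerated or how subtrees are turned into strings, so the sketch assumes bottom-up subtrees (a node together with all of its descendants), unordered trees, and a sorted parenthesized encoding. The names `encode`, `SubtreeIndex`, `add_tree` and `frequent` are hypothetical, as is the `(label, children)` tuple representation of a tree.

```python
from collections import defaultdict

# Assumed representation: a tree is a (label, children) pair,
# where children is a list of trees of the same shape.


def encode(tree, out):
    """Return a canonical string for `tree` and append the string of every
    bottom-up subtree (a node with all its descendants) to `out`.

    Children are sorted by their encodings, so sibling order is ignored
    (an assumption; the abstract does not state whether trees are ordered).
    """
    label, children = tree
    parts = sorted(encode(child, out) for child in children)
    code = label + "(" + ",".join(parts) + ")"
    out.append(code)
    return code


class SubtreeIndex:
    """Hypothetical inverted index: subtree string -> ids of trees containing it."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.tree_count = 0

    def add_tree(self, tree):
        """Passive phase: index one tree without revisiting earlier trees."""
        tree_id = self.tree_count
        self.tree_count += 1
        codes = []
        encode(tree, codes)
        for code in codes:
            self.postings[code].add(tree_id)
        return tree_id

    def frequent(self, min_support):
        """Active phase: read the frequent subtrees off the index.

        Support is counted as the number of distinct trees containing
        the subtree (again an assumption of this sketch).
        """
        return sorted(
            code for code, ids in self.postings.items() if len(ids) >= min_support
        )


index = SubtreeIndex()
index.add_tree(("a", [("b", []), ("c", [])]))
index.add_tree(("d", [("c", []), ("b", [])]))
before = index.frequent(2)

# A new tree only adds postings; nothing already indexed is recomputed.
index.add_tree(("a", [("c", []), ("b", [])]))
after = index.frequent(2)
print(before, after)
```

In this sketch a query never touches the trees themselves, only the posting sets, and inserting a tree leaves existing entries untouched, which mirrors the behaviour on dynamic datasets claimed in the abstract.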


Keywords


Tree Mining, Inverted Index, Frequent Pattern Mining, Tree Patterns


ISSN: 1694-2507 (Print)

ISSN: 1694-2108 (Online)