A New Online XML Document Clustering Based on XCLS++
Abstract
Many methods have been proposed for XML document clustering. These methods can be divided into three categories: structure-based, content-based, and hybrid. XCLS++ is one of the most effective and efficient algorithms for XML document clustering and falls into the structure-based category. Because of its efficiency, XCLS++ can be used for XML stream clustering. In this paper, we identify a weakness of this method and address it by removing a factor from the XCLS++ formula. This factor is related to the weight of a node in the tree that represents a given XML document. According to the experiments presented in this paper, both the effectiveness (in terms of accuracy) and the efficiency (in terms of execution time) of XCLS++ improve once this weight factor is eliminated from the original formula.
ISSN: 1694-2507 (Print)
ISSN: 1694-2108 (Online)