A New Online XML Document Clustering Based on XCLS++

Ahmad Khodayar E Qaramaleki, Hassan Naderi

Abstract


Many methods have been proposed for XML document clustering. They fall into three categories: structure-based, content-based, and hybrid. XCLS++ is one of the most effective and efficient algorithms for XML document clustering and belongs to the structure-based category. Because of its efficiency, XCLS++ can also be used for clustering XML streams. In this paper, we identify a weakness of this method and address it by removing one factor from the XCLS++ formula. As we show, this factor is related to the weight of a node in the tree that represents a given XML document. According to the experiments presented in this paper, both the effectiveness (in terms of accuracy) and the efficiency (in terms of execution time) of XCLS++ improve once this weight factor is eliminated from the original XCLS++ formula.


Keywords


Clustering, XML documents, XCLS++, Structure similarity, Content similarity

ISSN: 1694-2507 (Print)

ISSN: 1694-2108 (Online)