
XML Clustering Ensemble Research

Posted on: 2015-10-02
Degree: Master
Type: Thesis
Country: China
Candidate: Y Jiang
Full Text: PDF
GTID: 2428330488499503
Subject: Computer application technology
Abstract/Summary:
An XML clustering ensemble learns from and integrates multiple clustering results of the original data set, using a traditional clustering ensemble algorithm, to produce a partition that better reflects the inherent structure of the data. It therefore handles noisy data and outliers better than a single XML clustering algorithm, and it has become an active research topic. XML clustering ensembles can be divided into small-scale and large-scale XML clustering. Most clustering ensemble algorithms are designed for small-scale data and do not achieve good clustering results when applied to large-scale data. This thesis therefore studies small-scale clustering ensembles and, in greater depth, large-scale XML clustering ensembles.

Before any clustering ensemble can be built, the primary task is to choose a good method for calculating the similarity between XML documents. This thesis designs such a method and compares it with traditional similarity calculation methods. The designed method shows higher accuracy and precision, so it is adopted throughout the thesis. Next, a small-scale XML clustering ensemble algorithm based on a quantum genetic algorithm is put forward. It improves clustering quality and accuracy over the two single clustering algorithms proposed in this thesis.

Finally, for data that grows from MB to GB and TB in volume, with complicated structure, uneven distribution, and more noise, the thesis proposes an XML big data clustering ensemble solution based on parallel affinity propagation (AP). The method first cleans, classifies, and extracts each large XML data set, and extracts subtrees from the partitioned subsets. A parallel random subspace classifier is designed and trained on the extracted subtrees, which yields multiple classifiers for training subsets with different characteristics. An inline similarity matrix is then obtained from the relationships among the training subsets of the designed classifiers. The eigenvalues and corresponding eigenvectors of this matrix are computed with an improved parallel Lanczos-QR algorithm, which achieves dimensionality reduction of the high-dimensional data and a low-dimensional embedding. Drawing on the theory of system energy, an AP algorithm based on system energy is then designed. This algorithm finds the clustering combination of the sample sets with optimal energy, which completes the clustering ensemble. Experimental results show that the proposed clustering ensemble algorithm achieves a better clustering effect on XML big data sets than other clustering algorithms.
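
The idea of integrating several clustering results into one partition can be illustrated with a co-association matrix. The abstract does not say that the thesis uses this technique, so the sketch below is only an assumption chosen for illustration: the function names, the toy labelings, and the 0.5 threshold are all hypothetical, and the consensus step is a plain connected-components grouping rather than the quantum genetic or affinity propagation methods named above.

```python
def co_association(labelings):
    """Return an n x n matrix whose entry (i, j) is the fraction of
    base clusterings that place items i and j in the same cluster."""
    n = len(labelings[0])
    m = len(labelings)
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[counts[i][j] / m for j in range(n)] for i in range(n)]


def consensus_clusters(matrix, threshold=0.5):
    """Group items into connected components, linking two items when
    their co-association exceeds the (hypothetical) threshold."""
    n = len(matrix)
    assigned = [-1] * n
    next_label = 0
    for start in range(n):
        if assigned[start] != -1:
            continue
        assigned[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if assigned[j] == -1 and matrix[i][j] > threshold:
                    assigned[j] = next_label
                    stack.append(j)
        next_label += 1
    return assigned


# Three toy base clusterings of six documents (illustrative data only).
# The second one disagrees with the others about document 2.
base_clusterings = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
]

matrix = co_association(base_clusterings)
consensus = consensus_clusters(matrix)
print(consensus)
```

In this toy input, two of the three base clusterings group document 2 with documents 0 and 1, so the majority view prevails in the consensus partition. This is the sense in which an ensemble can be less sensitive to one noisy clustering than any single result.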
Keywords/Search Tags: XML big data set, random subspace classifier, parallel Lanczos-QR algorithm, system energy, parallel affinity propagation algorithm