XML Documents Clustering Based On Density Method

Posted on:2010-03-09

Degree:Master

Type:Thesis

Country:China

Candidate:D Luo

Full Text:PDF

GTID:2178360275468644

Subject:Computer software and theory

Abstract/Summary:

As society becomes increasingly information-driven, the demand for information and the dependence on it keep growing. How to extract efficient and useful information from large volumes of digital material has become a research focus. The main text clustering methods currently in use include partitioning clustering, hierarchical clustering, self-organizing maps, and text clustering based on genetic algorithms. XML documents are semi-structured text whose semantic information can be described through document structure, so not every text clustering algorithm is suitable for clustering them. The XML clustering methods most widely applied today are partitioning and hierarchical clustering. Their disadvantage is that they are limited to finding spherical clusters and cannot handle irregular, arbitrarily shaped clusters effectively. As a general data interchange carrier, XML shows great diversity in document structure across stored data, so a new clustering method is needed. Moreover, the models used in traditional semantic text clustering, such as the Vector Space Model, the Boolean Model, the Probability Model, set operations, Support Vector Machines, and Latent Semantic Indexing, index a document set by word frequency as the feature item. They ignore the structural level at which a word occurs, and therefore cannot cluster structurally nested XML effectively.

Building on the above methods, this thesis proposes a new structural similarity clustering algorithm based on DBSCAN, which can find irregular, arbitrarily shaped clusters. It also studies the "structure nesting" characteristic of XML document sets and puts forward a new hierarchical semantic clustering method for XML, which treats the hierarchical level at which a keyword occurs as an important factor in the semantic clustering algorithm. In addition, semantic comparison uses approximate rather than exact matching. Compared with traditional document clustering techniques, this method clusters XML documents at the semantic level more effectively.
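The idea of density-based clustering over structural dissimilarity can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract does not give its dissimilarity measure, so the sketch assumes each document is represented by its set of root-to-node tag paths and compared with Jaccard distance. The function names (`tag_paths`, `jaccard_distance`, `dbscan`), the sample documents, and the parameter values `eps=0.3` and `min_pts=2` are all hypothetical choices for illustration.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-node tag paths of an XML document.

    Assumption: tag paths stand in for 'structure'; the thesis may use
    a different structural representation.
    """
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def jaccard_distance(a, b):
    """Assumed dissimilarity: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dbscan(items, distance, eps, min_pts):
    """Plain DBSCAN over pairwise distances; label -1 marks noise."""
    n = len(items)
    dist = [[distance(items[i], items[j]) for j in range(n)] for i in range(n)]
    # Each point counts as its own neighbour, as in the usual definition.
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<book><title/><year/></book>",
    "<order><item><sku/></item></order>",
    "<order><item><sku/><qty/></item></order>",
    "<note/>",
]
labels = dbscan([tag_paths(d) for d in docs], jaccard_distance, eps=0.3, min_pts=2)
print(labels)
```

In this toy input the first and third documents are 0.5 apart, farther than `eps`, yet they land in the same cluster because both are within `eps` of the second document. This chaining through dense neighbours is what lets a density-based method recover non-spherical clusters, while the structurally isolated `<note/>` document is left as noise.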

Keywords/Search Tags:

XML, XML Clustering, Dissimilarity Measurement

Related items

1	XML Documents Clustering Based On Density Method
2	Clustering Algorithm Of Missing Data Based On Dissimilarity Measure
3	Research On Clustering Algorithm For Heterogeneous Objects Based On Information Dissimilarity And Irregular Grid
4	Research On K-modes Clustering Algorithm Of Dissimilarity Measure
5	Choosing a dissimilarity representation for classification
6	Research On Ant Colony Clustering Algorithm Based On LF
7	Clustering in relational data and ontologies
8	The Research On Clustering Algorithm For Categorical Data Using Quantum Mechanics
9	Studies On Clustering Algorithms For Categorical Data
10	Research On Data-driven BPA Generation And Conflict Measurement Methods