XML Documents Clustering Based On Density Method

Posted on: 2010-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: D Luo
Full Text: PDF
GTID: 2178360275468644
Subject: Computer software and theory
Abstract/Summary:
As society becomes more information-driven, both the demand for information and the dependence on it keep growing. How to extract efficient and useful information from digital information resources is therefore an active research topic. The main text clustering methods today include partitioning clustering, hierarchical clustering, self-organizing maps, and text clustering based on genetic algorithms. XML documents are semi-structured text, and part of their semantic information is expressed through document structure, so not every text clustering algorithm is suitable for clustering XML documents.

The methods most commonly applied to XML clustering are partitioning clustering and hierarchical clustering. Their shared disadvantage is that they are limited to finding spherical clusters and do not work effectively on clusters of irregular or arbitrary shape. XML, as a general-purpose data interchange format, shows great structural diversity across stored digital data, so a different clustering method is needed. Moreover, the models used in traditional semantic text clustering, such as the Vector Space Model, Boolean Model, Probability Model, Set Operations, Support Vector Machines, and Latent Semantic Indexing, all index a document set by word frequency as the characteristic feature. They ignore the structural level at which a word occurs, and therefore cannot cluster structurally nested XML effectively.

To address these problems, this thesis proposes a new structural similarity clustering algorithm based on DBSCAN, which can find clusters of irregular and arbitrary shape. It also studies the "structure nesting" characteristic of XML document sets and puts forward a new hierarchical semantic clustering method for XML, which treats the hierarchical level at which a keyword occurs as an important factor in the clustering algorithm. In addition, semantic comparison uses approximate rather than exact matching. Compared with traditional document clustering techniques, this method clusters XML documents at the semantic level more effectively.
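The density-based structural clustering described in the abstract can be illustrated with a small sketch. The abstract does not give the thesis's actual dissimilarity measure or parameters, so everything below is an assumption made for illustration: each document is reduced to its set of root-to-element tag paths, dissimilarity is the Jaccard distance between those sets, and `eps` and `min_pts` are arbitrary example values. The helper names `tag_paths`, `dissimilarity`, and `dbscan` are hypothetical and are not taken from the thesis.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Return the set of root-to-element tag paths, e.g. {'book', 'book/title'}."""
    root = ET.fromstring(xml_text)
    paths = set()
    stack = [(root, root.tag)]
    while stack:
        node, path = stack.pop()
        paths.add(path)
        for child in node:
            stack.append((child, path + "/" + child.tag))
    return paths


def dissimilarity(a, b):
    """Jaccard distance between two path sets (an assumed measure, not the thesis's)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def dbscan(items, dist, eps, min_pts):
    """Plain DBSCAN over an arbitrary distance; returns one label per item, -1 = noise."""
    n = len(items)
    # Each point's eps-neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if dist(items[i], items[j]) <= eps] for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # only core points expand the cluster
    return labels


docs = [
    "<book><title/><author/></book>",
    "<book><title/><author/><year/></book>",
    "<book><title/><author/><isbn/></book>",
    "<order><item><sku/></item><total/></order>",
    "<order><item><sku/><qty/></item><total/></order>",
    "<note/>",
]
path_sets = [tag_paths(d) for d in docs]
labels = dbscan(path_sets, dissimilarity, eps=0.5, min_pts=2)
print(labels)  # [0, 0, 0, 1, 1, -1]
```

Because DBSCAN only needs pairwise distances and a density threshold, the same loop would work with any other structural or level-weighted semantic dissimilarity in place of the Jaccard distance used here. The quadratic neighborhood computation is kept for readability and would need an index for large document sets.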
Keywords/Search Tags: XML, XML Clustering, Dissimilarity Measurement