
Research Of The XML Document Clustering Using GA

Posted on: 2015-01-31
Degree: Master
Type: Thesis
Country: China
Candidate: W W Pan
GTID: 2268330428461661
Subject: Computer technology
Abstract/Summary:
XML is a meta-markup language that has been widely used in many fields in recent years. With the explosive growth in the number of XML documents, there is an urgent need to extract knowledge and information from them, and XML document mining has become a current research focus. Within XML document mining, automatic clustering of XML documents is a new research area. It can improve the organization of XML documents on the Web and provide effective technical support for collecting, organizing, and retrieving documents. It can also uncover previously unknown knowledge and relationships among XML documents in large volumes of data. XML document clustering is therefore a research topic of considerable significance.

K-medoids is a simple and fast clustering algorithm that is convenient to apply. Traditional K-medoids clustering, however, has two problems: the number of clusters must be fixed in advance, and the result depends on the selection of the initial cluster centers. This paper proposes a separate solution to each problem.

To determine the number of clusters, we use fuzzy clustering based on the fuzzy equivalence relation matrix. The partitions obtained under different thresholds are compared with an evaluation function, and the best result determines the number of clusters.

In K-medoids clustering, selecting the initial cluster centers is a critical step. Exploiting the global search ability of genetic algorithms, we apply a genetic algorithm to K-medoids clustering: each individual encodes a set of cluster centers, and the clustering evaluation function serves as the fitness function. The genetic algorithm thus searches globally for the best cluster centers and finally determines the optimal clustering result.
Finally, the analysis of experimental results indicates the effectiveness and practicality of the proposed method.
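As a reference point for the K-medoids step described in the abstract, the following minimal Python sketch runs the basic alternating (Voronoi-iteration) variant of K-medoids on a precomputed distance matrix. It is not the thesis's implementation: the distance matrix, the function names `assign`, `total_cost`, and `k_medoids`, and the toy data are all hypothetical. It illustrates why the initial centers matter, since the procedure only improves locally from whatever medoids it starts with.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def assign(dist: Matrix, medoids: Sequence[int]) -> List[int]:
    """Label each object with the position (0..k-1) of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def total_cost(dist: Matrix, medoids: Sequence[int], labels: Sequence[int]) -> float:
    """Sum of distances from every object to the medoid of its cluster."""
    return sum(dist[i][medoids[labels[i]]] for i in range(len(dist)))


def k_medoids(dist: Matrix, initial: Sequence[int],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(initial)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # empty cluster: keep the previous medoid
                new_medoids.append(old)
                continue
            # New medoid: the member with the smallest summed distance
            # to all other members of the same cluster.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(dist, medoids)


if __name__ == "__main__":
    # Hypothetical toy data: six 1-D points forming two obvious groups.
    points = [0, 1, 2, 10, 11, 12]
    dist = [[abs(a - b) for b in points] for a in points]
    medoids, labels = k_medoids(dist, initial=[0, 1])
    print(medoids, labels, total_cost(dist, medoids, labels))  # prints: [1, 4] [0, 0, 0, 1, 1, 1] 4
```

On this toy input the start `[0, 1]`, with both medoids in the left group, still converges to medoids `[1, 4]`; on less well-separated data a poor start can leave the iteration in a worse local optimum, which is the weakness the genetic algorithm is meant to address.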
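The fuzzy-equivalence step for choosing the number of clusters can be illustrated with the classical transitive-closure method: a reflexive, symmetric fuzzy similarity matrix is squared under max-min composition until it stops changing, and each threshold λ then yields a crisp partition. This sketch assumes the document similarity matrix is already given (the abstract does not say how XML documents are compared), and it uses the mean silhouette, with distance 1 - similarity, as a stand-in evaluation function; the thesis's own evaluation function may differ. All function names and data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def max_min_compose(a: Matrix, b: Matrix) -> List[List[float]]:
    """Fuzzy max-min composition: (a o b)[i][j] = max_k min(a[i][k], b[k][j])."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(r: Matrix) -> List[List[float]]:
    """Square a reflexive, symmetric similarity matrix until R o R == R.

    The fixed point is a fuzzy equivalence matrix (reflexive, symmetric,
    max-min transitive).
    """
    current = [list(row) for row in r]
    while True:
        nxt = max_min_compose(current, current)
        if nxt == current:
            return current
        current = nxt


def lambda_cut(t: Matrix, lam: float) -> List[int]:
    """Crisp partition from the lambda-cut of a fuzzy equivalence matrix."""
    labels = [-1] * len(t)
    next_label = 0
    for i in range(len(t)):
        if labels[i] == -1:
            for j in range(len(t)):
                if t[i][j] >= lam:
                    labels[j] = next_label
            next_label += 1
    return labels


def silhouette(sim: Matrix, labels: Sequence[int]) -> float:
    """Mean silhouette with distance 1 - similarity (assumed evaluation function)."""
    n, clusters = len(sim), set(labels)
    if len(clusters) < 2 or len(clusters) == n:
        return -1.0  # trivial partitions are scored as worst
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton clusters contribute 0 by convention
        a = sum(1 - sim[i][j] for j in own) / len(own)
        b = min(sum(1 - sim[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in clusters if c != labels[i])
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n


def best_partition(sim: Matrix) -> Tuple[float, float, List[int]]:
    """Try every distinct closure value as lambda; keep the best-scoring partition."""
    t = transitive_closure(sim)
    best: Optional[Tuple[float, float, List[int]]] = None
    for lam in sorted({v for row in t for v in row}, reverse=True):
        labels = lambda_cut(t, lam)
        score = silhouette(sim, labels)
        if best is None or score > best[0]:
            best = (score, lam, labels)
    assert best is not None
    return best


if __name__ == "__main__":
    # Hypothetical similarity matrix for five documents.
    sim = [[1.0, 0.8, 0.7, 0.1, 0.2],
           [0.8, 1.0, 0.6, 0.2, 0.1],
           [0.7, 0.6, 1.0, 0.3, 0.2],
           [0.1, 0.2, 0.3, 1.0, 0.9],
           [0.2, 0.1, 0.2, 0.9, 1.0]]
    score, lam, labels = best_partition(sim)
    print(lam, labels)  # prints: 0.7 [0, 0, 0, 1, 1]
```

On this hypothetical matrix, λ = 1.0 and λ = 0.3 give the trivial all-singletons and single-cluster partitions, while λ = 0.7 scores best and yields two clusters, which would then serve as k for K-medoids.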
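The genetic-algorithm search over cluster centers might look like the following minimal sketch, with k assumed to come from the fuzzy clustering step. Each individual is a sorted list of k distinct document indices (the medoids). Fitness is assumed here to be the total distance from each document to its nearest medoid (lower is better), standing in for the thesis's clustering evaluation function. Tournament selection, union-based crossover, replacement mutation, elitism, the function names, and all parameter values are illustrative choices, not taken from the source.

```python
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def cost(dist: Matrix, medoids: Sequence[int]) -> float:
    """Assumed fitness: total distance from each object to its nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def tournament(pop: List[List[int]], dist: Matrix, rng: random.Random) -> List[int]:
    """Binary tournament: the lower-cost of two random individuals wins."""
    x, y = rng.sample(pop, 2)
    return x if cost(dist, x) <= cost(dist, y) else y


def crossover(a: Sequence[int], b: Sequence[int], k: int, rng: random.Random) -> List[int]:
    """Child takes k distinct medoids drawn from the union of both parents."""
    return sorted(rng.sample(sorted(set(a) | set(b)), k))


def mutate(ind: Sequence[int], n: int, rate: float, rng: random.Random) -> List[int]:
    """With probability `rate`, replace each medoid by a random non-medoid object."""
    genes = list(ind)
    for g in range(len(genes)):
        if rng.random() < rate:
            candidates = [x for x in range(n) if x not in genes]
            if candidates:
                genes[g] = rng.choice(candidates)
    return sorted(genes)


def ga_medoids(dist: Matrix, k: int, pop_size: int = 20, generations: int = 50,
               mutation_rate: float = 0.2, seed: int = 0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n = len(dist)
    # Initial population: random sets of k distinct candidate medoids.
    pop = [sorted(rng.sample(range(n), k)) for _ in range(pop_size)]
    best = min(pop, key=lambda ind: cost(dist, ind))
    for _ in range(generations):
        children = [best]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            child = crossover(tournament(pop, dist, rng), tournament(pop, dist, rng), k, rng)
            children.append(mutate(child, n, mutation_rate, rng))
        pop = children
        best = min(pop, key=lambda ind: cost(dist, ind))
    return best, cost(dist, best)


if __name__ == "__main__":
    points = [0, 1, 2, 10, 11, 12]  # hypothetical toy data
    dist = [[abs(a - b) for b in points] for a in points]
    print(ga_medoids(dist, k=2))
```

Because the best individual is carried over unchanged each generation, the best cost never increases from one generation to the next; the medoids it returns can then be refined with an ordinary K-medoids pass.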
Keywords/Search Tags:XML document clustering, Genetic Algorithm, Fuzzy Clustering, K-medoids clustering