
K-medoids Clustering Algorithm Web Information Integration Research And Implementation

Posted on: 2012-02-02
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Wang
Full Text: PDF
GTID: 2178330332483713
Subject: Computer application technology
Abstract/Summary:
As the information society develops, people's demand for information keeps growing. Web sites hold a wealth of information resources and have gradually become an important channel for obtaining information. Because different sites use different data formats, however, a Web information integration system is needed so that users can query information across sites more conveniently, and improving the clustering algorithm has become a hot topic in implementing such a system. To this end, we first cluster the data collected from the different sites. Clustering groups a very large amount of data into several categories. After clustering, a central point can be selected for each category to represent it. New data then only has to be compared with the cluster centers rather than with every data point, which effectively reduces the complexity of integration.

Based on an analysis of research results from recent years, this thesis improves the traditional clustering method to address the low accuracy and low efficiency of existing clustering methods, aiming to raise efficiency while keeping accuracy high. The main work is as follows:

(1) The traditional K-medoids clustering algorithm is improved. The improvement mainly optimizes how the initial cluster centers are selected. The new selection method effectively reduces the possibility that adjacent data objects become initial cluster centers at the same time, and it reduces the number of iterations needed to reach the final clustering result. This lowers the complexity of the clustering process and thus improves clustering efficiency. In addition, a radius is set during clustering so that irrelevant data can be weeded out.

(2) The improved K-medoids clustering algorithm is combined with Web integration technology. A parser, an extractor and a data integration module are designed following the basic steps of Web information integration, so that the clustering algorithm is effectively applied to Web information integration.

(3) Building on the two results above, the traditional method of Web data integration is improved by combining data integration technology, HTML, Java programming, XML technology and a similarity calculation method. The result gives users a fast, convenient, accurate and efficient data integration method with good practicability.
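
The ideas in point (1) can be illustrated with a small Python sketch. The abstract does not give the exact selection rule or radius test, so everything here is an assumption made for illustration: the `min_gap` parameter (a candidate is skipped when it lies closer than `min_gap` to an already chosen center), the Euclidean distance, the `-1` label for data outside the radius, and the simple alternating assign-and-update loop, which is a common K-medoids variant and not necessarily the one used in the thesis. The function names are hypothetical.

```python
import math


def pick_spread_medoids(points, k, min_gap):
    """Pick k initial medoids at least min_gap apart (illustrative rule only)."""
    # Rank points from most to least central by total distance to all points.
    order = sorted(range(len(points)),
                   key=lambda i: sum(math.dist(points[i], q) for q in points))
    chosen = []
    for i in order:
        # Skip candidates that are adjacent to an already chosen medoid.
        if all(math.dist(points[i], points[c]) >= min_gap for c in chosen):
            chosen.append(i)
            if len(chosen) == k:
                return chosen
    raise ValueError("min_gap is too large to find k separated medoids")


def nearest_medoid(point, centers, radius):
    """Return the index of the closest center, or -1 if it lies beyond radius."""
    j = min(range(len(centers)), key=lambda m: math.dist(point, centers[m]))
    return j if math.dist(point, centers[j]) <= radius else -1


def k_medoids(points, k, radius, min_gap, max_iter=100):
    """Alternate between assignment and medoid update until medoids are stable."""
    medoids = pick_spread_medoids(points, k, min_gap)
    for _ in range(max_iter):
        centers = [points[m] for m in medoids]
        labels = [nearest_medoid(p, centers, radius) for p in points]
        updated = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:  # keep the old medoid if its cluster emptied out
                updated.append(medoids[j])
                continue
            # New medoid: the member with the smallest total distance to the rest.
            updated.append(min(members, key=lambda i: sum(
                math.dist(points[i], points[m]) for m in members)))
        if updated == medoids:
            break
        medoids = updated
    centers = [points[m] for m in medoids]
    labels = [nearest_medoid(p, centers, radius) for p in points]
    return centers, labels


# Two compact groups plus one far-away record that the radius should reject.
points = [(0, 0), (0, 2), (2, 0), (2, 2), (1, 1),
          (10, 10), (10, 12), (12, 10), (12, 12), (11, 11),
          (50, 50)]
centers, labels = k_medoids(points, k=2, radius=3.0, min_gap=3.0)

# A new record is compared with the cluster centers only, not with every point.
new_label = nearest_medoid((10.5, 11), centers, radius=3.0)
```

In this sketch a new record costs k distance computations instead of one per stored data point, which is the saving the abstract describes. The values of `radius` and `min_gap` are arbitrary and would need tuning for real Web data and a real similarity measure.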
Keywords/Search Tags: web data integration, clustering algorithm, analysis, extraction