
Research Of Clustering Algorithm Based On Web Text Mining

Posted on: 2013-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y K Yang
GTID: 2248330392454332
Subject: Computer application technology
Abstract/Summary:
With the development of computer and Internet technology, data resources are becoming ever richer, but the knowledge hidden in these large volumes of data has not been fully exploited. Web mining can obtain useful information from the Web quickly and efficiently. Because information on the Web is mainly expressed as text, text clustering, an important branch of text mining, can better reveal the category features hidden in text data. Web text clustering analysis therefore has important practical value. Current research focuses on two areas: improving individual clustering algorithms and discussing their relevant parameters. However, because the result of a single clustering algorithm is unstable and subject to randomness, existing studies tend to integrate multiple clustering results. Using ensemble learning techniques to improve clustering performance has become an emerging research focus, and ensemble clustering is the focus of this study.

This paper introduces the research background and research status, and expounds the related theories and key technologies of text clustering and clustering ensembles. Text preprocessing techniques (text representation, feature selection methods, and similarity measures) and consensus function design methods are discussed in depth. At present, most text clustering ensembles do not consider the quality of the cluster members being integrated, so poor-quality or noisy members can degrade the final combined result. After studying and analyzing individual clustering algorithms and clustering ensemble algorithms, and in view of the deficiencies of existing clustering ensemble algorithms, the paper proposes a weighted clustering ensemble algorithm. The main idea of this algorithm is to weight the ensemble members by calculating the comprehensive clustering quality and the degree of difference of each member, so that the fused result improves.

Finally, we designed a prototype text clustering system and applied the weighted ensemble algorithm to text. In the experiment, the weighted ensemble algorithm WCSCE, the unweighted ensemble algorithm CSCE, and a single K-means algorithm were compared to verify the feasibility and effectiveness of the weighted algorithm.
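
The weighting idea described above can be illustrated with a minimal sketch. The abstract does not give the actual WCSCE formulas, so everything concrete here is an assumption made for illustration: the quality score (mean cosine similarity of each document to its cluster centroid), the difference degree (one minus the mean co-association agreement with the other members), the mixing parameter `alpha`, and the threshold-based consensus step. All function names are hypothetical.

```python
import numpy as np

def co_association(labels):
    """n x n matrix with 1.0 where two documents share a cluster, else 0.0."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

def quality(X, labels):
    """Assumed quality score: mean cosine similarity of each document
    to the centroid of its own cluster (rows of X must be non-zero)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    total = 0.0
    for c in np.unique(labels):
        members = Xn[labels == c]
        centroid = members.mean(axis=0)
        centroid /= np.linalg.norm(centroid)
        total += (members @ centroid).sum()
    return total / len(labels)

def diversity(all_labels, i):
    """Assumed difference degree: 1 minus the mean fraction of document
    pairs on which member i agrees with each other member."""
    mats = [co_association(l) for l in all_labels]
    agree = [np.mean(mats[i] == m) for j, m in enumerate(mats) if j != i]
    return 1.0 - float(np.mean(agree))

def weighted_co_association(X, all_labels, alpha=0.5):
    """Weight each ensemble member by a mix of quality and diversity,
    then fuse the members into one weighted co-association matrix."""
    q = np.array([quality(X, l) for l in all_labels])
    d = np.array([diversity(all_labels, i) for i in range(len(all_labels))])
    w = alpha * q + (1 - alpha) * d   # assumed combination rule
    w = w / w.sum()                   # normalise weights to sum to 1
    W = sum(wi * co_association(l) for wi, l in zip(w, all_labels))
    return W, w

def consensus_labels(W, threshold=0.5):
    """Simple consensus function (an assumption): connected components
    of the graph linking documents whose fused weight exceeds threshold."""
    n = W.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.nonzero(W[u] > threshold)[0]:
                if labels[v] == -1:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels
```

In use, `X` would hold the document vectors (for example TF-IDF rows) and `all_labels` the label vectors produced by several clustering runs, such as K-means with different initialisations. Setting all weights equal instead of computing them would correspond to an unweighted ensemble, which is the kind of baseline the experiment compares against.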
Keywords/Search Tags: Data Mining, Text Clustering, Clustering Ensemble, Weight Designing