
Based Clustering Algorithm And Its Application To Obtain A Representative Point

Posted on: 2014-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Zhang
Full Text: PDF
GTID: 2268330401486000
Subject: Computer software and theory
Abstract/Summary:
Cluster analysis is an important function of data mining and a very active research topic. It can be used as an independent tool to discover the global distribution of data, or as a preprocessing step for other algorithms, and it is therefore widely applied in other research fields. On the Internet, in bioinformatics, in e-commerce, and in many other areas, cluster analysis helps solve practical problems.

Advances in data acquisition and storage technology have led to rapid growth in data volume. Simple statistical techniques and traditional data mining technology are no longer sufficient for increasingly complex data problems, especially those involving massive datasets. This thesis explores, from the perspective of cluster analysis, analysis and management methods that can deal effectively with massive datasets and big data. Its main contributions are as follows:

For real datasets in which some of the data carry class information, we propose a two-stage semi-supervised clustering algorithm based on affinity propagation (2SAP). A k-nearest-neighbor graph is used to express the local information of the data, and a small amount of prior pairwise constraint information is used to adjust the similarity matrix. Semi-supervised clustering based on affinity propagation (SAP) is then applied twice to cluster the whole dataset and obtain the final clusters.

For real datasets in which different categories overlap, we optimize the existing SRIDHCR algorithm. A new algorithm is designed to obtain the initial centers and the initial set of boundary representative points, which are combined to form SRIDHCR's initial representative point set. This method can greatly reduce the running time of SRIDHCR.

In addition, the experimental results show that the representative point algorithm can be applied to bioinformatics, text categorization, and some other fields.
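The abstract does not state how the pairwise constraints modify the similarity matrix, so the following is only a minimal sketch of one common convention, not the thesis's actual rule. It assumes similarities are negative squared Euclidean distances (a usual choice for affinity propagation), that a must-link pair is raised to the maximum similarity 0, and that a cannot-link pair is pushed to a large negative value. The function names and the `far` parameter are hypothetical.

```python
def negative_squared_distance(p, q):
    """Similarity of two points: negative squared Euclidean distance."""
    return -sum((a - b) ** 2 for a, b in zip(p, q))


def build_similarity(points):
    """Dense similarity matrix as a list of lists."""
    n = len(points)
    return [
        [negative_squared_distance(points[i], points[j]) for j in range(n)]
        for i in range(n)
    ]


def apply_pairwise_constraints(sim, must_link, cannot_link, far=-1e9):
    """Return a copy of sim adjusted by prior pairwise constraints.

    Assumed rule (not taken from the thesis): must-link pairs get the
    maximum similarity 0.0; cannot-link pairs get the value `far`.
    """
    adjusted = [row[:] for row in sim]  # leave the input matrix untouched
    for i, j in must_link:
        adjusted[i][j] = adjusted[j][i] = 0.0
    for i, j in cannot_link:
        adjusted[i][j] = adjusted[j][i] = far
    return adjusted


# Toy data: two tight pairs of points that are far from each other.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
sim = build_similarity(points)
adjusted = apply_pairwise_constraints(
    sim, must_link=[(0, 2)], cannot_link=[(0, 1)]
)
```

The adjusted matrix would then be passed to an affinity propagation routine in place of the raw similarities; unconstrained entries are left unchanged.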
Keywords/Search Tags: data mining, clustering, dimensionality reduction, promoter recognition, text categorization