
Research on Clustering Algorithms for Large Amounts of Data Without a Preset Number of Categories

Posted on: 2013-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: C Liu
Full Text: PDF
GTID: 2248330395953119
Subject: Education Technology
Abstract/Summary:
With the development of computer science and technology, and of computer networks in particular, more and more people face a flood of information. The widespread use of databases has led to large volumes of data accumulating in many fields and industries, and the problem of being "rich in data but poor in knowledge" has become increasingly prominent. In recent decades, knowledge discovery (rule extraction, data mining, machine learning, etc.) has emerged and received extensive attention from artificial intelligence researchers, and a variety of methods have been proposed.
Data mining is the process of discovering hidden, valid, valuable, and understandable patterns in large amounts of disordered data, and from them extracting useful knowledge such as trends over time and associations, in order to give users decision support for problem solving. Clustering, as one of the main methods of data mining, is attracting growing attention. Knowledge discovery tasks often involve processing large amounts of data, especially given the growth of network information and complex domains such as financial data, medical diagnostics, and satellite data. The number of objects to be handled now frequently reaches millions or tens of millions, and the processing power of computers often proves insufficient. Such large amounts of data create many difficulties for knowledge acquisition methods in knowledge discovery.
This thesis describes clustering methods and their principles and analyzes their advantages and limitations. It attempts to integrate the ideas of different clustering algorithms so as to exploit the advantages of each, yielding a method that can handle large amounts of data without requiring the number of categories to be preset, and that thereby improves clustering accuracy and reduces clustering instability.
Theoretical analysis and experiments show that the original AP (affinity propagation) algorithm cannot handle large amounts of data. By integrating the original AP clustering algorithm with the K-Means clustering algorithm, we propose the KMAP clustering algorithm. Theoretical analysis and experiments show that KMAP not only overcomes the inability of the original AP algorithm to handle large data sets, widening its scope of application, but also resolves the instability of the K-Means clustering algorithm caused by the order of the input data set. Because the value of "K" in KMAP is not easy to determine, we further propose the KCAP clustering algorithm, which reduces the "K" value used in KMAP, so that the algorithm no longer needs the number of categories to be preset.
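The abstract does not spell out how AP and K-Means are combined, so the following is only a minimal sketch of one plausible reading, not the thesis's KMAP or KCAP procedure. It assumes that K-Means with a deliberately generous K first compresses the data into K centroids, and that standard affinity propagation is then run on those centroids only, so the quadratic similarity matrix is K x K instead of N x N. The function name `kmap_sketch`, the median preference, the damping value, and the synthetic data are all illustrative assumptions.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: sqdist(p, centers[j]))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means; returns k centroids."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # An empty group keeps its previous center.
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def affinity_propagation(points, damping=0.5, iters=200):
    """Standard AP message passing (needs >= 2 points); returns exemplar indices."""
    n = len(points)
    S = [[-sqdist(p, q) for q in points] for p in points]
    off_diag = sorted(S[i][j] for i in range(n) for j in range(n) if i != j)
    pref = off_diag[len(off_diag) // 2]  # (upper) median similarity as preference
    for i in range(n):
        S[i][i] = pref
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        for i in range(n):  # responsibilities r(i, k)
            vals = [A[i][k] + S[i][k] for k in range(n)]
            top = max(range(n), key=vals.__getitem__)
            second = max(v for k, v in enumerate(vals) if k != top)
            for k in range(n):
                new = S[i][k] - (second if k == top else vals[top])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):  # availabilities a(i, k)
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # sum over i != k of max(0, r(i, k))
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    score = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if score[k] > 0]
    # Fall back to the single best candidate if no exemplar emerged.
    return exemplars or [max(range(n), key=score.__getitem__)]

def kmap_sketch(points, k):
    """Hypothetical two-stage pipeline: K-Means compression, then AP on centroids."""
    centroids = kmeans(points, k)
    centers = [centroids[i] for i in affinity_propagation(centroids)]
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Synthetic data: three Gaussian blobs of 100 points each (illustrative only).
rng = random.Random(1)
data = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
        for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(100)]
centers, labels = kmap_sketch(data, k=15)
print(len(centers), "clusters found from 15 initial centroids")
```

Under these assumptions, the K passed to `kmap_sketch` only needs to be an upper bound on the number of clusters, since AP decides how many of the K centroids survive as exemplars. How the thesis actually chooses or reduces K (the KCAP step) is not described in the abstract and is not modeled here.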
Keywords/Search Tags: Data Mining, Clustering, AP clustering, K-Means clustering