No Default Categories For Large Amount Of Data Clustering Algorithm Research

Posted on:2013-10-10

Degree:Master

Type:Thesis

Country:China

Candidate:C Liu

GTID:2248330395953119

Subject:Education Technology

Abstract/Summary:

With the development of computer science and technology, and of computer networks in particular, more and more people face a flood of information. The widespread use of databases has caused large amounts of data to accumulate in many fields and industries, so the problem of being "rich in data but poor in knowledge" is becoming increasingly prominent. In recent decades, knowledge discovery (rule extraction, data mining, machine learning, etc.) has emerged and received extensive attention from artificial intelligence researchers, and a variety of methods have been proposed. Data mining is the process of discovering hidden, valid, valuable, and understandable patterns in large amounts of unordered data, extracting useful knowledge from them, identifying trends and associations over time, and thereby giving users decision support for solving problems. Clustering, one of the main methods of data mining, is attracting growing attention. Knowledge discovery tasks often involve processing very large datasets, especially given the growth of network information and complex domains such as financial data, medical diagnostics, and satellite data. The number of objects to be handled frequently reaches millions or tens of millions, and available computing power often proves insufficient, so large data volumes create many difficulties for the knowledge acquisition methods used in knowledge discovery.
This thesis describes clustering methods and their principles and analyzes their advantages and limitations. It attempts to integrate the ideas of different clustering algorithms so as to exploit the strengths of each, yielding an approach that can handle large amounts of data without requiring the number of categories to be preset, thereby improving clustering accuracy and reducing clustering instability.
Theoretical analysis and experiments show that the original AP (affinity propagation) algorithm cannot handle large amounts of data. By integrating the original AP clustering algorithm with the K-Means clustering algorithm, we propose the KMAP clustering algorithm. Theoretical analysis and experiments show that the improved KMAP algorithm not only overcomes the original AP algorithm's inability to handle large datasets, widening its range of application, but also resolves the instability of K-Means caused by the order of the input data. Because the "K" value in KMAP is not easy to determine, we further propose the KCAP clustering algorithm, which reduces the "K" value used by KMAP so that the algorithm no longer needs the number of categories to be preset.
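The abstract does not say exactly how KMAP combines the two algorithms. The sketch below assumes one plausible reading: K-Means first compresses the full dataset into K centroids, affinity propagation (AP) then clusters only those centroids and decides the final number of clusters itself, and every original point inherits the AP label of its centroid. The function names (`kmeans`, `affinity_propagation`, `kmap`), the damping and iteration defaults, the negative-squared-distance similarity, and the median-similarity preference are illustrative assumptions (the last two follow the common convention for AP), not the thesis's own implementation. KCAP is not sketched.

```python
import random
import statistics


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd's K-Means with random initial centroids; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


def affinity_propagation(points, damping=0.7, iterations=300):
    """Affinity propagation using negative squared distances as similarities and
    the median similarity as every point's preference. Returns, for each point,
    the index of its exemplar (equal values mean the same cluster)."""
    n = len(points)
    if n < 2:
        return list(range(n))
    s = [[-sq_dist(points[i], points[k]) for k in range(n)] for i in range(n)]
    pref = statistics.median(s[i][k] for i in range(n) for k in range(n) if i != k)
    for i in range(n):
        s[i][i] = pref
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
        for i in range(n):
            vals = [a[i][k] + s[i][k] for k in range(n)]
            best = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != best)
            for k in range(n):
                new = s[i][k] - (second if k == best else vals[best])
                r[i][k] = damping * r[i][k] + (1 - damping) * new
        # a(k,k) = sum of positive r(i',k); a(i,k) = min(0, r(k,k) + that sum without i)
        for k in range(n):
            pos = sum(max(0.0, r[i][k]) for i in range(n) if i != k)
            for i in range(n):
                new = pos if i == k else min(0.0, r[k][k] + pos - max(0.0, r[i][k]))
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:  # did not settle: fall back to the single strongest candidate
        exemplars = [max(range(n), key=lambda k: r[k][k] + a[k][k])]
    return [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
            for i in range(n)]


def kmap(points, k, seed=0):
    """Hypothetical KMAP-style pipeline (an assumption, not the thesis's code):
    K-Means compresses the data to k centroids, AP clusters the centroids,
    and each point inherits the AP label of its centroid."""
    centroids, km_labels = kmeans(points, k, seed=seed)
    ap_labels = affinity_propagation(centroids)
    return [ap_labels[c] for c in km_labels]
```

Because AP keeps n-by-n tables of messages, running it on K centroids rather than on all N points is what would let this arrangement scale. Under this assumption the result still depends on the chosen K, which is the difficulty KCAP is described as addressing.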

Keywords/Search Tags:

Data Mining, Clustering, AP clustering, K-Means clustering

Related items

1	Research On Dynamic Clustering And Incremental In Data Mining
2	No Default Categories For Large Amount Of Data Clustering Algorithm Research
3	Research On Ensemble-Initialized K-Means Clustering Algorithms
4	Research Of Improving For K-means Clustering Algorithm
5	A Deep Embedding Clustering Algorithm Considering Preservation Of Initial Clustering Structure And Its Application
6	Study Of Auto-Adaption Fuzzy C-Means Clustering Algorithm
7	Research And Improvement Of K - Means Clustering Algorithm
8	Ant Clustering Algorithm With K-harmonic Means Clustering
9	Research On Fuzzy Clustering Analysis In Data Mining
10	Clustering Data Mining Applications In Department Store And K-means Clustering Algorithm Improvement