Research On Clustering Algorithms In Data Mining

Posted on:2015-02-22

Degree:Master

Type:Thesis

Country:China

Candidate:K Pei

GTID:2298330467463514

Subject:Signal and Information Processing

Abstract/Summary:

Data mining is one of the most active branches of database research and one of the most promising technologies in computer science. It arose from the need to extract useful knowledge from massive amounts of data. Data mining is the process of extracting hidden and potentially useful patterns and rules from large data sets. It draws on statistics, machine learning, neural networks, pattern recognition, information retrieval, artificial intelligence, visualization, and many other disciplines, bringing together a variety of data analysis techniques.

Cluster analysis is an important area of data mining research. It is an unsupervised learning process: without prior knowledge, clustering divides data into multiple classes according to certain rules and reveals hidden patterns. Basic clustering algorithms can be roughly divided into several kinds, including partitioning, hierarchical, density-based, and grid-based methods. Cluster analysis has a wide range of applications in e-commerce, market analysis, document classification, biology, and many other fields.

In this paper, clustering techniques in data mining were analyzed and discussed. First, we briefly introduced the concept of data mining and its common techniques. Then, following a classification of clustering algorithms, we systematically introduced each kind of clustering algorithm and its typical representatives. Next, we gave a detailed analysis of k-means, a classical and widely used clustering algorithm, covering its procedure, its defects, and some ideas for improvement. We introduced canopy k-means, a hybrid clustering method designed to find the initial cluster centers for the traditional k-means algorithm, and conducted experiments to test its performance. After that, we briefly introduced the Hadoop distributed platform and proposed a parallel strategy for the canopy and k-means algorithms.
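
The canopy k-means idea summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: Euclidean distance, the first remaining point taken as each new canopy center, and the hypothetical function names `canopy`, `kmeans`, and `canopy_kmeans` with a loose threshold `t1` and a tight threshold `t2` (t1 > t2). The canopy centers serve as the initial centers, and also fix k, for standard Lloyd-style k-means iterations.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def canopy(points: Sequence[Point], t1: float, t2: float) -> List[Tuple[Point, List[Point]]]:
    """Canopy clustering (hypothetical helper): return (center, members) pairs. Requires t1 > t2."""
    if t1 <= t2:
        raise ValueError("canopy clustering requires t1 > t2")
    remaining = list(points)
    canopies = []
    while remaining:
        center = remaining.pop(0)  # assumption: the first remaining point becomes the next center
        # Points within the loose threshold t1 join this canopy (canopies may overlap).
        members = [center] + [p for p in remaining if math.dist(p, center) <= t1]
        canopies.append((center, members))
        # Points within the tight threshold t2 can no longer start a new canopy.
        remaining = [p for p in remaining if math.dist(p, center) > t2]
    return canopies


def nearest(point: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def kmeans(points: Sequence[Point], init_centers: Sequence[Point], max_iter: int = 100):
    """Lloyd-style k-means starting from the given centers; returns (centers, labels)."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        # New center = mean of its members; an empty cluster keeps its old center.
        new_centers = [
            tuple(sum(coords) / len(members) for coords in zip(*members)) if members else c
            for c, members in zip(centers, clusters)
        ]
        if new_centers == centers:  # centers unchanged, so assignments have converged
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    return centers, labels


def canopy_kmeans(points: Sequence[Point], t1: float, t2: float, max_iter: int = 100):
    """Use the canopy centers as the initial centers (and the value of k) for k-means."""
    init = [center for center, _ in canopy(points, t1, t2)]
    return kmeans(points, init, max_iter)
```

For example, with two well-separated groups of three points and `t1=3.0`, `t2=2.0`, the canopy pass yields two centers, so k = 2 without being specified in advance. The trade-off is that the thresholds t1 and t2 must be chosen instead of k.
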
Finally, we presented a parallel clustering algorithm for community mining in social networks, a widely used real-world application, and tested its performance. Experimental results show that, compared with the traditional k-means algorithm and canopy k-means, the proposed algorithm greatly improves efficiency.
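
On Hadoop, one k-means iteration is commonly decomposed so that mappers assign points to their nearest center and reducers recompute the means. The sketch below simulates that decomposition in plain Python. It does not use Hadoop's actual Java `Mapper`/`Reducer` API, the function names `map_split`, `reduce_partials`, and `parallel_kmeans_step` are hypothetical, and it is not claimed to be the parallel strategy proposed in the thesis. Each input split emits per-center partial sums and counts (a combiner-style optimization), so only small aggregates would need to cross the network.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[float], int]  # (coordinate sums, point count)


def nearest(point: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def map_split(split: Iterable[Point], centers: Sequence[Point]) -> Dict[int, Partial]:
    """Map + combine on one input split: accumulate per-center partial sums and counts."""
    partial: Dict[int, Partial] = {}
    for p in split:
        idx = nearest(p, centers)
        sums, count = partial.get(idx, ([0.0] * len(p), 0))
        partial[idx] = ([s + x for s, x in zip(sums, p)], count + 1)
    return partial


def reduce_partials(partials: Iterable[Dict[int, Partial]], centers: Sequence[Point]) -> List[Point]:
    """Reduce: merge partial sums by center index, then divide to get the new means."""
    merged: Dict[int, Partial] = {}
    for part in partials:
        for idx, (sums, count) in part.items():
            old_sums, old_count = merged.get(idx, ([0.0] * len(sums), 0))
            merged[idx] = ([a + b for a, b in zip(old_sums, sums)], old_count + count)
    new_centers: List[Point] = []
    for idx, c in enumerate(centers):
        if idx in merged:
            sums, count = merged[idx]
            new_centers.append(tuple(s / count for s in sums))
        else:
            new_centers.append(tuple(c))  # no points assigned: keep the old center
    return new_centers


def parallel_kmeans_step(splits: Sequence[Sequence[Point]], centers: Sequence[Point]) -> List[Point]:
    """One k-means iteration; on a cluster, each map_split call would run as a separate task."""
    partials = [map_split(split, centers) for split in splits]
    return reduce_partials(partials, centers)
```

Because sums and counts are associative, merging the partials in any order gives the same means as a serial k-means step, up to floating-point rounding. A driver program would repeat this step until the centers stop changing.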

Keywords/Search Tags:

Data Mining, Clustering, k-means, Hadoop, Social Network

Related items

1	Research And Application Of Hadoop Distributed Clustering Mining Method Based On Virtual Machine
2	Research Of Clustering Mining Algorithm Oriented Big Data
3	The Research And Implement Of Data Mining Algorithms Based On Hadoop
4	The Research And Application Of Security Log Clustering Mining Algorithm Based On Hadoop Platform
5	Research On Mining Taxi Pick-up Hotspots Area Based On Big Data Hadoop Platform
6	Research On Machine Learning Clustering Algorithms In The Hadoop Development Environment
7	Research And Application Of Data Mining Algorithms Based On Hadoop
8	Semi-supervised K-means Clustering Algorithm In Data Mining
9	Study On Key Techniques Of Distributed Data Mining Based On Hadoop
10	Research And Implementation Of Big Data Analysis And Mining Technology Based On Hadoop In Telecommunications Industry