Research Of Clustering Mining Algorithm Oriented Big Data

Posted on:2016-04-12

Degree:Master

Type:Thesis

Country:China

Candidate:Y L Wang

Full Text:PDF

GTID:2308330473965501

Subject:Computer software and theory

Abstract/Summary:

The great potential value of big data has driven the development of big data mining technology. Big data mining is the process of extracting valuable knowledge from data sources characterized by volume, velocity and variety. How to mine valuable knowledge from big data accurately and quickly is a hot research topic.

This thesis focuses on big data clustering algorithms, with the objective of improving their accuracy and efficiency. Accuracy is improved first, by improving a traditional clustering algorithm; efficiency is then improved by parallelizing the improved algorithm.

This thesis presents a density-based incremental k-means clustering algorithm, named DBIK-means, which is based on the DBSCAN and k-means algorithms. DBIK-means first calculates the density of the data points. It then builds basic clusters by combining each center point whose density is greater than a given threshold with the other points that lie within the density range of that center point. Next, it merges pairs of basic clusters according to the distance between their center points. Finally, it assigns each point that does not belong to any cluster to its nearest cluster. Theoretical analysis and experimental results on the KDD CUP 99 dataset show that the algorithm can find clusters of arbitrary shape and is not sensitive to its parameters or to the input order of the data points. It achieves higher clustering accuracy at a small additional time cost, and its overall performance is better than that of the k-means clustering algorithm.

To improve the efficiency of DBIK-means and reduce its time cost, this thesis uses a distributed database to simulate a shared memory space and parallelizes DBIK-means on the Hadoop cloud computing platform. The experimental results show that DBIK-means is suitable for clustering large datasets.

Finally, DBIK-means is applied to the classification of telecom customers. The application results show that it can automatically group a large number of telecommunications customers into several clusters more accurately than traditional clustering algorithms, which helps telecom operators develop different marketing strategies for different types of customers.
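The four steps described in the abstract (density calculation, basic cluster construction, merging by center distance, assignment of leftover points) can be illustrated with a small single-machine sketch. This is not the thesis's implementation: the function name dbik_means_sketch, the parameters eps, min_density and merge_dist, the use of a neighbor count as "density", and the use of the member mean as the cluster "center" are all assumptions made for illustration only.

```python
import math


def dbik_means_sketch(points, eps, min_density, merge_dist):
    """Illustrative sketch only; all parameter names are assumptions.

    points      : list of equal-length numeric tuples
    eps         : radius used to count neighbors (assumed density range)
    min_density : a point seeds a basic cluster if its density exceeds this
    merge_dist  : basic clusters whose centers are this close are merged
    """
    n = len(points)

    # Step 1: density of a point = number of other points within eps.
    neighbors = [
        [j for j in range(n)
         if j != i and math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    density = [len(nb) for nb in neighbors]

    # Step 2: build basic clusters around dense points, densest first.
    assigned = [False] * n
    clusters = []  # each cluster is a list of point indices
    for i in sorted(range(n), key=lambda idx: -density[idx]):
        if assigned[i] or density[i] <= min_density:
            continue
        members = [i] + [j for j in neighbors[i] if not assigned[j]]
        for j in members:
            assigned[j] = True
        clusters.append(members)

    def center(member_ids):
        # Assumption: the center is the mean of the member points.
        dim = len(points[0])
        return tuple(
            sum(points[i][d] for i in member_ids) / len(member_ids)
            for d in range(dim)
        )

    # Step 3: repeatedly merge any two clusters whose centers are close.
    merged = True
    while merged:
        merged = False
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                gap = math.dist(center(clusters[a]), center(clusters[b]))
                if gap <= merge_dist:
                    clusters[a] = clusters[a] + clusters[b]
                    del clusters[b]
                    merged = True
                    break
            if merged:
                break

    # Step 4: label clustered points, then send each leftover point
    # to the cluster with the nearest center (-1 if no cluster exists).
    labels = [-1] * n
    for k, member_ids in enumerate(clusters):
        for i in member_ids:
            labels[i] = k
    centers = [center(c) for c in clusters]
    for i in range(n):
        if labels[i] == -1 and centers:
            labels[i] = min(
                range(len(centers)),
                key=lambda k: math.dist(points[i], centers[k]),
            )
    return labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11),
            (3, 3)]  # the last point is an isolated, low-density point
    print(dbik_means_sketch(data, eps=1.5, min_density=2, merge_dist=3.0))
```

In this toy run the two dense groups each form a basic cluster, they are too far apart to merge, and the isolated point is attached to the nearer group. The sketch computes all pairwise distances, so it is quadratic in the number of points and says nothing about how the thesis distributes the work on Hadoop.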

Keywords/Search Tags:

Big Data, Clustering Mining, K-means, Cloud Computing, Hadoop

Related items

1	Research On K-Means Clustering Algorithm Based On Hadoop Cloud Computing Platform
2	Research On Data Mining Technology Of Internet Of Things Based On Cloud Computing
3	Clustering Algorithm Based On The Background Of Big Data
4	Research On Key Technologies Of Secure Data Mining Outsourcing In Cloud Computing Environment
5	Research And Design Of Parallel K-prototypes Clustering Algorithm Based On Hadoop
6	Research On Data Mining Algorithm In Cloud Computing Environment
7	Research On Parallel Clustering Algorithm Based On Hadoop Cloud Computing Platform
8	Research And Implementation Of Big Data Analysis And Mining Technology Based On Hadoop In Telecommunications Industry
9	K-Means Algorithm Design And Implementation Based On Hadoop And Mahout
10	Research And Application Of Hadoop Distributed Clustering Mining Method Based On Virtual Machine