Research And Implementation Of Big Data Analysis And Mining Technology Based On Hadoop In Telecommunications Industry

Posted on:2017-12-20

Degree:Master

Type:Thesis

Country:China

Candidate:H J Cui

GTID:2348330518995646

Subject:Software engineering

Abstract/Summary:

With the development of information technology, the scale of generated data is expanding rapidly, and data mining technology has developed in response to such vast amounts of data. Big data brings both challenges and opportunities: extracting useful information from so much data is a challenging task. The telecommunications industry holds a large amount of customer data, and using big data technologies to analyze it and uncover latent knowledge that can improve the service experience is meaningful work. Against this background, this thesis does the following.

First, in terms of algorithms, this thesis uses a clustering algorithm for customer segmentation and a decision tree algorithm for customer prediction. The traditional K-means algorithm requires the number of clusters as input, but with such large volumes of data the data distribution is not known in advance, which makes the algorithm difficult to apply. To address this problem, this thesis improves K-means and implements a DGK-means algorithm, which uses a genetic algorithm to determine the most appropriate number of clusters and a density-based fitness function, improving the efficiency and accuracy of the algorithm. The C4.5 decision tree algorithm builds a decision tree model from the test data set, and the model is then used to predict unknown outcomes in support of customer prediction and customer retention.

Second, to meet the needs of big data mining and analysis, this thesis designs and implements a Hadoop-based big data analysis and mining system for the telecommunications industry, which uses HDFS for distributed data storage and MapReduce for parallel computing. Both the clustering algorithm and the decision tree algorithm are designed to run in parallel. Distributed data storage on the Hadoop platform makes parallel computing convenient, and the parallel design of the decision tree algorithm also reduces pruning, improving the efficiency and accuracy of the algorithm.

Finally, the system and the algorithms were validated on test data sets. The results show that DGK-means improves on the traditional algorithm in both accuracy and efficiency. With parallel computing, efficiency improves once the cluster has more than 2 nodes, and the improvement becomes more pronounced as the number of nodes increases.
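
To illustrate how a clustering step can be expressed in the MapReduce style the abstract refers to, the following is a minimal single-process sketch of one standard K-means iteration split into a map step (assign each point to its nearest centroid) and a reduce step (average the points per centroid). It is not the thesis's DGK-means implementation: the function names `map_assign`, `reduce_mean`, and `kmeans_iteration` and the sample points are assumptions made for illustration, a plain dictionary stands in for Hadoop's shuffle phase, and the genetic search for the number of clusters is not shown.

```python
from collections import defaultdict
from math import dist


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_mean(points):
    """Reduce step: average all points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """Run one map/shuffle/reduce round and return the updated centroids."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = map_assign(p, centroids)
        groups[key].append(value)
    # A centroid that received no points keeps its previous position.
    return [
        reduce_mean(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


# Hypothetical two-dimensional sample data, not taken from the thesis.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids = kmeans_iteration(points, [(0.0, 0.0), (10.0, 10.0)])
```

On a real cluster, the map step would run on the nodes that hold each HDFS block and only the per-centroid groups would be sent to reducers, which is one reason the approach can benefit from additional nodes.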

Keywords/Search Tags:

data mining, Hadoop, K-means, decision tree algorithms, parallel computing

Related items

1	The Research Of Decision Tree Mining Based On Hadoop
2	The Parallel Research On Decision Tree Classification Algorithm Based On Hadoop
3	The Research On Decision Tree Algorithm's Parallelization Based On Hadoop Platform
4	Parallel Data Mining Algorithms Research Of Hadoop
5	Research On Parallel Shared Decision Tree Algorithm Based On Hadoop
6	The Research On Data Mining Algorithm's Parallelization Based On Hadoop 2.0
7	Hadoop-based Parallel Algorithm For Mining
8	Parallel Research And Application Of Machine Learning Algorithm Based On Cloud Platform
9	Decision Tree Classification Algorithm Parallelization And Its Application
10	Research On Parallel Decision Tree Algorithm Based On Hadoop Platform