Research And Implementation On Fuzzy C-means Algorithm For Big Data In Cloud

Posted on:2015-07-29

Degree:Master

Type:Thesis

Country:China

Candidate:C J Yu

Full Text:PDF

GTID:2298330452450741

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet, interactive applications such as MicroBlog, WeChat, and SNS have sprung up. Data volumes are exploding as cloud-based applications spread across digital devices. Faced with this enormous amount of data, traditional data analysis tools perform only simple processing and cannot mine useful information in depth, so extracting valuable information from massive data is particularly important. Cluster analysis is one such big data analytics technique. Traditional cluster analysis on stand-alone machines cannot meet the computational efficiency and complexity demands of big data analytics. In this situation, cloud computing provides a new approach to cluster analysis on big data.

This thesis combines traditional cluster analysis with the MapReduce parallel computing model to perform fast and efficient cluster analysis on big data. Its content is as follows:

(1) Research on methods of big data integration

Diversity is one of the notable features of big data, as the types and sources of data vary greatly, and data from different sources must be integrated before analysis. The thesis studies this diversity and, by analyzing traditional data integration systems based on Web Service and XML, studies methods of XML data parsing in a cloud environment. It proposes a Hadoop-based data integration scheme that integrates datasets from different sources into an HBase database so that the data can be analyzed quickly and efficiently.

(2) Research on Fuzzy C-Means (FCM)

Cluster analysis is one of the big data analytics techniques. The thesis studies the Fuzzy C-Means algorithm and designs a MapReduce version of it.

(3) Research on Fuzzy C-Means based on Canopy (Canopy-FCM)

To handle the high volume of big data, the thesis studies the Canopy algorithm. Canopy is a coarse but fast algorithm that obtains coarse cluster centers in a few iterations. The result of Canopy can be used as the input of the FCM algorithm to accelerate its convergence. The thesis studies Fuzzy C-Means based on Canopy and designs a MapReduce version of it.

(4) Research on Fuzzy C-Means based on Maximum and Minimum Distance by Hash Sampling (HMMFCM)

Canopy-FCM is a fast but not accurate clustering algorithm. Traditional clustering algorithms usually obtain initial cluster centers with the maximum and minimum distance algorithm in order to achieve better results. Because the maximum and minimum distance algorithm cannot be parallelized directly, the thesis combines it with Hash sampling and proposes a MapReduce scheme based on Hash sampling. The scheme computes the initial cluster centers with the maximum and minimum distance algorithm and uses them as the input of the FCM algorithm to achieve better clustering results.
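
The Canopy-FCM idea in item (3) can be illustrated with a minimal single-machine sketch. It is not the thesis's MapReduce design, which the abstract does not detail. It assumes Euclidean distance, the textbook FCM update rules with fuzzifier m, and the common two-threshold form of Canopy. The function names and the thresholds `t1` and `t2` are hypothetical choices made for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_centers(points, t1, t2):
    """One pass of Canopy with loose threshold t1 > tight threshold t2.

    Returns the mean of each canopy as a coarse cluster center.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining.pop(0)
        # Every point within t1 of the seed joins this canopy.
        members = [seed] + [p for p in remaining if dist(p, seed) < t1]
        centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        # Points within t2 of the seed can no longer start a new canopy.
        remaining = [p for p in remaining if dist(p, seed) >= t2]
    return centers


def memberships(points, centers, m):
    """FCM membership matrix: each row holds one point's degrees, summing to 1."""
    rows = []
    for p in points:
        d = [dist(p, c) for c in centers]
        if min(d) == 0.0:
            # The point coincides with a center: give it full membership there.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            rows.append([h / sum(hits) for h in hits])
        else:
            inv = [x ** (-2.0 / (m - 1.0)) for x in d]
            total = sum(inv)
            rows.append([v / total for v in inv])
    return rows


def fcm(points, centers, m=2.0, tol=1e-6, max_iter=100):
    """Iterate FCM from the given initial centers until they stop moving.

    Assumes every cluster keeps some nonzero membership weight.
    """
    dims = len(points[0])
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        updated = []
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]
            total = sum(w)
            updated.append(tuple(
                sum(wi * p[k] for wi, p in zip(w, points)) / total
                for k in range(dims)))
        shift = max(dist(a, b) for a, b in zip(centers, updated))
        centers = updated
        if shift < tol:
            break
    return centers, memberships(points, centers, m)
```

A call such as `fcm(points, canopy_centers(points, t1, t2))` seeds FCM with the Canopy centers instead of random ones. In this form Canopy also fixes the number of clusters, so the outcome depends on how `t1` and `t2` are chosen.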

Keywords/Search Tags:

Big Data, Cloud Environment, Data Integration, FCM, MapReduce

Related items

1	Research And Implementation On Fuzzy C-means Algorithm For Big Data In Cloud
2	Research On MapReduce Secure Data Exchange Based On Trusted Execution Environment Technology
3	Design And Implementation Of Medical Vaccine Big Data System Based On Cloud Computing Environment
4	The Research Of Task Scheduling Algorithm For Mapreduce Framework In Cloud Environment
5	Data Storage Security Technology And Application Based On Cloud Platform
6	Top-k Query Technology Of Massive Uncertain Data In Cloud Environments
7	Data Mining Algorithm Parallelization In Cloud Environment
8	Data Destruction Mechanism For Integration Environment Of Cloud-P2P
9	Research And Implementation Of Local Priority Scheduling Algorithm Based On Mapreduce For Massive Data
10	Study On Genetic Algorithms Based On MapReduce In The Big Data Environment