Research Of Clustering Mining Algorithm Oriented Big Data

Posted on: 2015-02-03
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
Full Text: PDF
GTID: 2298330467474537
Subject: Software engineering
Abstract/Summary:
With the rapid development and application of information technology worldwide, the era of big data has arrived, and big data mining technology has emerged with it. Big data mining helps people extract the knowledge they actually need and value from big data, which is massive, fast-arriving, and heterogeneous.

This thesis focuses on big data clustering algorithms. To improve mining efficiency, it studies mining algorithms aimed at two characteristics of big data: large data sets and high-speed stream data. The research covers not only improvements to the algorithms themselves but also their parallelization on cloud computing platforms.

To improve the accuracy of high-speed stream data clustering, this thesis improves the StrAP algorithm into ISTRAP, which is based on a sliding time window model, and demonstrates its performance through simulation. Furthermore, building on ISTRAP, the thesis designs a hierarchical data stream clustering algorithm, HSCLUSTER, to meet the need to analyze historical data and its evolution; the simulation results show that the algorithm performs well.

To improve the efficiency of clustering big data sets, this thesis proposes a parallel weighted AP (affinity propagation) clustering algorithm, P-WAP, based on the Hadoop cloud computing platform. To verify the efficiency of P-WAP, experiments were designed on Hadoop; the results show that P-WAP can be applied to clustering big data sets.

The thesis also applies P-WAP to text clustering. The results of running P-WAP on the experimental text set show that large amounts of text can be classified automatically, so that texts are clustered more accurately and the text set becomes more efficient and convenient to use and manage.

The research fits big data's characteristics of volume, velocity, and variety. The research is advanced, and its results have good theoretical and practical value. They can be used in e-commerce, the Internet of Things, and other applications that hold big data.
Keywords/Search Tags: Big Data, Stream Data, Clustering Mining, Text Clustering, Cloud Computing, Hadoop