
The Research And Implementation Of The Algorithms Of Massive Data Processing In Data Mining

Posted on: 2013-01-16  Degree: Master  Type: Thesis
Country: China  Candidate: M M Miao  Full Text: PDF
GTID: 2248330362472727  Subject: Computer software and theory
Abstract/Summary:
Data mining is the process of extracting interesting and useful knowledge from datasets. With the development of Internet and database technology, processing huge datasets has become an important topic in data mining.

This thesis takes the project "The Design and Development of a Telecommunications Data Mining System" as its research background. Based on a study of massive data processing technology, it proposes a way of handling massive data using memory-mapped files. Drawing on research into massive data mining algorithms and the theory of granular computing, it presents two data mining algorithms. The first is an Apriori algorithm based on matrix compression (MC-Apriori). It improves on the classic Apriori algorithm: it converts the transactional data into a 0-1 matrix and repeatedly compresses the matrix according to the Apriori property and its corollary to obtain the frequent itemsets, which to some extent reduces both the data size and the amount of computation. The second is an association rule mining algorithm based on granular computing (Grc-AR). It introduces the idea of granular computing on top of MC-Apriori: the massive dataset is divided into several small datasets, each small dataset is processed separately, and the partial results are integrated to obtain the final result. Both algorithms were implemented and run on a sample of mobile users' call records, and the experimental results were analyzed and compared.

The experimental results show that neither algorithm loses effectiveness, but that in the actual processing of massive data Grc-AR is better suited to massive datasets and is more scalable. Finally, some valuable information is extracted from the massive telecommunications dataset to provide decision support to telecom operators.
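
The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not state the exact compression rules or how the partial results are integrated, so the sketch rests on assumptions. For compression it uses two rules that follow from the Apriori property: drop item columns whose count is below the threshold, and drop transaction rows with fewer than k ones before searching for k-itemsets. For the divide-and-integrate step it swaps in the classic partition strategy: locally frequent itemsets become candidates, and a second pass counts them globally. All function and parameter names (`to_matrix`, `compress`, `frequent_itemsets`, `partitioned_frequent_itemsets`, `min_count`, `min_ratio`, `n_parts`) are hypothetical.

```python
from itertools import combinations
from math import ceil


def to_matrix(transactions):
    """Encode transactions as a 0-1 matrix: one row per transaction, one column per item."""
    items = sorted({item for t in transactions for item in t})
    rows = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, rows


def compress(items, rows, min_count, k):
    """Assumed compression rules: drop infrequent columns, then rows with fewer than k ones."""
    keep = [j for j in range(len(items)) if sum(r[j] for r in rows) >= min_count]
    items = [items[j] for j in keep]
    rows = [[r[j] for j in keep] for r in rows]
    rows = [r for r in rows if sum(r) >= k]
    return items, rows


def frequent_itemsets(transactions, min_count):
    """Matrix-based Apriori: returns {sorted item tuple: support count}."""
    items, rows = to_matrix(transactions)
    result = {}
    k = 1
    while True:
        items, rows = compress(items, rows, min_count, k)
        if len(items) < k or not rows:
            break
        found = {}
        for combo in combinations(range(len(items)), k):
            names = tuple(items[j] for j in combo)
            # Apriori property: every (k-1)-subset of a frequent k-itemset is frequent.
            if k > 1 and any(sub not in result for sub in combinations(names, k - 1)):
                continue
            count = sum(1 for r in rows if all(r[j] for j in combo))
            if count >= min_count:
                found[names] = count
        if not found:
            break
        result.update(found)
        k += 1
    return result


def partitioned_frequent_itemsets(transactions, min_ratio, n_parts):
    """Split the data into small parts, mine each part, then verify candidates globally."""
    size = ceil(len(transactions) / n_parts)
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    candidates = set()
    for part in parts:
        local_min = max(1, ceil(min_ratio * len(part)))
        candidates.update(frequent_itemsets(part, local_min))
    global_min = ceil(min_ratio * len(transactions))
    result = {}
    for names in candidates:
        # Second pass: count each candidate over the full dataset.
        count = sum(1 for t in transactions if set(names) <= set(t))
        if count >= global_min:
            result[names] = count
    return result
```

Calling `frequent_itemsets(transactions, min_count)` on a list of item sets returns a dictionary from sorted item tuples to support counts. `partitioned_frequent_itemsets` takes a relative threshold instead, because an absolute count has to be rescaled for each part. An itemset that is frequent overall must be locally frequent in at least one part, so under these assumptions the partitioned variant finds the same itemsets as the single-matrix variant.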
Keywords/Search Tags: data mining, massive data processing, MC-Apriori algorithms, granular computing, Grc-AR algorithms