Research On Data Mining Technology In Hadoop Platform

Posted on: 2015-11-18
Degree: Master
Type: Thesis
Country: China
Candidate: J Guo
Full Text: PDF
GTID: 2298330431985571
Subject: Computer application technology
Abstract/Summary:
Nowadays, in the big data era, people are overwhelmed by the sheer volume of information. According to authoritative statistics released in 2011, the total amount of global data doubles every two years, and the amount of data held by humankind is expected to reach a staggering 35 trillion GB by 2020. Faced with such a vast ocean of data, extracting valuable information through data mining techniques has become increasingly important. Through data mining, users can extract potentially useful information, rules, and high-level knowledge from huge, random, vague, and noisy data sets, which is of great importance to scientific research, business decision-making, and other fields.

While big data brings great opportunities, it also presents challenges for effective data management and utilization. The emergence of cloud computing provides new ways to address these concerns. Cloud computing is service-based and offers a computing model built on dynamically scalable, virtualized resources. It can also efficiently locate useful information in large amounts of data, allowing many new applications to flourish in the cloud environment. Taking advantage of its strengths in distributed processing and virtualization, this thesis conducts a study in the following three aspects.

The first aspect addresses the defects of the traditional Apriori algorithm for association rules. Based on the column-oriented database HBase, this thesis presents a novel distributed association rule mining algorithm, MCM-Apriori, which combines the Map/Reduce programming model with a coding operation. It can quickly find accurate relations among knowledge models. Furthermore, performing the mining in two Map/Reduce passes greatly reduces the running time of MCM-Apriori, making it both accurate and efficient.

Additionally, to meet the new requirements of big data management, the thesis puts forward a fast mixed-hash lookup algorithm. It is based on the engine of the key-value in-memory database Redis and on Cuckoo hashing. By building a public overflow area and using a key-shifting method, query response time is reduced and search efficiency is improved.

Finally, an online bookstore sales system has been designed and implemented on the Hadoop cloud computing framework. Using the improved MCM-Apriori algorithm and the fast lookup algorithm CSR_Hash, it parses and recommends book data in real time and with high efficiency. The system achieves fast query and analysis as well as reliable data storage, and demonstrates the advantages of combining a NoSQL database with Map/Reduce for real-time, high-concurrency workloads.
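
The idea of mining frequent itemsets in two Map/Reduce passes can be illustrated with a minimal in-process sketch. It is not the MCM-Apriori algorithm itself: HBase and the coding operation are omitted, and the function names, the max_size cap, and the sample baskets are assumptions made purely for illustration. Pass one counts single items; pass two counts larger itemsets built only from items that survived pass one, which is safe because every subset of a frequent itemset must itself be frequent.

```python
from collections import Counter
from itertools import combinations


def map_items(transaction):
    """Pass 1 mapper: emit (item, 1) for each distinct item in a transaction."""
    return [(item, 1) for item in set(transaction)]


def map_itemsets(transaction, frequent_items, max_size):
    """Pass 2 mapper: emit (itemset, 1) for itemsets made of frequent items only."""
    kept = sorted(set(transaction) & frequent_items)
    return [
        (combo, 1)
        for size in range(2, max_size + 1)
        for combo in combinations(kept, size)
    ]


def reduce_counts(pairs, min_support):
    """Reducer: sum the counts per key and keep keys that reach min_support."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return {key: total for key, total in totals.items() if total >= min_support}


def two_pass_frequent_itemsets(transactions, min_support, max_size=3):
    """Run both passes in-process and return {itemset_tuple: support_count}."""
    # Pass 1: frequent single items.
    pass1 = [pair for t in transactions for pair in map_items(t)]
    frequent_items = reduce_counts(pass1, min_support)

    # Pass 2: candidate itemsets restricted to frequent items.
    allowed = set(frequent_items)
    pass2 = [
        pair
        for t in transactions
        for pair in map_itemsets(t, allowed, max_size)
    ]
    frequent_sets = reduce_counts(pass2, min_support)

    result = {(item,): count for item, count in frequent_items.items()}
    result.update(frequent_sets)
    return result


# Hypothetical book baskets, one list per order.
baskets = [
    ["hadoop", "hbase", "redis"],
    ["hadoop", "hbase"],
    ["hadoop", "redis"],
    ["hbase", "python"],
]

if __name__ == "__main__":
    for itemset, support in sorted(two_pass_frequent_itemsets(baskets, 2).items()):
        print(itemset, support)
```

With the sample baskets and a minimum support of 2, the frequent pairs are (hadoop, hbase) and (hadoop, redis); "python" is dropped in pass one, so it never generates candidates in pass two. In a real Hadoop job the mappers and reducers would run on separate nodes, and enumerating combinations per transaction is only practical when max_size and the basket width stay small.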
Keywords/Search Tags: Hadoop, key-value storage, Map/Reduce, Apriori, Redis database