
Research on Key Technologies of Hadoop-Based Massive Data Processing

Posted on: 2014-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: B Che
Full Text: PDF
GTID: 2268330401964426
Subject: Electronic and communication engineering
Abstract/Summary:
Since its birth nearly half a century ago, the Internet has penetrated every aspect of our lives. After the Web 2.0 era, the Internet has moved toward its third generation: personalized network services. This evolution has produced massive volumes of data, and in the face of such data the traditional single super-server has gradually proved inadequate; processing massive data has become a thorny problem. The generation and processing of huge amounts of data is both a challenge and an opportunity: massive data provides a rich source for data mining, from which information of great commercial value can be extracted. For these reasons, massive data processing has become a popular technology in which major Internet companies invest research effort, and small and medium-sized enterprises are also vying to join this feast.

The concept of cloud computing, proposed by Google in 2006, pointed out a direction for massive data processing, and Hadoop, the open-source cloud platform developed by the Apache Foundation, has brought the majority of researchers the dawn of low-cost massive data processing. Traditional data processing methods and techniques perform well on a single server, but they no longer fit the distributed processing mode of the cloud platform. Transforming traditional data processing methods into the distributed computing mode, and improving algorithm performance on this basis, is of major significance for massive data processing.

This thesis first starts from the theory of cloud computing, reviews the development of its technology system, and analyzes and compares the existing cloud computing platforms. After selecting the open-source Hadoop platform as the basis for the project research, it analyzes and discusses Hadoop in depth.
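The distributed processing mode that Hadoop provides is built on the MapReduce model, in which a computation is expressed as a map function and a reduce function over key-value pairs. As a minimal illustration of the idea (a hypothetical sketch, not code from the thesis), the following Python snippet simulates the map, shuffle, and reduce phases for counting URL hits in Web log records:

```python
from collections import defaultdict

# Hypothetical sample of Web log records: (client_ip, requested_url)
logs = [
    ("10.0.0.1", "/index.html"),
    ("10.0.0.2", "/index.html"),
    ("10.0.0.1", "/about.html"),
]

def map_phase(record):
    """Map: emit a (url, 1) pair for each log record."""
    ip, url = record
    yield (url, 1)

def shuffle(pairs):
    """Shuffle: group values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the partial counts for each url."""
    return (key, sum(values))

mapped = [pair for record in logs for pair in map_phase(record)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'/index.html': 2, '/about.html': 1}
```

In real Hadoop the map and reduce functions run on different cluster nodes and the shuffle moves data over the network, but the programming contract is the same as in this single-process simulation.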
Next, existing data processing techniques are discussed. The third and fourth parts form the core of the project study, with the following main content:

1) The third part presents a preprocessing model for massive Web log data. It summarizes the classic preprocessing model and, on that basis, proposes a massive Web log preprocessing model based on secondary cleaning with a dynamic threshold algorithm, which is described in detail.

2) The fourth part studies parallel association rule mining for massive data. Taking parallel versions of the Apriori algorithm as the starting point, it analyzes the strengths and weaknesses of the traditional CD (Count Distribution) and DD (Data Distribution) algorithms, proposes an improvement scheme, and completes the description of the improved parallel Apriori data mining algorithm.

The fifth part carries out simulation experiments on the improvements and optimizations of parts three and four, analyzes the experimental results, and draws conclusions from them.
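The abstract does not spell out the secondary-cleaning rules, so the following Python sketch is only an assumed illustration of the general idea: a first pass drops failed requests and embedded resource files, and a second pass drops client IPs whose request volume exceeds a dynamically computed threshold (here mean + 2 standard deviations of per-IP counts, an invented rule) to filter robot-like traffic:

```python
import statistics

# Hypothetical log records: (client_ip, url, http_status)
logs = (
    [(f"10.0.0.{i}", "/index.html", 200) for i in range(1, 9)]
    + [(f"10.0.0.{i}", "/about.html", 200) for i in range(1, 9)]
    + [("10.0.0.50", f"/page{i}.html", 200) for i in range(50)]  # robot-like burst
    + [("10.0.0.1", "/logo.png", 200), ("10.0.0.2", "/x.html", 404)]
)

RESOURCE_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")

def first_cleaning(records):
    """Pass 1: drop failed requests and embedded resource files."""
    return [r for r in records
            if r[2] == 200 and not r[1].endswith(RESOURCE_SUFFIXES)]

def second_cleaning(records):
    """Pass 2: drop IPs whose request count exceeds a dynamic threshold
    (mean + 2 * stdev of per-IP counts, an assumed rule)."""
    per_ip = {}
    for ip, _, _ in records:
        per_ip[ip] = per_ip.get(ip, 0) + 1
    threshold = (statistics.mean(per_ip.values())
                 + 2 * statistics.pstdev(per_ip.values()))
    return [r for r in records if per_ip[r[0]] <= threshold]

cleaned = second_cleaning(first_cleaning(logs))
```

The point of a dynamic threshold is that the cutoff adapts to each log's traffic distribution instead of being fixed in advance.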
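In the CD (Count Distribution) scheme that the fourth part takes as a baseline, each node counts the candidate itemsets over its local data partition, and only the counts are exchanged and summed globally. The sketch below (an illustration of standard CD-style Apriori, not the thesis's improved algorithm) simulates this with two in-memory "partitions":

```python
from itertools import combinations

# Hypothetical transaction database, split across two "nodes" (partitions)
partitions = [
    [{"a", "b", "c"}, {"a", "b"}],               # node 1
    [{"a", "c"}, {"b", "c"}, {"a", "b", "c"}],   # node 2
]
MIN_SUPPORT = 3  # absolute support threshold (assumed)

def local_counts(partition, candidates):
    """Each node counts candidates over its local partition (CD style)."""
    counts = {c: 0 for c in candidates}
    for transaction in partition:
        for cand in candidates:
            if cand <= transaction:
                counts[cand] += 1
    return counts

def merge_counts(all_counts):
    """Global count exchange: sum the local counts from every node."""
    total = {}
    for counts in all_counts:
        for cand, n in counts.items():
            total[cand] = total.get(cand, 0) + n
    return total

def apriori_gen(frequent_k):
    """Join frequent k-itemsets into (k+1)-candidates, then prune any
    candidate that has an infrequent k-subset (the Apriori property)."""
    if not frequent_k:
        return set()
    k = len(next(iter(frequent_k)))
    cands = {a | b for a in frequent_k for b in frequent_k if len(a | b) == k + 1}
    return {c for c in cands
            if all(frozenset(s) in frequent_k for s in combinations(c, k))}

# Level 1: candidate single items drawn from all transactions
items = set().union(*(t for p in partitions for t in p))
candidates = {frozenset([i]) for i in items}
frequent = set()
while candidates:
    total = merge_counts([local_counts(p, candidates) for p in partitions])
    level_frequent = {c for c, n in total.items() if n >= MIN_SUPPORT}
    frequent |= level_frequent
    candidates = apriori_gen(level_frequent)
```

CD keeps communication small (only counts cross the network) at the price of every node holding the full candidate set; DD does the opposite, which is the trade-off the thesis's improvement targets.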
Keywords/Search Tags: Cloud computing, Hadoop, Data processing, Apriori algorithm