
Research on Key Technologies of Hadoop-Based Massive Data Processing

Posted on: 2014-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: B Che
Full Text: PDF
GTID: 2268330401964426
Subject: Electronic and communication engineering
Abstract/Summary:
Since its birth nearly half a century ago, the Internet has penetrated every aspect of our lives. After the Web 2.0 era, the Internet has moved toward its third generation: personalized network services. This evolution has produced massive volumes of data, and in the face of such data the traditional single super-server has gradually proved inadequate; processing massive data has become a thorny problem. The generation and processing of huge amounts of data is both a challenge and an opportunity: massive data provides a rich source for data mining, from which information of great commercial value can be extracted. For these reasons, massive data processing has become a popular technology in which major Internet companies invest research effort, and small and medium-sized enterprises are also vying to join this feast.

The concept of cloud computing, proposed by Google in 2006, pointed out a direction for massive data processing, and Hadoop, the open-source cloud platform developed by the Apache Foundation, has brought the majority of researchers the dawn of low-cost massive data processing. Traditional data processing methods and techniques perform well on a single server, but they no longer fit the distributed processing mode of the cloud platform. Transforming traditional data processing methods into the distributed computing mode, and improving algorithm performance on this basis, is of major significance for massive data processing.

This thesis first starts from the theory of cloud computing, reviews the development of its technology system, and analyzes and compares the existing cloud computing platforms. After selecting the open-source Hadoop platform as the basis for the project research, it analyzes and discusses Hadoop in depth.
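The distributed processing mode that Hadoop provides is built on the MapReduce model, in which a computation is expressed as a map function and a reduce function over key-value pairs. As a minimal illustration of the idea (a hypothetical sketch, not code from the thesis), the following Python snippet simulates the map, shuffle, and reduce phases for counting URL hits in Web log records:

```python
from collections import defaultdict

# Hypothetical sample of Web log records: (client_ip, requested_url)
logs = [
    ("10.0.0.1", "/index.html"),
    ("10.0.0.2", "/index.html"),
    ("10.0.0.1", "/about.html"),
]

def map_phase(record):
    """Map: emit a (url, 1) pair for each log record."""
    ip, url = record
    yield (url, 1)

def shuffle(pairs):
    """Shuffle: group values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: sum the partial counts for each url."""
    return (key, sum(values))

mapped = [pair for record in logs for pair in map_phase(record)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'/index.html': 2, '/about.html': 1}
```

In real Hadoop the map and reduce functions run on different cluster nodes and the shuffle moves data over the network, but the programming contract is the same as in this single-process simulation.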
Next, existing data processing techniques are discussed. The third and fourth parts form the core of the project study, with the following main content:

1) The third part presents a preprocessing model for massive Web log data. It summarizes the classic preprocessing model and, on that basis, proposes a massive Web log preprocessing model based on secondary cleaning with a dynamic threshold algorithm, which is described in detail.

2) The fourth part studies parallel association rule mining for massive data. Taking parallel versions of the Apriori algorithm as the starting point, it analyzes the strengths and weaknesses of the traditional CD (Count Distribution) and DD (Data Distribution) algorithms, proposes an improvement scheme, and completes the description of the improved parallel Apriori data mining algorithm.

The fifth part carries out simulation experiments on the improvements and optimizations of parts three and four, analyzes the experimental results, and draws conclusions from them.
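The abstract does not spell out the secondary-cleaning rules, so the following Python sketch is only an assumed illustration of the general idea: a first pass drops failed requests and embedded resource files, and a second pass drops client IPs whose request volume exceeds a dynamically computed threshold (here mean + 2 standard deviations of per-IP counts, an invented rule) to filter robot-like traffic:

```python
import statistics

# Hypothetical log records: (client_ip, url, http_status)
logs = (
    [(f"10.0.0.{i}", "/index.html", 200) for i in range(1, 9)]
    + [(f"10.0.0.{i}", "/about.html", 200) for i in range(1, 9)]
    + [("10.0.0.50", f"/page{i}.html", 200) for i in range(50)]  # robot-like burst
    + [("10.0.0.1", "/logo.png", 200), ("10.0.0.2", "/x.html", 404)]
)

RESOURCE_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")

def first_cleaning(records):
    """Pass 1: drop failed requests and embedded resource files."""
    return [r for r in records
            if r[2] == 200 and not r[1].endswith(RESOURCE_SUFFIXES)]

def second_cleaning(records):
    """Pass 2: drop IPs whose request count exceeds a dynamic threshold
    (mean + 2 * stdev of per-IP counts, an assumed rule)."""
    per_ip = {}
    for ip, _, _ in records:
        per_ip[ip] = per_ip.get(ip, 0) + 1
    threshold = (statistics.mean(per_ip.values())
                 + 2 * statistics.pstdev(per_ip.values()))
    return [r for r in records if per_ip[r[0]] <= threshold]

cleaned = second_cleaning(first_cleaning(logs))
```

The point of a dynamic threshold is that the cutoff adapts to each log's traffic distribution instead of being fixed in advance.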
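In the CD (Count Distribution) scheme that the fourth part takes as a baseline, each node counts the candidate itemsets over its local data partition, and only the counts are exchanged and summed globally. The sketch below (an illustration of standard CD-style Apriori, not the thesis's improved algorithm) simulates this with two in-memory "partitions":

```python
from itertools import combinations

# Hypothetical transaction database, split across two "nodes" (partitions)
partitions = [
    [{"a", "b", "c"}, {"a", "b"}],               # node 1
    [{"a", "c"}, {"b", "c"}, {"a", "b", "c"}],   # node 2
]
MIN_SUPPORT = 3  # absolute support threshold (assumed)

def local_counts(partition, candidates):
    """Each node counts candidates over its local partition (CD style)."""
    counts = {c: 0 for c in candidates}
    for transaction in partition:
        for cand in candidates:
            if cand <= transaction:
                counts[cand] += 1
    return counts

def merge_counts(all_counts):
    """Global count exchange: sum the local counts from every node."""
    total = {}
    for counts in all_counts:
        for cand, n in counts.items():
            total[cand] = total.get(cand, 0) + n
    return total

def apriori_gen(frequent_k):
    """Join frequent k-itemsets into (k+1)-candidates, then prune any
    candidate that has an infrequent k-subset (the Apriori property)."""
    if not frequent_k:
        return set()
    k = len(next(iter(frequent_k)))
    cands = {a | b for a in frequent_k for b in frequent_k if len(a | b) == k + 1}
    return {c for c in cands
            if all(frozenset(s) in frequent_k for s in combinations(c, k))}

# Level 1: candidate single items drawn from all transactions
items = set().union(*(t for p in partitions for t in p))
candidates = {frozenset([i]) for i in items}
frequent = set()
while candidates:
    total = merge_counts([local_counts(p, candidates) for p in partitions])
    level_frequent = {c for c, n in total.items() if n >= MIN_SUPPORT}
    frequent |= level_frequent
    candidates = apriori_gen(level_frequent)
```

CD keeps communication small (only counts cross the network) at the price of every node holding the full candidate set; DD does the opposite, which is the trade-off the thesis's improvement targets.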
Keywords/Search Tags: Cloud computing, Hadoop, Data processing, Apriori algorithm