
Research Of Building Cloud Computing Platform For Processing And Analyzing Massive Data

Posted on: 2012-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: T Xiao
Full Text: PDF
GTID: 2268330401985249
Subject: Detection Technology and Automation
Abstract/Summary:
Nowadays, with the rapid development of the Internet and the growth in the number of Internet users, Internet companies that provide network services face a flood of information to process. They have to analyze the needs of their users, the effects of various products, and so on, and much of this analysis also comes with time constraints. Traditional database systems can no longer meet the resulting storage-space and processing-time requirements. The main purpose of this paper is to build a low-cost distributed system for storing and processing massive data.

Starting from this problem, and after analyzing the key technologies of existing distributed computing and storage, this paper combines research on Hadoop cloud computing technology with the actual hardware and software resources of the campus network to present a cloud-computing-based model for data processing. The model is studied from several aspects: data structure design, system modules, program flow, and the programming platform. Finally, the model is applied to a distributed search engine for massive data. The study indicates that the reliability, efficiency, and scalability of the Hadoop cloud computing platform meet the technical requirements of the distributed search engine.

This paper uses the Hadoop system as the platform for distributed computing application systems. It analyzes each step of crawling, indexing, and searching in the traditional search engine process, improves the corresponding function modules, and decomposes these non-sequential steps into two kinds of sub-tasks: data computing tasks and data combining tasks. Following the Map/Reduce programming model, all data computing tasks are encapsulated in the Map function and all data combining tasks in the Reduce function. The main work of this paper is to deploy the improved search engine system on a Hadoop cloud computing environment built from inexpensive computers, so that it achieves fast response, high reliability, and scalability.

The main characteristic of this work is the integration of the proposed model with a practical business application. Using a leading distributed framework technology makes it possible to better meet the needs of the project and to deploy the model in an actual distributed environment, and the experimental results show that the system has practical value in terms of high efficiency, low cost, scalability, and ease of maintenance.
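The abstract does not include the thesis's actual code. As a minimal sketch of the Map/Reduce decomposition it describes, the indexing step of a search engine can be expressed as a Hadoop job in which the Map function performs the data computing task (emitting term/document pairs from crawled pages) and the Reduce function performs the data combining task (merging postings for each term). The class names below are hypothetical and the crawled page is approximated by an input text file.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class InvertedIndex {

    // Map (data computing task): emit (term, documentId) for every term in a page.
    public static class IndexMapper extends Mapper<Object, Text, Text, Text> {
        private final Text term = new Text();
        private final Text docId = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Use the input file name as a stand-in for the crawled page's identifier.
            String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
            docId.set(fileName);
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                term.set(tokens.nextToken().toLowerCase());
                context.write(term, docId);
            }
        }
    }

    // Reduce (data combining task): merge all document ids for one term into a posting list.
    public static class IndexReducer extends Reducer<Text, Text, Text, Text> {
        private final Text postings = new Text();

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            StringBuilder list = new StringBuilder();
            for (Text doc : values) {
                if (list.length() > 0) {
                    list.append(',');
                }
                list.append(doc.toString());
            }
            postings.set(list.toString());
            context.write(key, postings);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "inverted index");
        job.setJarByClass(InvertedIndex.class);
        job.setMapperClass(IndexMapper.class);
        job.setReducerClass(IndexReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Run on a cluster with `hadoop jar invertedindex.jar InvertedIndex <input dir> <output dir>`; the crawling and searching steps described in the abstract would be separate jobs following the same Map/Reduce split.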
Keywords/Search Tags:Massive Data, Hadoop, Search Engine