
Research Of Building Cloud Computing Platform For Processing And Analyzing Massive Data

Posted on: 2012-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: T Xiao
Full Text: PDF
GTID: 2268330401985249
Subject: Detection Technology and Automation
Abstract/Summary:
Nowadays, with the rapid development of the Internet and the growth in the number of Internet users, Internet companies that provide network services face a flood of information to process. They have to analyze the needs of their users, the effects of various products, and so on, and much of this analysis also comes with time constraints. Traditional database systems can no longer meet the resulting storage-space and processing-time requirements. The main purpose of this paper is to build a low-cost distributed system for storing and processing massive data.

Starting from this problem, and after analyzing the key technologies of existing distributed computing and storage, this paper combines research on Hadoop cloud computing technology with the actual hardware and software resources of the campus network to present a cloud-computing-based model for data processing. The model is studied from several aspects: data structure design, system modules, program flow, and the programming platform. Finally, the model is applied to a distributed search engine for massive data. The study indicates that the reliability, efficiency, and scalability of the Hadoop cloud computing platform meet the technical requirements of the distributed search engine.

This paper uses the Hadoop system as the platform for distributed computing application systems. It analyzes each step of crawling, indexing, and searching in the traditional search engine process, improves the corresponding function modules, and decomposes these non-sequential steps into two kinds of sub-tasks: data computing tasks and data combining tasks. Following the Map/Reduce programming model, all data computing tasks are encapsulated in the Map function and all data combining tasks in the Reduce function. The main work of this paper is to deploy the improved search engine system on a Hadoop cloud computing environment built from inexpensive computers, so that it achieves fast response, high reliability, and scalability.

The main characteristic of this work is the integration of the proposed model with a practical business application. Using a leading distributed framework technology makes it possible to better meet the needs of the project and to deploy the model in an actual distributed environment, and the experimental results show that the system has practical value in terms of high efficiency, low cost, scalability, and ease of maintenance.
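The abstract does not include the thesis's actual code. As a minimal sketch of the Map/Reduce decomposition it describes, the indexing step of a search engine can be expressed as a Hadoop job in which the Map function performs the data computing task (emitting term/document pairs from crawled pages) and the Reduce function performs the data combining task (merging postings for each term). The class names below are hypothetical and the crawled page is approximated by an input text file.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class InvertedIndex {

    // Map (data computing task): emit (term, documentId) for every term in a page.
    public static class IndexMapper extends Mapper<Object, Text, Text, Text> {
        private final Text term = new Text();
        private final Text docId = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Use the input file name as a stand-in for the crawled page's identifier.
            String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
            docId.set(fileName);
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                term.set(tokens.nextToken().toLowerCase());
                context.write(term, docId);
            }
        }
    }

    // Reduce (data combining task): merge all document ids for one term into a posting list.
    public static class IndexReducer extends Reducer<Text, Text, Text, Text> {
        private final Text postings = new Text();

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            StringBuilder list = new StringBuilder();
            for (Text doc : values) {
                if (list.length() > 0) {
                    list.append(',');
                }
                list.append(doc.toString());
            }
            postings.set(list.toString());
            context.write(key, postings);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "inverted index");
        job.setJarByClass(InvertedIndex.class);
        job.setMapperClass(IndexMapper.class);
        job.setReducerClass(IndexReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Run on a cluster with `hadoop jar invertedindex.jar InvertedIndex <input dir> <output dir>`; the crawling and searching steps described in the abstract would be separate jobs following the same Map/Reduce split.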
Keywords/Search Tags:Massive Data, Hadoop, Search Engine