
Research And Application Of Massive Data Processing Model Based On Hadoop

Posted on: 2009-02-24  Degree: Master  Type: Thesis
Country: China  Candidate: Z Zhu  Full Text: PDF
GTID: 2178360245970036  Subject: Cryptography
Abstract/Summary:
Data is the carrier of information, and it is generally held that data forms the basis of information systems. Processing data with computers to extract information is the basic function of an information system. In today's highly information-oriented society, the Web is arguably the largest information system in existence, and its data are massive, diverse, heterogeneous, and constantly changing. How to rapidly extract useful information from an enterprise's massive data has become a daunting problem for programmers during application development.

Starting from this problem, this thesis analyzes the key technologies of existing distributed storage and computing and, drawing on research into Hadoop cluster technology as well as the business requirements and the actual hardware and software available, proposes a Hadoop-based massive data processing model. The model is introduced from several aspects of its development: data structure design, program flow, and the programming techniques used. The model is applied to pre-processing the log data of a large-scale web site. On top of it we also design a distributed pre-processing paradigm: correlation pattern matching is first applied on each distributed server, and the mining results of all servers are then combined. This helps to relieve congestion on the communication network and brings the advantages of parallel computing, asynchronous mining, and reduced data heterogeneity. At the same time, it allows programmers to work with the resources of a very large distributed system without any knowledge of parallel programming or experience with distributed systems. Beyond data mining, the model can also be applied to large-data network applications such as picture storage, search engines, and grid computing.

The characteristic of this study is the integration of model research with business applications. A leading-edge distributed technical framework is used to meet the project's requirements, and the model is deployed to a real instance. Experimental results are used to test the model's practical value in terms of efficiency, cost, scalability, and maintainability. We also optimize the performance of the basic model on the basis of its integration with the original pre-processing system, including: refinement of the simplification rules, configuration of multi-task priorities, and optimization of the network load-balancing algorithm.
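To make the "match locally, combine globally" pre-processing paradigm described above concrete, the following is a minimal illustrative Hadoop MapReduce sketch, not the thesis's actual implementation. The class names (LogPreprocessJob, LogMapper, SumReducer), the log pattern, and the counting metric are assumptions chosen for illustration; the map phase filters and pattern-matches raw log lines on the nodes that store them, and the reduce phase combines the per-node results.

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogPreprocessJob {

    /** Map side: pattern-match each raw log line on the node that stores it. */
    public static class LogMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        // Simplified Apache access-log pattern: client IP ... "METHOD /path ..." status
        private static final Pattern ACCESS_LOG_PATTERN =
                Pattern.compile("^(\\S+) \\S+ \\S+ \\[[^\\]]+\\] \"\\S+ (\\S+) [^\"]*\" (\\d{3})");
        private static final LongWritable ONE = new LongWritable(1);
        private final Text url = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Matcher m = ACCESS_LOG_PATTERN.matcher(value.toString());
            // Drop malformed lines and failed requests during pre-processing.
            if (m.find() && m.group(3).startsWith("2")) {
                url.set(m.group(2));
                context.write(url, ONE);   // local mining result for this server's data
            }
        }
    }

    /** Reduce side: combine the per-node results into one aggregated view. */
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        private final LongWritable total = new LongWritable();

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            total.set(sum);
            context.write(key, total);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "web-log-preprocess");
        job.setJarByClass(LogPreprocessJob.class);
        job.setMapperClass(LogMapper.class);
        // Combiner does partial aggregation on each node, easing network congestion,
        // in the same spirit as the distributed pre-processing paradigm above.
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The combiner illustrates the load-balancing concern mentioned in the abstract: aggregating partial results on each server before shuffling keeps most traffic local and only the combined summaries cross the network.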
Keywords/Search Tags: hadoop, massive data, distributed, data pre-process