
Research And Application Of Massive Data Processing Model Based On Hadoop

Posted on: 2009-02-24  Degree: Master  Type: Thesis
Country: China  Candidate: Z Zhu  Full Text: PDF
GTID: 2178360245970036  Subject: Cryptography
Abstract/Summary:
Data is the carrier of information, and it is generally held that data forms the basis of information systems. Processing data with computers to extract information is the basic function of an information system. In today's highly information-oriented society, the Web is arguably the largest information system in existence, and its data are massive, diverse, heterogeneous, and constantly changing. How to rapidly extract useful information from an enterprise's massive data has become a daunting problem for programmers during application development.

Starting from this problem, this thesis analyzes the key technologies of existing distributed storage and computing and, drawing on research into Hadoop cluster technology as well as the business requirements and the actual hardware and software available, proposes a Hadoop-based massive data processing model. The model is introduced from several aspects of its development: data structure design, program flow, and the programming techniques used. The model is applied to pre-processing the log data of a large-scale web site. On top of it we also design a distributed pre-processing paradigm: correlation pattern matching is first applied on each distributed server, and the mining results of all servers are then combined. This helps to relieve congestion on the communication network and brings the advantages of parallel computing, asynchronous mining, and reduced data heterogeneity. At the same time, it allows programmers to work with the resources of a very large distributed system without any knowledge of parallel programming or experience with distributed systems. Beyond data mining, the model can also be applied to large-data network applications such as picture storage, search engines, and grid computing.

The characteristic of this study is the integration of model research with business applications. A leading-edge distributed technical framework is used to meet the project's requirements, and the model is deployed to a real instance. Experimental results are used to test the model's practical value in terms of efficiency, cost, scalability, and maintainability. We also optimize the performance of the basic model on the basis of its integration with the original pre-processing system, including: refinement of the simplification rules, configuration of multi-task priorities, and optimization of the network load-balancing algorithm.
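To make the "match locally, combine globally" pre-processing paradigm described above concrete, the following is a minimal illustrative Hadoop MapReduce sketch, not the thesis's actual implementation. The class names (LogPreprocessJob, LogMapper, SumReducer), the log pattern, and the counting metric are assumptions chosen for illustration; the map phase filters and pattern-matches raw log lines on the nodes that store them, and the reduce phase combines the per-node results.

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogPreprocessJob {

    /** Map side: pattern-match each raw log line on the node that stores it. */
    public static class LogMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        // Simplified Apache access-log pattern: client IP ... "METHOD /path ..." status
        private static final Pattern ACCESS_LOG_PATTERN =
                Pattern.compile("^(\\S+) \\S+ \\S+ \\[[^\\]]+\\] \"\\S+ (\\S+) [^\"]*\" (\\d{3})");
        private static final LongWritable ONE = new LongWritable(1);
        private final Text url = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Matcher m = ACCESS_LOG_PATTERN.matcher(value.toString());
            // Drop malformed lines and failed requests during pre-processing.
            if (m.find() && m.group(3).startsWith("2")) {
                url.set(m.group(2));
                context.write(url, ONE);   // local mining result for this server's data
            }
        }
    }

    /** Reduce side: combine the per-node results into one aggregated view. */
    public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
        private final LongWritable total = new LongWritable();

        @Override
        protected void reduce(Text key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable v : values) {
                sum += v.get();
            }
            total.set(sum);
            context.write(key, total);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "web-log-preprocess");
        job.setJarByClass(LogPreprocessJob.class);
        job.setMapperClass(LogMapper.class);
        // Combiner does partial aggregation on each node, easing network congestion,
        // in the same spirit as the distributed pre-processing paradigm above.
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

The combiner illustrates the load-balancing concern mentioned in the abstract: aggregating partial results on each server before shuffling keeps most traffic local and only the combined summaries cross the network.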
Keywords/Search Tags: hadoop, massive data, distributed, data pre-process