
Hadoop-based Network Verification Platform Research

Posted on: 2012-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: Z M Xu
Full Text: PDF
GTID: 2178330335974317
Subject: Computer application technology
Abstract/Summary:
Cloud computing, a concept proposed at the end of 2007, is a revolutionary innovation: it means computing capacity can be circulated like commodities such as gas, electricity, and water, conveniently and at low cost. Its main difference from ordinary commodities is that it is delivered over the Internet. Google, IBM, Amazon, and other IT giants have launched their own commercial cloud computing platforms and made them central to their future development strategies. The study of cloud computing therefore not only follows the industry's technology trend but also has great practical value.

The back-end system of a cloud computing platform contains tens of thousands of servers, and how to organize such a large number of servers effectively is a key problem for the efficient and stable operation of the system. A reasonable network topology can improve network performance and also ensure network stability, keeping the system running normally when some nodes or links fail or become congested. The network topology of a cloud computing back-end system differs from common topologies, so it requires fresh consideration and further research.

Data is the carrier of information, while information is the content of data; data is therefore generally considered the basis of an information system, and using computers to process data and extract information is its basic function. In today's highly information-oriented society, the Web is regarded as the largest information system, characterized by massive, diverse, heterogeneous, and dynamically changing data. Quickly extracting valuable information for enterprises from such massive data has become the biggest headache for programmers during software development. On this basis, this thesis analyzes existing key technologies such as distributed storage and distributed computing and, combining research on Hadoop cluster technology with business requirements and the actual hardware and software capabilities, proposes a Hadoop-based data processing model. The development approach of the model is introduced from the aspects of data structure design, program organization, and programming techniques. The model is applied to the web log pretreatment process of a network authentication platform; it allows programmers to harness the resources of a very large distributed system without much experience in parallel processing or distributed systems. The model can also be used in other network applications that handle large amounts of data, such as picture storage, search engines, and grid computing.

This topic combines the model with business applications, using cutting-edge distributed framework technology to better meet project requirements. The model can be deployed to real instances, and experiments verify its practical value in terms of efficiency, cost, scalability, ease of maintenance, and so on. In integrating the model with the original pretreatment system, we optimized the primary model, including MapReduce job scheduling, the sorting algorithms, and the fault-tolerance mechanism of the cluster system.
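The thesis abstract does not include source code, so the following is only a minimal sketch of what the web log pretreatment step could look like as a map-only Hadoop MapReduce job. The class names (LogPretreatment, CleanMapper) and the assumed common-log field layout are illustrative assumptions, not taken from the thesis.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogPretreatment {

    // Map-only job: each mapper parses one raw log line, drops malformed
    // records, and emits the cleaned fields as tab-separated text.
    public static class CleanMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumed layout (common log format):
            // ip - - [timestamp] "METHOD url protocol" status bytes
            String[] fields = value.toString().split(" ");
            if (fields.length < 10) {
                return; // skip lines that do not match the expected layout
            }
            String ip = fields[0];
            String url = fields[6];
            String status = fields[fields.length - 2];
            context.write(new Text(ip + "\t" + url + "\t" + status),
                          NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "web log pretreatment");
        job.setJarByClass(LogPretreatment.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0); // cleaning needs no reduce phase
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Such a job would be submitted with the standard hadoop jar command, passing input and output HDFS paths as arguments. A map-only design is a natural fit for pretreatment, since each log line is cleaned independently and aggregation can be left to later MapReduce stages.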
Keywords/Search Tags: distributed data processing, massive data, Hadoop