
The Research Of Implementation And Application For Network Data Analysis System Based On YARN

Posted on: 2015-09-13 | Degree: Master | Type: Thesis
Country: China | Candidate: C Fang
GTID: 2308330452957130 | Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of information and communication technology, a large number of Internet companies have emerged and are changing our conventional lifestyle and mode of production. Alongside this convenience, the enormous amount of data the Internet generates has become a critical challenge for a wide variety of industries. In particular, how to take full advantage of this massive data to improve Internet services and to reveal the human behaviors and scientific laws hidden behind it is an urgent problem. The basic requirement of any solution is an ultra-large-capacity data processing platform that can handle multiple data-processing requests. This study therefore focuses on such a platform, building on current mature distributed data processing technology and taking into account the characteristics of network data.

When dealing with massive data, a conventional database often falls short in storage capacity and scalability. A distributed system, which can effectively utilize the resources of multiple computers, is a better alternative for processing massive data. In a distributed system, several computing frameworks are usually used to process network data. Typical examples are MapReduce, which is suited to offline batch processing; Spark, a memory-based framework; and Storm, a real-time framework. Therefore, to maximize the utilization of cluster resources, the network data platform discussed in this thesis must allow multiple computing frameworks to share the resources of a single cluster. To reach this goal, a resource management framework should be deployed on the cluster, together with suitable schedulers and scheduling algorithms. In this way, resources can be scheduled and allocated centrally across multiple computing frameworks.

In this thesis, YARN is selected as the distributed resource management framework, and MapReduce, Storm and Spark are deployed on top of it. Based on the characteristics of network data analysis, the Capacity Scheduler and a suitable scheduling strategy are chosen, and a resource utilization test is carried out. The test results suggest that YARN not only provides robust support for the upper-layer computing frameworks but also schedules tasks and allocates resources effectively. Moreover, YARN performs well in both fault tolerance and scalability. Finally, the thesis briefly introduces the tasks actually run on this network data platform, points out aspects that need improvement, and proposes the key points and goals of future work.
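The abstract does not detail how the Capacity Scheduler divides one cluster among MapReduce, Spark and Storm. The Python sketch below is a deliberately simplified model of the idea, not YARN's actual scheduling algorithm (which allocates containers incrementally as nodes report in and also applies user limits, preemption and other rules). It assumes one hypothetical queue per framework, each with a guaranteed `capacity` and a `max_capacity` given as fractions of the cluster, and resources measured in a single abstract unit such as GB of memory. The class `Queue`, the functions `allocate` and `utilization`, and all numbers are illustrative assumptions, not values from the thesis. Passing `elastic=False` models statically partitioned resources, so utilization with and without sharing can be compared.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """A hypothetical leaf queue, loosely modelled on a Capacity Scheduler queue."""
    name: str
    capacity: float      # guaranteed share of the cluster, fraction in (0, 1]
    max_capacity: float  # elastic upper bound, fraction in [capacity, 1]
    demand: int          # resource units currently requested (e.g. GB of memory)


def allocate(queues: list[Queue], cluster_total: int,
             elastic: bool = True) -> dict[str, int]:
    """Divide cluster_total units among queues (simplified model, not YARN's code)."""
    # Pass 1: every queue receives up to its guaranteed capacity.
    alloc = {q.name: min(q.demand, round(q.capacity * cluster_total)) for q in queues}
    if not elastic:
        return alloc  # static partitioning: idle guaranteed capacity stays unused

    # Pass 2: lend idle capacity, one unit at a time, to queues with unmet
    # demand, never exceeding a queue's maximum capacity.
    free = cluster_total - sum(alloc.values())
    while free > 0:
        eligible = [
            q for q in queues
            if alloc[q.name] < min(q.demand, round(q.max_capacity * cluster_total))
        ]
        if not eligible:
            break
        # Serve the queue whose usage is lowest relative to its guarantee.
        target = min(eligible, key=lambda x: alloc[x.name] / x.capacity)
        alloc[target.name] += 1
        free -= 1
    return alloc


def utilization(alloc: dict[str, int], cluster_total: int) -> float:
    """Fraction of the cluster's resources that is allocated."""
    return sum(alloc.values()) / cluster_total


if __name__ == "__main__":
    # Illustrative queues, one per framework; the numbers are made up.
    queues = [
        Queue("mapreduce", capacity=0.5, max_capacity=1.0, demand=80),
        Queue("spark", capacity=0.3, max_capacity=0.6, demand=20),
        Queue("storm", capacity=0.2, max_capacity=0.4, demand=10),
    ]
    for elastic in (False, True):
        alloc = allocate(queues, cluster_total=100, elastic=elastic)
        print(f"elastic={elastic}: {alloc}, utilization={utilization(alloc, 100):.2f}")
```

With these made-up numbers, static partitioning leaves 20 of 100 units idle (utilization 0.80), while the elastic model lets the MapReduce queue borrow the capacity that Spark and Storm are not using, bringing it to 70 units and the cluster to 1.00. This is the kind of gain that motivates running several frameworks on one shared cluster, although real results depend on workload and scheduler configuration.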
Keywords/Search Tags: Network Data, YARN, Distributed System, Resource Management Framework, Scheduler