
Massive Data Processing Application Based On Hadoop

Posted on: 2013-04-28
Degree: Master
Type: Thesis
Country: China
Candidate: N Chen
Full Text: PDF
GTID: 2248330371983859
Subject: Software engineering
Abstract/Summary:
Faced with the explosive growth of Internet data, the traditional stand-alone approach to processing is falling behind, while distributed parallel processing has matured and is set to replace it. The processing and storage of massive data has become an active research area. The Hadoop platform, developed by Doug Cutting and others, stands out as one of the most important research directions in distributed processing.

Hadoop's basic distributed architecture consists of the HDFS distributed file system and the MapReduce computation model. HDFS is primarily responsible for storing massive data, and MapReduce is mainly responsible for computing over it. Traditional log processing generally relies on stand-alone shell scripts, which become unwieldy when faced with massive data. With the development of Internet social networking, the analysis of interpersonal relationships has also become an active topic in Internet research. On this basis, we improve the single-source shortest-path Dijkstra algorithm, adapt it to run in a distributed fashion, and use it to analyze relationships in a social network.

In this paper, we use the MapReduce approach of the Hadoop platform instead of the traditional shell approach to process the massive logs of a social network, providing faster processing, a more convenient, efficient, and user-friendly interface, and more detailed analysis functions.

In the experimental section, we build a cluster of four machines and compare against shell script processing to verify that Hadoop has the advantage in massive log processing. We also use the distributed Dijkstra algorithm to analyze the interpersonal relationships between strangers.

Finally, this paper proposes methods for optimizing configuration parameters for jobs running on the Hadoop platform and presents the parameters obtained.
Keywords/Search Tags:Hadoop, MapReduce, massive log processing, distributed Dijkstra algorithm
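
The abstract does not spell out how the Dijkstra algorithm is distributed, so the following is only a minimal sketch of one common way to express single-source shortest paths in the MapReduce style, not the thesis's actual method. It simulates the map, shuffle, and reduce steps in memory with plain Python rather than running on Hadoop. Each round relaxes every edge from every reached node, so it behaves more like an iterative Bellman-Ford relaxation than classical priority-queue Dijkstra. The names `map_phase`, `reduce_phase`, and `shortest_paths`, and the graph representation (a dict mapping each node to its weighted neighbors), are assumptions made for illustration.

```python
from collections import defaultdict
import math


def map_phase(node, record):
    """Map step: pass the node's own record through and emit a
    tentative distance to each neighbor of a reached node."""
    dist, edges = record
    yield node, ("NODE", dist, edges)
    if dist < math.inf:
        for neighbor, weight in edges.items():
            yield neighbor, ("DIST", dist + weight, None)


def reduce_phase(node, values):
    """Reduce step: keep the adjacency list and the smallest distance seen."""
    best = math.inf
    edges = {}
    for kind, dist, adjacency in values:
        if kind == "NODE":
            edges = adjacency
        best = min(best, dist)
    return node, (best, edges)


def shortest_paths(graph, source):
    """Repeat map/shuffle/reduce rounds until no distance changes.

    graph: dict of node -> dict of neighbor -> non-negative edge weight
    (hypothetical in-memory format; a real job would read from HDFS).
    """
    # Make sure nodes that only appear as neighbors are also present.
    nodes = set(graph)
    for adjacency in graph.values():
        nodes.update(adjacency)
    state = {
        n: (0 if n == source else math.inf, dict(graph.get(n, {})))
        for n in nodes
    }
    while True:
        # Shuffle: group all emitted values by key, as Hadoop would.
        shuffled = defaultdict(list)
        for node, record in state.items():
            for key, value in map_phase(node, record):
                shuffled[key].append(value)
        new_state = dict(
            reduce_phase(node, values) for node, values in shuffled.items()
        )
        if all(new_state[n][0] == state[n][0] for n in state):
            return {n: dist for n, (dist, _) in new_state.items()}
        state = new_state
```

In a social-network setting, nodes would be users and edges their connections, so the resulting distance between two strangers indicates how many (weighted) steps separate them. On a real cluster, each loop iteration would be one MapReduce job, and the convergence check would typically be done with a job counter rather than by comparing states in memory.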