Massive Data Processing Application Based On Hadoop

Posted on:2013-04-28

Degree:Master

Type:Thesis

Country:China

Candidate:N Chen

Full Text:PDF

GTID:2248330371983859

Subject:Software engineering

Abstract/Summary:


Faced with the explosive growth of Internet data, the traditional stand-alone approach has gradually fallen behind, while distributed parallel processing has matured and is replacing it. The processing and storage of massive data has become a hot research topic. The Hadoop platform, developed by Doug Cutting and others, stands out and has become one of the most important research directions in distributed processing.

Hadoop's basic distributed architecture consists of the HDFS distributed file system and the MapReduce computation model. HDFS is primarily responsible for storing massive data, and MapReduce is mainly responsible for computing over it. Traditional log processing generally relies on shell scripts run on a single machine, an approach that becomes unwieldy when the data is massive. With the development of Internet social networking, interpersonal relationships have also become a hot topic in today's Internet research. On this basis, we improve Dijkstra's single-source shortest path algorithm, distribute it, and use it to analyze the relationships in a social network.

In this paper, we use the MapReduce approach of the Hadoop platform instead of the traditional shell approach to process massive social-networking logs. This provides faster processing, a more convenient, efficient, and user-friendly interface, and more detailed analysis functions.

In the experimental section, we build a cluster of four machines and compare it with shell-script processing to verify that Hadoop has an advantage in massive log processing. We also use the distributed Dijkstra algorithm to analyze the interpersonal relationships between strangers.

Finally, this paper proposes methods for optimizing the configuration parameters used when running jobs on the Hadoop platform and obtains values for these parameters.
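To make the division of labor between the map and reduce phases concrete, the following is a minimal in-process sketch of how a shell-style log count maps onto MapReduce's map, shuffle, and reduce steps. It is an illustration under stated assumptions, not the thesis's implementation: the log line format `<timestamp> <user_id> <action>` and the functions `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical, and a real Hadoop job would read input splits from HDFS and run mappers and reducers on separate nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical log line format (an assumption, not taken from the thesis):
#   "<timestamp> <user_id> <action>"


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (user_id, 1) for each well-formed log line."""
    fields = line.split()
    if len(fields) >= 3:          # skip malformed lines
        yield fields[1], 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as Hadoop's shuffle/sort step does."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the counts collected for one user."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Chain map -> shuffle -> reduce on a single machine."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    logs = [
        "2013-01-01T10:00 alice login",
        "2013-01-01T10:05 bob post",
        "2013-01-01T10:07 alice comment",
        "malformed",
    ]
    print(run_job(logs))  # {'alice': 2, 'bob': 1}
```

A single-machine pipeline such as `awk 'NF >= 3 {print $2}' logfile | sort | uniq -c` computes the same per-user count; the MapReduce version separates the same steps so that mapping and reducing can be spread across the cluster.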
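The abstract does not describe how Dijkstra's algorithm was improved or distributed. A common MapReduce formulation, often called "parallel Dijkstra" but in effect a breadth-first, Bellman-Ford-style relaxation, runs one job per iteration: each mapper passes its node's adjacency list through and offers `dist + weight` to every neighbor, and each reducer keeps the smallest candidate distance. Jobs repeat until no distance changes. The sketch below simulates those iterations in plain Python; it is not the thesis's algorithm, the record layout and the names `mapper`, `reducer`, and `shortest_paths` are hypothetical, and it assumes non-negative edge weights (unit weights fit "degrees of separation" between strangers).

```python
import math
from collections import defaultdict
from typing import Iterator

# Adjacency lists: node -> [(neighbor, edge_weight), ...] (hypothetical layout).
Graph = dict[str, list[tuple[str, float]]]


def mapper(node: str, dist: float, adj: list[tuple[str, float]]) -> Iterator[tuple[str, tuple]]:
    """One map call per node record."""
    yield node, ("GRAPH", adj)   # pass the adjacency list through to the reducer
    yield node, ("DIST", dist)   # keep the node's current best distance
    if dist < math.inf:          # only reached nodes can relax their neighbors
        for nbr, weight in adj:
            yield nbr, ("DIST", dist + weight)


def reducer(node: str, values: list[tuple]) -> tuple[str, float, list[tuple[str, float]]]:
    """Keep the smallest candidate distance and restore the adjacency list."""
    adj: list[tuple[str, float]] = []
    best = math.inf
    for tag, payload in values:
        if tag == "GRAPH":
            adj = payload
        else:
            best = min(best, payload)
    return node, best, adj


def shortest_paths(graph: Graph, source: str) -> dict[str, float]:
    """Repeat map/shuffle/reduce rounds until no distance changes.

    Assumes non-negative weights; each loop iteration stands in for one Hadoop job.
    """
    nodes = set(graph) | {source}
    for adj in graph.values():
        nodes.update(nbr for nbr, _ in adj)  # include nodes with no outgoing edges
    state = {n: (0.0 if n == source else math.inf, graph.get(n, [])) for n in nodes}

    while True:
        grouped: dict[str, list[tuple]] = defaultdict(list)  # shuffle: group by key
        for node, (dist, adj) in state.items():
            for key, value in mapper(node, dist, adj):
                grouped[key].append(value)
        new_state = {}
        for node, values in grouped.items():
            _, best, adj = reducer(node, values)
            new_state[node] = (best, adj)
        changed = any(new_state[n][0] != state[n][0] for n in state)
        state = new_state
        if not changed:
            return {n: dist for n, (dist, _) in state.items()}
```

For example, with unit weights `{"A": [("B", 1), ("C", 1)], "B": [("D", 1)], "C": [("D", 1)], "D": [("E", 1)], "F": []}`, `shortest_paths(graph, "A")` gives `A` distance 0, `B` and `C` 1, `D` 2, `E` 3, and leaves the unconnected `F` at infinity. The number of jobs grows with the longest shortest path in hops, which is one reason this formulation suits social graphs with small degrees of separation.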

Keywords/Search Tags:

Hadoop, MapReduce, massive log processing, distributed Dijkstra algorithm


Related items

1	Research On Distributed Processing Of Massive Video Data Based On Hadoop
2	Research Of Massive Data Processing In The Vessel Monitoring System
3	The Management Of Massive Images Data Based On Hadoop
4	The Research Of Job Scheduling Algorithm In Mapreduce-styled Massive Data Processing Platform
5	Website Concurrent Performance Analysis Based On Massive Log Files In Hadoop
6	Research And Implementation Of Small File Processing Techniques In Hadoop
7	Research Of Massive Data Processing Model In CDMA Packet Domain Based On Hadoop
8	Research And Implementation On Incremental Data Processing Algorithm Based On Hadoop
9	The Research And Analysis Of Hadoop Small File Processing Method
10	Investigating MapReduce framework extensions for efficient processing of geographically scattered datasets