With the rapid development of the Internet and the massive volume of media works it carries, copyright protection has become an urgent problem. An effective copyright protection scheme for digital content tracks copies of that content by means of copy detection technology, and obtaining mass media resources is one of the main difficulties in copy detection. Today, the rapid development of cloud computing offers great advantages in mass data processing. In view of this, this paper designs and implements a video crawler on the Hadoop framework, which is used to collect the test video data set for a copy detection system.

This paper mainly studies the Hadoop framework, including the MapReduce computation model and the HDFS distributed file system, as well as the key technologies of distributed crawlers. It also discusses the advantages of the Hadoop framework for a distributed crawler system, such as its schemes for task scheduling and load balancing, and for keeping the whole crawler system stable when child nodes exit dynamically, which is a major problem in distributed crawlers. These problems are complex and error-prone, but the Hadoop framework solves them. Hence, a distributed video crawler system is designed based on Hadoop. The MapReduce computation model is used to implement crawling, page parsing, duplicate URL removal, downloading and other computing tasks; the URL set is first partitioned so that the load on each crawling node is balanced; and the HDFS distributed file system provides storage in coordination with the computation model.

Finally, functionality and performance tests are carried out with multiple crawling nodes configured for the video crawler. The test results demonstrate the feasibility and efficiency of the distributed crawler based on the Hadoop architecture, and prospects are put forward for addressing the remaining shortcomings of the crawler system.
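The URL-partitioning idea mentioned above can be sketched as follows. This is a minimal, hypothetical illustration (not the thesis code): each URL is assigned to a crawling node by hashing its host name, so all URLs from one site go to the same node while distinct hosts are spread across nodes, the same idea a Hadoop `Partitioner` applies to map output keys. The class and method names are assumptions for illustration only.

```java
import java.net.URI;

// Hypothetical sketch of host-hash URL partitioning for load balancing
// across crawling nodes (the same scheme a Hadoop Partitioner uses for keys).
public class UrlPartitioner {
    private final int numNodes; // number of crawling nodes in the cluster

    public UrlPartitioner(int numNodes) {
        this.numNodes = numNodes;
    }

    // Same host always maps to the same partition; the hash spreads
    // distinct hosts roughly evenly across the nodes.
    public int partition(String url) {
        String host = URI.create(url).getHost();
        return (host.hashCode() & Integer.MAX_VALUE) % numNodes;
    }

    public static void main(String[] args) {
        UrlPartitioner p = new UrlPartitioner(4);
        // URLs from the same host land on the same crawling node.
        System.out.println(p.partition("http://video.example.com/a.mp4"));
        System.out.println(p.partition("http://video.example.com/b.mp4"));
    }
}
```

Keeping one host on one node also makes it easy to apply per-site politeness limits, since only that node contacts the site.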