
Research and Implementation of a Web Crawler for Web Video

Posted on: 2013-12-03 | Degree: Master | Type: Thesis
Country: China | Candidate: S Chen
GTID: 2248330374485922 | Subject: Computer application technology
Abstract/Summary:
The growing processing power of computer hardware and increasing network bandwidth have made watching video online a reality. Web video has displaced traditional TV as people's first choice for watching video. Most existing search engines provide keyword-based search, but video data carries a wealth of information, and it is difficult for users to summarize a video's characteristics accurately in words. Textual descriptions are also subjective, so keyword search returns a large amount of irrelevant results and is inefficient. A more intuitive way to search for videos is therefore needed, and content-based web video search engines answer this demand.

Content-based video retrieval is a technology that detects the shots in a video, extracts key frames from those shots, and performs retrieval using the resulting video features. A web crawler is the foundation of a content-based web video search engine: the crawler first collects thousands of videos online, and the search engine then analyzes and indexes their content.

The purpose of this thesis is to implement a content-based video search engine. The thesis studies the technologies related to the Heritrix crawler in depth, as well as streaming and packet capture technologies. Many video websites hide the true address of their videos, so the download link cannot be found by looking at the address bar or parsing the HTML. This thesis therefore presents a method for obtaining the download address of a web video by analyzing the data packets exchanged between the video server and the local network card, and adds a video download function to Heritrix.

The thesis first describes the overall design of the web video search engine system, and then introduces the video information acquisition module, the video processing module, the video classification module, and the video retrieval module. By capturing and analyzing network data, the system obtains the video download address.
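The abstract does not specify how the captured packets are parsed. As a minimal sketch, assuming the video is fetched over plain (unencrypted) HTTP and that TCP payloads have already been captured and reassembled by some capture library, the download address can be rebuilt from the request line and `Host` header of each outgoing GET request. The function names and the `VIDEO_EXTENSIONS` list are hypothetical, and the capture step itself is not shown.

```python
from urllib.parse import urlsplit

# Hypothetical set of file extensions treated as video downloads.
VIDEO_EXTENSIONS = (".flv", ".mp4", ".f4v", ".ts")


def parse_http_request(payload: bytes):
    """Return (method, url) for a plain-HTTP request payload, else None."""
    # Only the header block matters; latin-1 decoding never raises.
    head = payload.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    parts = lines[0].split(" ")
    if len(parts) != 3 or not parts[2].startswith("HTTP/"):
        return None  # not an HTTP request line (e.g. TLS data or a response)
    method, target, _version = parts
    if target.startswith("http://"):
        return method, target  # absolute-form target, as sent to proxies
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    host = headers.get("host")
    if not host:
        return None
    return method, "http://" + host + target


def find_video_urls(payloads):
    """Yield each distinct URL of a GET request whose path looks like a video."""
    seen = set()
    for payload in payloads:
        parsed = parse_http_request(payload)
        if parsed is None:
            continue
        method, url = parsed
        path = urlsplit(url).path.lower()
        if method == "GET" and path.endswith(VIDEO_EXTENSIONS) and url not in seen:
            seen.add(url)
            yield url


if __name__ == "__main__":
    captured = [
        b"GET /index.html HTTP/1.1\r\nHost: www.example.com\r\n\r\n",
        b"GET /videos/abc123.flv?start=0 HTTP/1.1\r\nHost: cdn.example.com\r\n\r\n",
        b"\x17\x03\x03\x00\x10garbage",
    ]
    print(list(find_video_urls(captured)))
    # ['http://cdn.example.com/videos/abc123.flv?start=0']
```

Matching on file extension is only a heuristic; a fuller version might also inspect the `Content-Type` of the server's responses, and it cannot see inside HTTPS traffic.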
To meet the needs of video retrieval, Chinese word segmentation and video standardization are also implemented. When the crawler crawls a single website, it cannot take full advantage of multithreading; this thesis therefore improves the URL assignment strategy, which raises the crawler's efficiency. Finally, the crawler and the whole system are tested.
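The abstract does not describe the improved URL assignment strategy itself. Heritrix's default frontier groups URIs into queues keyed by host name, so a crawl of a single site tends to feed a single queue and leave most threads idle. One possible approach, sketched here under that assumption, is to hash the full URL into a fixed number of per-host sub-queues so that several threads can work on the same site at once. The class names, `num_queues`, and the choice of CRC-32 as the hash are illustrative assumptions rather than the thesis's method, and the code is plain Python, not Heritrix's Java API.

```python
import zlib
from collections import deque
from urllib.parse import urlsplit


class HostQueueAssigner:
    """Baseline: every URL of a host goes to the same queue."""

    def queue_key(self, url: str) -> str:
        return urlsplit(url).hostname or ""


class HashedHostQueueAssigner:
    """Hypothetical variant: split each host into num_queues sub-queues."""

    def __init__(self, num_queues: int):
        if num_queues < 1:
            raise ValueError("num_queues must be at least 1")
        self.num_queues = num_queues

    def queue_key(self, url: str) -> str:
        host = urlsplit(url).hostname or ""
        # CRC-32 is stable across runs, unlike Python's salted str hash().
        bucket = zlib.crc32(url.encode("utf-8")) % self.num_queues
        return f"{host}#{bucket}"


def distribute(urls, assigner):
    """Group URLs into FIFO queues keyed by the assigner's queue key."""
    queues = {}
    for url in urls:
        queues.setdefault(assigner.queue_key(url), deque()).append(url)
    return queues


if __name__ == "__main__":
    urls = [f"http://video.example.com/play/{i}.html" for i in range(100)]
    print(len(distribute(urls, HostQueueAssigner())))  # 1 queue
    print(len(distribute(urls, HashedHostQueueAssigner(8))))  # at most 8 queues
```

Spreading one host over several queues also weakens per-host politeness delays, so the number of sub-queues would need to be tuned against what the target site tolerates.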
Keywords/Search Tags:web video retrieval, crawler, video address resolution