With the development of multimedia technology, the number of videos on the Internet has been increasing dramatically, which poses a great challenge to the processing and storage of massive video collections. Meanwhile, analysis of actual video retrieval shows that most videos have one or more duplicated versions on the Internet. These duplicates, called video copies, are produced by processing a video according to users' needs, such as adding subtitles, changing the resolution, or changing the format. Because the processed videos are visually similar to the original, they seriously affect video storage, video retrieval, and video copyright protection. Hence, realizing efficient and accurate video copy detection over massive video collections has become a highlight of the related fields.

On the computation side, traditional methods process videos on a single machine (e.g., a Xeon server). However, when handling massive videos, a single machine is not efficient enough, and a parallel approach based on many CPUs is too expensive. Apache Hadoop, which follows the design of Google's MapReduce framework, is built for large-scale data processing and offers high reliability and a simple programming model. Implementing video copy detection on Hadoop can therefore achieve high speed and high efficiency.

This thesis mainly introduces the implementation of two video copy detection algorithms on the Hadoop platform: one based on the brightness sequence and the other based on TIRI-DCT. It analyzes their performance and also introduces a text-retrieval application for predicting hot topics and sensitive topics on Hadoop, which provides a basis and reference for potential video retrieval combining video and text.

The main contributions of this thesis are:

(1) The video copy detection algorithms based on the brightness sequence and TIRI-DCT are implemented on the Hadoop platform.
In the video feature extraction stage, only the Map function is used, because the job only needs to generate and save the feature sequences: each Map task computes the features of its partition of the video data, generates the feature sequences, and saves them to HDFS. In the video feature matching stage, the Map function computes the distance between the query video and each video in the database, and the Reduce function sorts the resulting distances.

(2) This thesis analyzes the robustness, discrimination, and time consumption of the implementation on the Hadoop platform. First, robustness is analyzed in terms of recall and precision. Second, time consumption is analyzed through the speed-up, scale-up, and size-up of the two stages. Finally, efficiency is analyzed under different numbers of Map tasks. This provides a reference for the distributed deployment of video copy detection, the expansion of the system scale, the growth of the data size, and the efficiency of the distributed platform under different numbers of Map tasks.

In summary, this thesis studies the implementation of video copy detection on the Hadoop platform for massive video collections, and provides a reference for the practicability of video retrieval.
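To make the brightness-sequence feature concrete, the following is a minimal sketch in Python. The block grid size, the use of rank-ordering for robustness to global brightness changes, and the L1 distance are illustrative assumptions, not the thesis's exact parameters.

```python
import numpy as np

def brightness_signature(frames, grid=2):
    """Per-frame block-mean luminance, rank-normalized.

    `frames` is a sequence of 2-D grayscale arrays. The grid size and
    rank normalization are illustrative choices, not the thesis's spec.
    """
    sig = []
    for f in frames:
        h, w = f.shape
        # mean brightness of each block in a grid x grid partition
        blocks = [f[i * h // grid:(i + 1) * h // grid,
                    j * w // grid:(j + 1) * w // grid].mean()
                  for i in range(grid) for j in range(grid)]
        # ordinal ranks are invariant to a global brightness shift
        sig.append(tuple(int(r) for r in np.argsort(np.argsort(blocks))))
    return sig

def signature_distance(sig_a, sig_b):
    """L1 distance between two equal-length rank signatures."""
    return sum(abs(x - y)
               for fa, fb in zip(sig_a, sig_b)
               for x, y in zip(fa, fb))
```

Because the signature keeps only the rank order of block brightness, a copy with uniformly increased brightness yields a distance of zero to the original, which is the kind of robustness the evaluation in contribution (2) measures.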
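The matching stage described above (Map computes distances, Reduce sorts them) can be sketched as an in-process stand-in for the Hadoop job. The function names, the L1 distance, and the tuple-based database layout are assumptions for illustration; a real deployment would run the mapper and reducer as Hadoop tasks over signatures stored in HDFS.

```python
from operator import itemgetter

def map_distance(query_sig, db_item):
    """Map task: emit (video_id, distance to the query signature)."""
    video_id, sig = db_item
    d = sum(abs(a - b) for a, b in zip(query_sig, sig))
    return video_id, d

def reduce_sort(pairs):
    """Reduce task: rank candidate videos by ascending distance."""
    return sorted(pairs, key=itemgetter(1))

def match(query_sig, database):
    """In-process stand-in for the two-phase Hadoop matching job."""
    return reduce_sort(map_distance(query_sig, item) for item in database)
```

Splitting the work this way mirrors the MapReduce model: the distance computations are independent and parallelize across Map tasks, while the global sort is the natural job for Reduce.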