| With the rapid development of the Internet and the explosive growth of information on networks, the search engine has become an indispensable tool for finding information online. Search engines are valuable for general queries because the results they return usually cover a wide range of fields and are likely to be central to the topic. However, conventional keyword-based search engines cope poorly with the growing volume of multimedia data, especially video, and manually labeling videos with keywords is time-consuming. Content-based video search engines have therefore been proposed in a number of recent studies. To improve the performance of the four core modules of a video search engine (video spider, data preprocessor, inquiry, and relevance feedback), this paper studies content-based algorithms and the system framework. The main tasks of this paper are as follows. In the video spider module, a distributed vertical video crawler algorithm is proposed. The module collects URLs and downloads videos from on-topic web pages using a topic relevance determination algorithm, and a distributed framework is designed so that crawlers work in parallel to improve efficiency. In the data preprocessing stage, four kinds of visual features (HSV Correlogram, LBP, Camera Motion, and Motion Activity) are proposed for describing the content of a shot, and two index algorithms, the inverted index and the R-Tree, are studied to improve system performance. In the inquiry module, the criterion for evaluating queries is selected by comparing different feature matching strategies, and multiple-feature search is studied by comparing different fusion approaches. Finally, in the relevance feedback module, a new online learning algorithm that enlarges the training set is proposed to overcome the small-sample-size problem.
Multiple visual features are used in relevance feedback because a single feature cannot describe the content of a shot accurately. A new fusion scheme for multiple classifiers is also proposed, which lets the system update the weight of each classifier according to the user's preference for the corresponding feature. Experiments show that the accuracy of the system improves after relevance feedback. |
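
The idea of fusing several per-feature classifiers and re-weighting them from user feedback can be illustrated with a minimal Python sketch. The abstract does not specify the actual update rule, so everything here is an assumption made for illustration: the function names `fuse` and `update_weights`, the learning rate `lr`, the feature keys, and the multiplicative update based on how well each feature separates shots the user marked relevant from those marked irrelevant.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical representation: feature name -> classifier score in [0, 1].
Scores = Dict[str, float]


def fuse(scores: Scores, weights: Dict[str, float]) -> float:
    """Late fusion: weighted sum of the per-feature classifier scores."""
    return sum(weights[name] * score for name, score in scores.items())


def update_weights(
    weights: Dict[str, float],
    feedback: List[Tuple[Scores, bool]],
    lr: float = 1.0,
) -> Dict[str, float]:
    """Re-weight classifiers from user feedback (assumed rule, not the paper's).

    A feature whose classifier scores relevant shots higher than irrelevant
    ones gains weight; the result is normalized to sum to 1.
    """
    relevant = [s for s, is_relevant in feedback if is_relevant]
    irrelevant = [s for s, is_relevant in feedback if not is_relevant]
    if not relevant or not irrelevant:
        return dict(weights)  # not enough feedback to compare the two groups

    raw = {}
    for name, weight in weights.items():
        margin = mean([s[name] for s in relevant]) - mean(
            [s[name] for s in irrelevant]
        )
        raw[name] = weight * math.exp(lr * margin)

    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


# Toy feedback round: the color feature separates the two shots, motion does not.
weights = {"hsv_correlogram": 0.5, "motion_activity": 0.5}
feedback = [
    ({"hsv_correlogram": 0.9, "motion_activity": 0.5}, True),
    ({"hsv_correlogram": 0.2, "motion_activity": 0.5}, False),
]
new_weights = update_weights(weights, feedback)
```

In this toy round the weight of `hsv_correlogram` rises above that of `motion_activity`, so later rankings computed with `fuse` lean on the feature the user's feedback favored.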