| With the rapid development of digital media technology, teaching video resources play an important role in learning new information technology, and their construction is receiving growing attention. A shared educational video resource database that is rich in content and clearly structured can effectively improve the quality of teaching and enrich teaching resources. Because access to it is not limited by time or place, students can also obtain advanced domestic and foreign knowledge in related disciplines whenever and wherever they need it. This paper starts from the research status of web crawlers, introduces the working principle of the topic crawler, and analyzes topic crawler search strategies and the related algorithms. It also introduces the elimination of duplicate web pages. Heritrix is chosen as the crawler tool: the paper studies the related Heritrix technology and customizes the Heritrix Extractor and FrontierScheduler classes in order to crawl information on the topic of video tutorials. To improve crawl accuracy, the ELFHASH-xl URL hash algorithm is introduced into the crawling process, and multiple threads are used during capture to improve capture efficiency. The web pages downloaded to local storage are parsed and checked with jsoup, a web page parsing library, and the extracted information is stored in structured form in a database. Lucene is used to index and query the extracted information. A system prototype is designed, and the system is tested and its performance analyzed. Finally, the work of the dissertation is summarized, and the aspects of the system that need further improvement and the direction of future research are put forward. |