The rapid development of network information technology has made the Internet an important platform for global information transmission and sharing. As the volume of data grows, it is increasingly difficult for users to find the resources they actually need among massive network resources using traditional search engines. There is an urgent need for a query method that can search precisely for information within a specialized field. Meanwhile, as network technology develops, the network environment has become highly complex and information security problems have become more severe. In this context, it is important to design a specialized vertical search engine for the field of information security. The main contents are as follows:
1. After studying the origin, features, and principles of vertical search engines, we analyzed the system architecture of the open-source web crawler Heritrix. On this basis, we achieved efficient, multi-threaded crawling of specific web resources by extending the crawler's parsers.
2. Through an in-depth analysis of the Lucene system architecture, we identified a deficiency in Lucene's original ranking algorithm. By introducing the link-analysis-based PageRank algorithm, we improved Lucene's original ranking algorithm to make the results more accurate.
3. On the basis of these studies, we designed each subsystem and built a prototype of an information-security-oriented vertical search engine using the improved crawler and ranking algorithm.
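To illustrate the link-analysis idea behind the second contribution, the following Python sketch computes PageRank by power iteration over a small link graph and blends it with a text relevance score. It is only a sketch under stated assumptions: the function names `pagerank` and `combined_score`, the damping factor of 0.85, and the linear weighting are hypothetical choices for illustration, and the abstract does not specify how the PageRank value is actually combined with the Lucene score.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; the ranks sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outgoing links): spread its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


def combined_score(text_score, page_rank, weight=0.7):
    """Blend a text relevance score with a PageRank value.

    Assumption: a simple linear combination; both inputs are expected
    to be normalized to a comparable range by the caller.
    """
    return weight * text_score + (1.0 - weight) * page_rank
```

In a setup like this, the PageRank values would be computed offline from the crawled link graph and stored alongside each document, so that query-time ranking only needs the inexpensive blending step.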