
Vulnerability Vertical Search Engine Based On Nutch

Posted on: 2012-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: F L Liu
Full Text: PDF
GTID: 2178330335960393
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of computer and Internet technology, the Internet has become an important information infrastructure. At the same time, as an open and public operating environment, it faces and conceals many security threats that are growing more complex and more serious. Security vulnerabilities are a major cause of these threats: network intrusions, large-scale worm propagation, and denial-of-service attacks are mostly caused by them. To reduce the risk posed by security vulnerabilities, improve early warning of vulnerability threats, and strengthen the ability to manage and control vulnerabilities, many national security agencies and organizations have established network security vulnerability databases. However, none of these databases covers all vulnerabilities or describes each vulnerability comprehensively.

This thesis proposes a vertical search engine for vulnerability databases that integrates the various databases and addresses the large volume of vulnerability information, its incomplete coverage, and its incomplete descriptions. To build the engine on the open-source Nutch framework, the overall system workflow is designed and each module is studied: the crawling module, the indexing module, the search module, and the Chinese word segmentation module. In the crawling module, the initial URL set is established by the system's owners, and Web pages are fetched by breadth-first traversal to improve crawling efficiency. Because result ranking is a key requirement, the vector space model is combined with link analysis, with vulnerability information fields added to the index and document boosts set, to produce fair and reasonable rankings.
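The breadth-first crawl described above can be illustrated with a minimal Python sketch. This is not Nutch's implementation (Nutch is written in Java and crawls in generate/fetch/update rounds); the function name `crawl_breadth_first`, the `get_links` callback, and the toy link graph are all hypothetical, and the graph stands in for real page fetches.

```python
from collections import deque


def crawl_breadth_first(seed_urls, get_links, max_pages=100):
    """Return URLs in the order a breadth-first crawl would visit them.

    seed_urls: the initial URL set chosen by the system's owners.
    get_links: hypothetical callback returning the outlinks of a URL.
    max_pages: upper bound on the number of pages visited.
    """
    queue = deque()
    seen = set()
    for url in seed_urls:
        if url not in seen:  # ignore duplicate seeds
            seen.add(url)
            queue.append(url)

    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()  # FIFO order gives breadth-first traversal
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # never enqueue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited


# Toy link graph used instead of real HTTP requests (assumption).
TOY_LINKS = {
    "seed-a": ["a1", "a2"],
    "seed-b": ["a2", "b1"],
    "a1": ["deep"],
    "a2": [],
    "b1": ["seed-a"],
    "deep": [],
}

order = crawl_breadth_first(["seed-a", "seed-b"], lambda u: TOY_LINKS.get(u, []))
print(order)
```

Both seeds are visited first, then every page one link away, and only then the page two links away, which is what distinguishes breadth-first from depth-first crawling.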
The Chinese word segmentation module uses the IK_CAnalyzer Chinese analyzer, a dictionary-based segmenter that implements forward and reverse full segmentation as well as forward and reverse maximum matching. The user interface is implemented with JSP to generate dynamic web pages, and cached copies of pages are provided for users' convenience. The main results are as follows: (1) A suitable and fair ranking algorithm for search results is proposed, which exploits the characteristics of a vertical search engine (focus, specialization, and depth) and the large amount of information carried in vulnerability titles and descriptions. (2) The vertical search engine framework and the Nutch workflow are studied in depth; the initial URL set is established by the system's owners in the crawling system, and Nutch's plug-in mechanism is used to implement text analysis, indexing, and search. On this basis, the Nutch-based vulnerability vertical search engine is implemented. (3) Compared with Google, Baidu, and other general-purpose search engines, the vulnerability vertical search engine returns fewer results but has a distinct advantage in result accuracy and relevance ranking. The smaller number of results also reduces retrieval time to a large extent.
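As an illustration of the dictionary-based maximum matching mentioned for the word segmentation module, the following Python sketch shows the forward and reverse variants on a small example. It is not the IK analyzer's actual code; the function names, the `max_len` parameter, and the four-word dictionary are assumptions made for the example.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text left to right, taking the longest dictionary word each time."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def reverse_max_match(text, dictionary, max_len=4):
    """Segment text right to left, taking the longest dictionary word each time."""
    tokens = []
    j = len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


# Hypothetical miniature dictionary; a real analyzer loads a large word list.
WORDS = {"研究", "研究生", "生命", "起源"}
SENTENCE = "研究生命起源"

forward_tokens = forward_max_match(SENTENCE, WORDS)
reverse_tokens = reverse_max_match(SENTENCE, WORDS)
print(forward_tokens)
print(reverse_tokens)
```

On this sentence the two directions disagree: forward matching greedily takes 研究生 and strands 命 as a single character, while reverse matching yields 研究 / 生命 / 起源. Running both directions and comparing the results is one common way to detect and resolve such ambiguities.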
Keywords/Search Tags: vertical search engine, vulnerability, PageRank, Nutch, plugin