
The Design And Implementation Of Vertical Search Engine On Security Vulnerabilities Based On Nutch

Posted on: 2018-05-18
Degree: Master
Type: Thesis
Country: China
Candidate: H Cao
Full Text: PDF
GTID: 2348330518495330
Subject: Computer technology
Abstract/Summary:
More and more people now obtain information through the Internet, and with so much information online they rely on search engines to find what they need quickly. Traditional search engines crawl the whole Internet: their coverage is broad, but the results contain a large amount of information the user does not need, which hurts the user experience. Vertical search engines retrieve only information about a particular field that the user cares about. Their coverage is narrower, but the results are more accurate and meet users' needs for domain-specific retrieval.

Study and many other aspects of daily life are now inseparable from the Internet, and the security of personal and enterprise information online is attracting growing attention. Security vulnerabilities are a major source of network security threats: large-scale DDoS attacks that crash enterprise hosts and leaks of users' personal information are both caused by them. Because the risk they pose is high, a vertical search engine for security vulnerability information is needed so that people can keep up with the latest vulnerabilities.

Based on a study of vertical search engine technologies and of Nutch, an open-source search engine framework, this thesis designs and implements a Nutch-based vertical search engine for security vulnerabilities. The main functional modules of the system are the web crawler, topic-specific information filtering, indexing, retrieval, and third-party Chinese word segmentation. The main work of this thesis is as follows:

1. Surveyed the development of search engines and the state of research on vertical search engines, with emphasis on the technology behind each module of a vertical search engine, and studied the working principle and plug-in mechanism of the open-source Nutch framework.
2. Focused on the topic filtering module of the vertical search engine. The thesis uses a classifier to categorize information and thereby restrict search to a specific domain. Because the naive Bayes classifier is inherently limited by its conditional independence assumption, the thesis studies the second-order AODE classifier and, building on it, implements WAODE, an improved classification algorithm weighted by the mutual information between attribute variables and the class variable. The WAODE algorithm is combined with the Nutch plug-in mechanism to implement the topic filtering module.
3. Improved the ranking algorithm of the Nutch retrieval model by taking into account content relevance, web page authority derived from hyperlink analysis, and a time factor, producing a new page-scoring model that was verified experimentally.
4. Added the third-party Chinese word segmenter mmseg4j to Nutch to provide Chinese word segmentation.
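The abstract does not give the WAODE formulas, so the following is only a minimal sketch of the general idea, under stated assumptions: each attribute acts in turn as a "super-parent" in a one-dependence estimator, and each estimator's contribution is weighted by the empirical mutual information between that attribute and the class. The class name `WAODE`, the Laplace smoothing, the toy features (`has_cve`, `has_exploit`, `has_recipe`), and the labels are all hypothetical and are not taken from the thesis; the thesis implements its filter as a Nutch plug-in, which this standalone Python sketch does not attempt to reproduce.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in nats for two discrete sequences."""
    n = len(xs)
    cx, cy, cxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = sum((c / n) * math.log(c * n / (cx[x] * cy[y])) for (x, y), c in cxy.items())
    return max(0.0, mi)  # clamp tiny negative values caused by rounding


class WAODE:
    """Sketch of a mutual-information-weighted AODE classifier (assumed form)."""

    def fit(self, X, y):
        self.n, self.d = len(X), len(X[0])
        self.classes = sorted(set(y))
        # Distinct values seen for each attribute, used for Laplace smoothing.
        self.values = [sorted({row[i] for row in X}) for i in range(self.d)]
        self.pair = Counter()    # counts of (i, x_i, class)
        self.triple = Counter()  # counts of (i, x_i, j, x_j, class)
        for row, label in zip(X, y):
            for i in range(self.d):
                self.pair[(i, row[i], label)] += 1
                for j in range(self.d):
                    if j != i:
                        self.triple[(i, row[i], j, row[j], label)] += 1
        # One weight per super-parent attribute: I(X_i; Y).
        self.weights = [mutual_information([r[i] for r in X], y) for i in range(self.d)]
        return self

    def scores(self, row):
        """Unnormalized weighted score for each class."""
        out = {}
        for c in self.classes:
            total = 0.0
            for i in range(self.d):
                n_iy = self.pair[(i, row[i], c)]
                # Smoothed joint estimate of P(class, x_i).
                p = (n_iy + 1) / (self.n + len(self.classes) * len(self.values[i]))
                for j in range(self.d):
                    if j != i:
                        # Smoothed estimate of P(x_j | class, x_i).
                        n_ijy = self.triple[(i, row[i], j, row[j], c)]
                        p *= (n_ijy + 1) / (n_iy + len(self.values[j]))
                total += self.weights[i] * p
            out[c] = total
        return out

    def predict(self, row):
        s = self.scores(row)
        return max(s, key=s.get)


# Toy pages described by hypothetical binary features (has_cve, has_exploit, has_recipe).
X = [(1, 1, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1), (0, 0, 1), (0, 1, 1)]
y = ["vuln", "vuln", "vuln", "other", "other", "other"]
model = WAODE().fit(X, y)
```

In a topic filter, a page would be kept for indexing only when `model.predict(features)` returns the on-topic label. Attributes that carry little information about the class (here `has_exploit`) receive a small weight, so their one-dependence estimator contributes less to the final score.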
Keywords/Search Tags: Nutch, vertical search engine, WAODE classifier, retrieval ranking, Chinese word segmentation