
Design of the Harmful Information Retrieval System in the Campus Network

Posted on: 2012-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Yang
GTID: 2248330371495658
Subject: Computer technology
Abstract/Summary:
This thesis designs and implements an internal search engine based on a study of the core technologies of existing search engines. On top of it, the thesis builds a monitoring system that detects harmful information on the campus network by actively scanning campus websites. By analyzing how the major search engines work, the thesis studies the function, principles, and search strategies of web spider (crawler) technology. Using network programming (parsing the HTTP/1.1 protocol and making Socket connections), database technology (JDBC connections and BLOB field handling), and multithreading, I designed and implemented the spider component. It crawls the websites of a specified domain or address, parses pages of types such as .htm, .html, .asp, .php, and .jsp, and stores them in a local database that supplies data to the search engine. After a comparative analysis, Chinese word segmentation is implemented using reverse maximum matching together with a two-level hash algorithm.
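The segmentation step can be illustrated with a minimal Python sketch. It assumes one plausible reading of the "two-level hash" structure: level 1 is a hash map keyed by a word's first character, and level 2 is a hash set of the words that begin with it. The class and function names are hypothetical, and this is not the thesis's own implementation. Reverse maximum matching scans from the end of the text and, at each position, takes the longest dictionary word ending there, falling back to a single character when nothing matches.

```python
from collections import defaultdict


class TwoLevelHashDict:
    """Hypothetical two-level hash dictionary.

    Level 1: hash map keyed by a word's first character.
    Level 2: hash set holding every word that starts with that character.
    """

    def __init__(self, words):
        self._index = defaultdict(set)
        self.max_len = 0
        for word in words:
            if word:
                self._index[word[0]].add(word)
                self.max_len = max(self.max_len, len(word))

    def __contains__(self, word):
        if not word:
            return False
        bucket = self._index.get(word[0])  # level-1 lookup; .get() adds no key
        return bucket is not None and word in bucket  # level-2 lookup


def reverse_max_match(text, dictionary):
    """Segment `text` by reverse maximum matching (scanning right to left)."""
    window = max(1, dictionary.max_len)  # at least 1 so the loop always advances
    tokens = []
    end = len(text)
    while end > 0:
        # Start with the longest candidate ending at `end` and drop its
        # leftmost character until it is a dictionary word or one character.
        start = max(0, end - window)
        while start < end - 1 and text[start:end] not in dictionary:
            start += 1
        tokens.append(text[start:end])
        end = start
    tokens.reverse()  # tokens were collected from the end of the text
    return tokens


d = TwoLevelHashDict(["研究", "研究生", "生命", "起源"])
print(reverse_max_match("研究生命起源", d))  # ['研究', '生命', '起源']
```

With this toy dictionary, reverse matching yields 研究 / 生命 / 起源, whereas forward maximum matching would take the longest word from the left and produce 研究生 / 命 / 起源. This is the usual textbook illustration of why the reverse direction is often preferred for Chinese text.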
Chinese word segmentation is applied to the crawled web pages, and an index database is built from the results. Based on the J2EE architecture, a web-based search engine system is implemented using database programming, web programming, and the index database. On top of this search engine, a harmful information monitoring system is built, comprising modules for website scanning, harmful information monitoring, smart background startup, harmful thesaurus management, site management, event management, user management, and system management. The system can search all sites or a specific site for harmful information and send an email report to the administrator. It thus provides effective early warning of harmful information. It also supports harmful information management: it classifies harmful information (politics, sex, violence, etc.), stores major violations in the database, and identifies the administrators of the responsible websites. Together, these functions can raise the level of campus network security management.
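The monitoring and reporting step can be sketched as matching a page's segmented tokens against a harmful thesaurus that maps each term to a category, then building an email summary for the administrator. This is a minimal Python sketch under stated assumptions: the thesaurus format (term to category), the report structure, the placeholder terms, and all names are hypothetical, and the email is only constructed, not sent.

```python
from collections import Counter
from dataclasses import dataclass, field
from email.message import EmailMessage


@dataclass
class ScanReport:
    """Hypothetical per-page scan result."""
    url: str
    hits: Counter = field(default_factory=Counter)        # term -> occurrences
    categories: Counter = field(default_factory=Counter)  # category -> occurrences

    @property
    def is_harmful(self):
        return bool(self.hits)


def scan_page(url, tokens, thesaurus):
    """Match segmented page tokens against a harmful thesaurus (term -> category)."""
    report = ScanReport(url)
    for tok in tokens:
        category = thesaurus.get(tok)
        if category is not None:
            report.hits[tok] += 1
            report.categories[category] += 1
    return report


def build_alert(reports, admin_addr):
    """Build (but do not send) an alert email listing pages with harmful hits."""
    flagged = [r for r in reports if r.is_harmful]
    if not flagged:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Harmful information detected on {len(flagged)} page(s)"
    msg["To"] = admin_addr
    msg.set_content("\n".join(f"{r.url}: {dict(r.categories)}" for r in flagged))
    # Sending could use smtplib.SMTP(host).send_message(msg); omitted here.
    return msg


# Placeholder terms standing in for a real harmful thesaurus.
thesaurus = {"kw_pol": "politics", "kw_sex": "sex", "kw_vio": "violence"}
r1 = scan_page("http://site-a.example/1", ["hello", "kw_vio", "kw_vio", "kw_pol"], thesaurus)
r2 = scan_page("http://site-a.example/2", ["hello", "world"], thesaurus)
alert = build_alert([r1, r2], "admin@example.edu")
```

Counting hits per category mirrors the classification the abstract describes (politics, sex, violence), and keeping the report separate from the mail step lets the same results also be stored in the database as violation records.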
Keywords/Search Tags: search engine, spider, bad information detection, word segmentation