
Research And Implementation Of Web Information Detecting System Based On Topic Strategy

Posted on: 2012-07-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Liu
GTID: 2178330338996834
Subject: Computer system architecture
Abstract/Summary:
In recent years, with the rapid development of Internet technology and its growing popularity, the Internet has gradually merged into daily life and become an important part of it. However, because of the Internet's free and open structure, the information on the network has become increasingly complex, and harmful information such as violent, pornographic, and reactionary content spreads as the Internet is used more and more widely. The spread of such information, especially sensitive information related to national security and social stability, is extremely harmful to society. How to detect and identify this information within the large volume of network information is therefore a significant research subject in Internet security.

Currently, a great deal of research in this area focuses on filtering and blocking harmful information at the gateway, at the port, or on the user client. However, Web-page-based filtering on the user client does not work efficiently, and the gateway- and port-based approach has two shortcomings: it requires optical splitters and port mirroring, and it intercepts a huge amount of data, so it demands high-performance hardware and is very expensive to deploy. A simple and efficient detecting system is therefore required. This thesis puts forward a Web information detecting system based on topic strategy. Its primary work and achievements are as follows:

Firstly, a Web information monitoring system model based on topic strategy is proposed through research on Web pages and network crawling technology. Building on topic crawling technology and on an analysis of the specific application requirements for a Web information detecting system in practical work, this model defines the basic structure of the information detecting system based on topic strategy.

Secondly, to meet the needs of the system, a heuristic topic crawling algorithm, which is the basis and core of the detecting system in this thesis, is presented following research and analysis of existing topic crawling algorithms. Starting from the characteristics of general topic crawling strategies, the algorithm introduces the page radiation space, combines link analysis with content analysis, and embeds a heuristic algorithm. The experimental results show that this algorithm is more efficient than the usual algorithms.

Thirdly, to combine the research with practice, a prototype of the Web information detecting system based on topic strategy is implemented in the thesis and validated through experiments and actual deployment in a campus network. The results show that it can effectively find pages that contain the specified topics and that the system can run stably for a long time.

At the end of the thesis, a summary of the whole work is given and some new measures are envisaged.
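The second contribution combines link analysis and content analysis to decide which pages to crawl first. The following is a minimal sketch of that general idea only, not the algorithm of the thesis. It assumes a best-first crawl over a small in-memory page graph, a simple term-frequency relevance score, and a weighted sum of the parent page's score and the anchor-text score as the link priority. All names (`content_score`, `focused_crawl`, `alpha`, `threshold`) are hypothetical, and the page radiation space is not modeled because the abstract does not define it.

```python
import heapq


def content_score(text, topic_terms):
    """Fraction of words in `text` that are topic terms (0.0 for empty text)."""
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in topic_terms)
    return hits / len(words)


def focused_crawl(pages, seed, topic_terms, alpha=0.5, threshold=0.2, max_pages=100):
    """Best-first topic crawl over an in-memory graph (illustrative assumption).

    pages: dict mapping url -> (page_text, [(anchor_text, target_url), ...])
    Link priority = alpha * parent content score + (1 - alpha) * anchor-text score.
    Returns the URLs whose content score reaches `threshold`, in visit order.
    """
    frontier = [(-1.0, seed)]  # max-heap emulated with negated priorities
    visited = set()
    relevant = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        if url in visited or url not in pages:
            continue
        visited.add(url)
        text, links = pages[url]
        score = content_score(text, topic_terms)  # content analysis
        if score >= threshold:
            relevant.append(url)
        for anchor, target in links:
            if target not in visited:
                # Heuristic: links from relevant pages with on-topic
                # anchor text are expanded first.
                anchor_score = content_score(anchor, topic_terms)
                priority = alpha * score + (1 - alpha) * anchor_score
                heapq.heappush(frontier, (-priority, target))
    return relevant
```

In a real deployment, `pages` would be replaced by an HTTP fetcher and an HTML link extractor, and the weight `alpha` would need tuning against labeled pages.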
Keywords/Search Tags: Information Detecting, Crawler, Topic Strategy, Heuristic Topic Crawling Algorithm