Research and Implementation of a Web Page Filtering Method Based on Real-Time Network Traffic Data

Posted on: 2014-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wei
Full Text: PDF
GTID: 2248330398472018
Subject: Computer technology
Abstract/Summary:
With the development of the Internet, web applications have become the major channel through which information is released and spread. Networks give people a wealth of information, but some of it is violent, illegal, or unhealthy. Large volumes of pornographic, violent, and otherwise harmful content can leave minors unable to pull themselves away from the network and can erode their ethics and values, which seriously affects social stability and unity. The purity and safety of the network must therefore be ensured, and services that meet this need have emerged accordingly. Traditional web page filtering approaches include URL list filtering, keyword filtering, and model-based filtering; each has its own advantages, but each also has defects. On this basis, this paper proposes a page filtering method based on real-time network traffic data. The system architecture is designed by combining a web page classification process that integrates the SVM and KNN classification algorithms with the characteristics of the URL list filtering method.
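The abstract does not say how the SVM and KNN classifiers are integrated, so the following is only a minimal sketch of one common scheme: check the URL list first, trust the SVM when its score is far from the decision boundary, and fall back to a KNN vote when the score is within a margin. The weights, samples, `MARGIN`, and all function names are hypothetical stand-ins, not the thesis's trained model.

```python
import math

# Hypothetical URL list: verdicts already known for specific URLs.
URL_LIST = {"http://bad.example/page": "block"}

# Stand-in for a trained linear SVM: made-up weights over a tiny vocabulary.
SVM_WEIGHTS = {"casino": 1.2, "violence": 0.9, "school": -1.0, "news": -0.6}
SVM_BIAS = -0.1
MARGIN = 0.5  # assumed threshold: scores inside +/-MARGIN count as uncertain

# Made-up labeled term-frequency vectors used by the KNN fallback.
KNN_SAMPLES = [
    ({"casino": 2, "bonus": 1}, "block"),
    ({"violence": 1, "casino": 1}, "block"),
    ({"school": 2, "news": 1}, "allow"),
    ({"news": 2, "weather": 1}, "allow"),
    ({"bonus": 1, "school": 1, "exam": 2}, "allow"),
]


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency dicts."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def svm_score(features):
    """Signed score of the stand-in linear model (positive means block)."""
    return sum(SVM_WEIGHTS.get(t, 0.0) * v for t, v in features.items()) + SVM_BIAS


def knn_label(features, k=3):
    """Majority label among the k most similar samples (k odd, two labels)."""
    ranked = sorted(KNN_SAMPLES, key=lambda s: cosine(features, s[0]), reverse=True)
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)


def classify(url, features):
    """Return (verdict, source), where source names the stage that decided."""
    if url in URL_LIST:                      # 1. known URL: answer immediately
        return URL_LIST[url], "url-list"
    score = svm_score(features)
    if abs(score) >= MARGIN:                 # 2. confident SVM decision
        return ("block" if score > 0 else "allow"), "svm"
    return knn_label(features), "knn"        # 3. near the boundary: ask KNN
```

In this sketch a page whose score falls near the boundary, such as `classify("http://x.example/c", {"casino": 1, "school": 1})`, is decided by the KNN vote rather than by the sign of a small SVM score.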
By monitoring, capturing, and reassembling real-time HTTP messages, the system obtains the HTML page that the user requested. After page parsing and text classification, it obtains the predicted class of the text. According to the blocking strategy and the captured information, it then constructs an RST message to block the connection. At the same time, the prediction is stored, so that when the system later captures a URL whose page has already been processed, it can act immediately.
This paper mainly completed the following work:
First, it reviews the key technologies used for system design and implementation and summarizes their current research status, including real-time page capture, page parsing, text classification, and bypass blocking.
Second, it studies the key techniques used in the system design, including traffic monitoring, message extraction, and reassembly in the page capture process; page parsing; stop word removal, feature selection, and the classification algorithm in the text classification process; and bypass blocking. It also improves the parsing and text classification algorithms.
Third, it presents a detailed requirements analysis, designs the overall system architecture according to those requirements, divides the system into functional modules, and gives a detailed design based on the functional requirements.
Last, following the research and design, the system is coded and its function and performance are tested: each module is implemented, unit testing and integration testing are completed, and the system is optimized.
The paper summarizes the techniques related to current web page filtering methods and proposes a web filtering method based on real-time traffic data.
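The RST-based bypass blocking mentioned above can be illustrated with a short sketch. It only builds the two TCP headers (it does not send them) and assumes the monitor has captured one client-to-server segment with sequence number `seq`, acknowledgment number `ack`, and `payload_len` bytes of data. The function names and the choice of sequence numbers are assumptions for illustration; a real system would also have to cope with data the server sends after the capture.

```python
import socket
import struct


def checksum(data: bytes) -> int:
    """Internet checksum: one's-complement sum of 16-bit words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_rst(src_ip, dst_ip, src_port, dst_port, seq):
    """Return a 20-byte TCP header with only the RST flag set."""
    offset_flags = (5 << 12) | 0x04  # data offset of 5 words, RST bit
    header = struct.pack("!HHIIHHHH", src_port, dst_port,
                         seq & 0xFFFFFFFF, 0, offset_flags, 0, 0, 0)
    # The TCP checksum also covers a pseudo-header taken from the IP layer.
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(header)))
    csum = checksum(pseudo + header)
    return header[:16] + struct.pack("!H", csum) + header[18:]


def rst_pair(client_ip, client_port, server_ip, server_port,
             seq, ack, payload_len):
    """Build RST headers for both directions of a captured client segment."""
    # Toward the server, posing as the client: next client sequence number.
    to_server = build_rst(client_ip, server_ip, client_port, server_port,
                          seq + payload_len)
    # Toward the client, posing as the server: the number the client expects.
    to_client = build_rst(server_ip, client_ip, server_port, client_port, ack)
    return to_server, to_client
```

Sending the headers would additionally require an IP header and a privileged raw socket, which are left out here.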
The system uses a raw socket to monitor network traffic through a network card that is set to promiscuous mode and connected to a SPAN mirror port. It takes the last 16 bits of the destination IP address as the hash table key so that a message can be located quickly during reassembly. It parses the web page source code with a stack and a nearest-match principle, and it loads the stop word vocabulary into a hash table so that stop words are removed efficiently. It integrates the SVM and KNN text classification algorithms and, drawing on the characteristics of URL list filtering, stores the information it has processed so that it can respond quickly when the same page is captured again.
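As a rough illustration of the reassembly lookup described above, the sketch below keys a table on the low 16 bits of the destination IP address and resolves collisions inside each bucket with the full flow 4-tuple. The class name, the use of Python dicts instead of a fixed 65536-slot array, and the simple sort by sequence number (which ignores gaps and sequence wraparound) are all assumptions, not details from the thesis.

```python
import socket
import struct


class ReassemblyTable:
    """Buckets keyed by the low 16 bits of the destination IPv4 address."""

    def __init__(self):
        # 16-bit key -> {flow 4-tuple -> {sequence number -> payload}}
        self.buckets = {}

    @staticmethod
    def key(dst_ip: str) -> int:
        """Low 16 bits of the destination address, e.g. x.x.1.10 -> 266."""
        return struct.unpack("!I", socket.inet_aton(dst_ip))[0] & 0xFFFF

    def add(self, src_ip, src_port, dst_ip, dst_port, seq, payload):
        """Store one captured segment under its bucket and flow."""
        flow = (src_ip, src_port, dst_ip, dst_port)
        segments = self.buckets.setdefault(self.key(dst_ip), {}).setdefault(flow, {})
        segments.setdefault(seq, payload)  # keep the first copy of a retransmission

    def assemble(self, src_ip, src_port, dst_ip, dst_port) -> bytes:
        """Join the stored payloads of one flow in sequence-number order."""
        flow = (src_ip, src_port, dst_ip, dst_port)
        segments = self.buckets.get(self.key(dst_ip), {}).get(flow, {})
        return b"".join(segments[s] for s in sorted(segments))
```

Two destinations such as 192.168.1.10 and 10.0.1.10 share the same 16-bit key, which is why each bucket still has to compare the full flow tuple before appending a segment.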
Keywords/Search Tags: page filtering, data reassembly, page parsing, text classification, bypass blocking, URL list filtering