
Web Content Filtering Key Technologies And Research

Posted on: 2008-10-24
Degree: Master
Type: Thesis
Country: China
Candidate: G S Xu
GTID: 2208360212499626
Subject: Communication and Information System
Abstract/Summary:
With the rapid development of networks, people enjoy more and more of the advantages that networks provide. At the same time, they suffer from the constant security threats that accompany them, such as leaked secrets, viruses, and reactionary content, which are spread throughout the network. To determine whether network content is safe, a network content filtering system scans the content of sessions between users and provides the original information for audit.

This thesis presents a prototype network content filtering system that can scan and filter network packets with low packet loss. The capture module uses the Network Driver Interface Specification (NDIS) to capture and process network packets, and it provides a base for designing and testing algorithms in the Windows kernel. Because packets captured from the data link layer are of little use on their own for analyzing the session content between users, an efficient method is introduced to reassemble the TCP/IP packets.

Keyword filtering is performed by pattern matching algorithms and is, in practice, the performance bottleneck of a network content filtering system. This thesis therefore analyzes the existing string matching algorithms, including the classic single-pattern and multi-pattern matching algorithms, and compares their performance, in preparation for the design of more efficient algorithms.

The classic matching algorithms were designed for English words. Network content often mixes several languages (Chinese, English, and others), and its character set is extremely large. Unlike English words, Chinese words have no prefixes or suffixes. The thesis designs and implements two algorithms for the Chinese character set combined with other languages. The first, the CE_BM algorithm, is designed for FPGA and DSP; it uses much less memory than the CE algorithm but achieves the same performance by using bit operations instead of string operations. The second, the AWM algorithm, is based on the well-known multi-pattern matching algorithm WM and is better suited to Chinese words.

Experiments were run on both algorithms using Chinese texts mixed with English and other languages. The results show that the two algorithms perform better than the original algorithms they are based on. Finally, the whole system was tested under similar conditions, and the experimental results show that the system is capable of filtering the network content of a LAN.
Keywords/Search Tags:Content Filtering, String Matching algorithm, NDIS, TCP/IP Protocols