
The Techniques Of Network Information Content Audit

Posted on: 2013-04-01 | Degree: Master | Type: Thesis
Country: China | Candidate: L Chen | Full Text: PDF
GTID: 2248330362470873 | Subject: Computer Science and Technology
Abstract/Summary:
In the 21st century, the spread of the Internet and mobile communications has reached every corner of the world and has had a profound impact on how people work, study, and live. Although these communication networks bring a wealth of information, they also bring a new problem: they make it easy to disclose business and technical secrets and to spread harmful information. Both the leakage of confidential information and the dissemination of harmful information occur in the traffic exchanged between an enterprise's internal network and external networks. To counter this, this paper designs a content audit system for the network traffic of large enterprises. The research centers on the following key technologies of the system.

First, to speed up content auditing, we adopt a distributed cluster architecture and spread the traffic across it using load balancing. The paper presents a dynamic, session-based load-balancing algorithm that assigns each newly arriving session to the cluster server with the smallest load. Because the audit system must inspect content at the application layer, all packets belonging to the same session are assigned to the same server for processing. These measures improve the processing speed of content auditing while still accounting for the load on each server in the cluster. The algorithm is presented as removing the processing-speed bottleneck of auditing large-scale enterprise network traffic while preserving the integrity of the audit.

Second, to raise the precision of content auditing, the thesis focuses on text content and analyzes text classification techniques in depth.
The thesis gives an overall analysis of text classification technology and experimentally compares the SVM, Bayes, and KNN algorithms on Chinese text classification, summarizing the strengths and weaknesses of each. It then proposes an improved TF-IDF feature weighting method that takes into account the correlation between the keywords of each category and the keywords of the text to be classified. Experimental comparison shows that the improved weighting method is less time-efficient than standard TF-IDF for Chinese text classification, but it improves classification accuracy: both its precision and its recall are higher than those of the standard TF-IDF feature weighting method.
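The session-based dispatching rule from the first research aspect (new sessions go to the least-loaded server; every later packet of a session follows it to the same server) can be sketched as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: the session key (the TCP/IP 5-tuple, normalized so both directions of a connection map to one session), the load metric (number of active sessions per server), and the names `SessionLoadBalancer`, `dispatch`, and `close` are all hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical session key: (src_ip, src_port, dst_ip, dst_port, protocol).
SessionKey = Tuple[str, int, str, int, str]


def normalize(key: SessionKey) -> SessionKey:
    """Map both directions of a connection to the same session key."""
    src_ip, src_port, dst_ip, dst_port, proto = key
    if (src_ip, src_port) <= (dst_ip, dst_port):
        return (src_ip, src_port, dst_ip, dst_port, proto)
    return (dst_ip, dst_port, src_ip, src_port, proto)


class SessionLoadBalancer:
    """Sketch: least-loaded assignment for new sessions, affinity for old ones."""

    def __init__(self, servers: Iterable[str]) -> None:
        # Assumed load metric: count of active sessions on each server.
        self.load: Dict[str, int] = {s: 0 for s in servers}
        # Session table: normalized session key -> assigned server.
        self.table: Dict[SessionKey, str] = {}

    def dispatch(self, key: SessionKey) -> str:
        """Return the server that should audit the packet with this key."""
        k = normalize(key)
        server = self.table.get(k)
        if server is None:
            # New session: choose the least-loaded server (ties broken by name).
            server = min(self.load, key=lambda s: (self.load[s], s))
            self.table[k] = server
            self.load[server] += 1
        return server

    def close(self, key: SessionKey) -> None:
        """Forget a finished session and release its load on the server."""
        server = self.table.pop(normalize(key), None)
        if server is not None:
            self.load[server] -= 1
```

The per-session table is what keeps the audit complete in this sketch: an application-layer reassembler on one server sees every packet of a session in both directions, while only the choice of server for a *new* session depends on the current cluster load.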
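The abstract does not give the formula of the improved weighting, only that it factors in the correlation between category keywords and the keywords of the text being classified. The sketch below shows standard TF-IDF and one hypothetical way to add such a term: each weight is scaled by (1 + r), where r is the fraction of a category's training documents that contain the term. The relevance measure `category_relevance` and the (1 + r) scaling are assumptions standing in for the thesis's unspecified correlation measure; documents are assumed to be already segmented into words.

```python
import math
from collections import Counter
from typing import Dict, List

Doc = List[str]  # a document as a list of already-segmented words


def idf(corpus: List[Doc]) -> Dict[str, float]:
    """Inverse document frequency: log(N / df(t)) over the training corpus."""
    n = len(corpus)
    df = Counter(t for doc in corpus for t in set(doc))
    return {t: math.log(n / c) for t, c in df.items()}


def tfidf(doc: Doc, idf_table: Dict[str, float]) -> Dict[str, float]:
    """Standard TF-IDF; terms unseen in training get an IDF of 0."""
    tf = Counter(doc)
    total = len(doc)
    return {t: (c / total) * idf_table.get(t, 0.0) for t, c in tf.items()}


def category_relevance(term: str, category_docs: List[Doc]) -> float:
    """Hypothetical correlation measure: share of category docs containing term."""
    if not category_docs:
        return 0.0
    return sum(term in set(d) for d in category_docs) / len(category_docs)


def weighted_tfidf(doc: Doc, idf_table: Dict[str, float],
                   category_docs: List[Doc]) -> Dict[str, float]:
    """Assumed improvement: boost TF-IDF by (1 + relevance to the category)."""
    base = tfidf(doc, idf_table)
    return {t: w * (1.0 + category_relevance(t, category_docs))
            for t, w in base.items()}
```

Because the category-dependent factor has to be evaluated for every term against every candidate category, a weighting of this kind does more work than plain TF-IDF, which is one plausible reason for the lower time efficiency reported above.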
Keywords/Search Tags: content audit, packet capture, load balancing, pattern recognition, Chinese text classification, weight calculation