
Research and Implementation of Semantic Filtering for the XML Engine Security Gateway

Posted on: 2010-08-23  Degree: Master  Type: Thesis
Country: China  Candidate: H J Wu  Full Text: PDF
GTID: 2208360275983612  Subject: Software engineering
Abstract/Summary:
The Internet carries a vast and complex body of information, and some of it is harmful content that affects many people in a variety of ways. Necessary and effective filtering of network access is therefore an important part of building a healthy and safe network environment. However, traditional text filtering methods can judge content only by its structure, not by its semantics, so they struggle to meet the demand for intelligent filtering. Drawing on knowledge from computational linguistics, this thesis proposes and implements a filtering method based on semantic analysis. Some long text messages cannot be filtered out by keyword matching. Semantic analysis identifies and handles these messages better, which effectively prevents large amounts of undesirable information from spreading.
The main contributions of this thesis are as follows.
First, the thesis addresses the problems of some existing word segmentation methods. It refines the concept of a self-learning intelligent dictionary and builds a basic model of it. The model learns new words automatically, without human intervention, which gives the system its intelligent behavior. The segmentation algorithm combines forward and backward maximum matching, which improves segmentation accuracy. The algorithm also uses a word-frequency library to resolve segmentation ambiguities, which further ensures accuracy.
Second, the thesis builds on an in-depth study of feature extraction algorithms. It proposes a TF-IDF-based feature extraction algorithm that introduces a word property (part-of-speech) coefficient. This coefficient improves the feature set while preserving the stability of TF-IDF. The algorithm uses latent semantic labels to help the user analyze semantic relationships, and it multiplies each feature word's weight by the coefficient for its part of speech. This highlights the words in particular positions that best express a document's category. As a result, the segmentation workload is reduced and effective processing is faster.
Third, the thesis studies several major classification algorithms and builds on the Bayes algorithm, which offers good quality at low complexity. The project involves large batches of text, requires fast processing, and has only a few categories. For these conditions, the thesis introduces a set of Bayes classifier models that use word feature coefficients and statistical methods to rank documents by relevance. Experiments show that this classifier achieves high recall and precision, which effectively guarantees the filtering quality of the whole semantic filtering module.
The results of this research have already been applied in the XML Engine security gateway, a nationally supported Guangdong technology project. Adding the semantic filtering module to the XML Engine enables intelligent filtering of large amounts of harmful information and ensures the security of the XML Engine.
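The segmentation step in the first contribution can be illustrated with a minimal Python sketch. That step runs forward and backward maximum matching, uses a word-frequency library to choose between the two results, and learns new words without human intervention. This is not the thesis's implementation. Several details are assumptions made for illustration:

- the class name `SelfLearningSegmenter`;
- the tie-breaking order (fewer tokens, then fewer single-character tokens, then higher total frequency);
- the new-word rule, which promotes a pair of adjacent single characters once that pair recurs `learn_threshold` times.

```python
from collections import Counter


class SelfLearningSegmenter:
    """Bidirectional maximum-matching segmenter (hypothetical sketch).

    freq maps dictionary words to corpus frequencies (the "word-frequency library").
    """

    def __init__(self, freq, max_len=4, learn_threshold=3):
        self.freq = dict(freq)
        self.max_len = max_len
        self.learn_threshold = learn_threshold
        self._candidates = Counter()  # hypothetical new-word candidates

    def _forward(self, text):
        # Forward maximum matching: take the longest dictionary word at each position.
        tokens, i = [], 0
        while i < len(text):
            for size in range(min(self.max_len, len(text) - i), 0, -1):
                piece = text[i:i + size]
                if size == 1 or piece in self.freq:
                    tokens.append(piece)
                    i += size
                    break
        return tokens

    def _backward(self, text):
        # Backward maximum matching: same idea, scanning from the end of the text.
        tokens, j = [], len(text)
        while j > 0:
            for size in range(min(self.max_len, j), 0, -1):
                piece = text[j - size:j]
                if size == 1 or piece in self.freq:
                    tokens.append(piece)
                    j -= size
                    break
        tokens.reverse()
        return tokens

    def _score(self, tokens):
        # Assumed ranking: fewer tokens, fewer single characters, higher total frequency.
        singles = sum(1 for t in tokens if len(t) == 1)
        total = sum(self.freq.get(t, 0) for t in tokens)
        return (len(tokens), singles, -total)

    def _learn(self, tokens):
        # Hypothetical self-learning rule: recurring adjacent single characters
        # are promoted into the dictionary without human intervention.
        for a, b in zip(tokens, tokens[1:]):
            if len(a) == 1 and len(b) == 1:
                pair = a + b
                self._candidates[pair] += 1
                if self._candidates[pair] >= self.learn_threshold:
                    self.freq.setdefault(pair, self._candidates[pair])

    def segment(self, text):
        fwd, bwd = self._forward(text), self._backward(text)
        best = fwd if fwd == bwd else min(fwd, bwd, key=self._score)
        self._learn(best)
        return best
```

Take the dictionary `{"研究": 100, "研究生": 20, "生命": 80, "命": 10, "起源": 30}` and the input `研究生命起源`. Forward matching yields 研究生 / 命 / 起源 and backward matching yields 研究 / 生命 / 起源. Both have three tokens, so the single-character count and then the frequencies decide in favor of the backward result.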
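The second contribution scales TF-IDF weights by a part-of-speech coefficient. A minimal sketch of that idea follows, assuming the weight

w(t, d) = tf(t, d) / |d| × log(N / df(t)) × c(pos(t))

where N is the number of documents and df(t) is the number of documents containing t. The coefficient table `POS_COEFF`, its values, the tag names, and the function names are hypothetical, since the abstract does not give them. The latent semantic labeling step is not modeled.

```python
import math
from collections import Counter

# Hypothetical part-of-speech coefficients; the thesis's actual values are not given.
POS_COEFF = {"n": 1.5, "v": 1.2, "a": 1.1}
DEFAULT_COEFF = 1.0


def weighted_tfidf(docs):
    """docs: list of documents, each a list of (word, pos_tag) pairs.

    Returns one {word: weight} dict per document, where the plain TF-IDF
    weight is multiplied by the coefficient for the word's part of speech.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update({word for word, _ in doc})  # count each word once per document

    result = []
    for doc in docs:
        tf = Counter(word for word, _ in doc)
        pos_of = {word: pos for word, pos in doc}  # last tag seen for a word wins
        length = len(doc)
        weights = {}
        for word, count in tf.items():
            idf = math.log(n_docs / df[word])
            coeff = POS_COEFF.get(pos_of[word], DEFAULT_COEFF)
            weights[word] = (count / length) * idf * coeff
        result.append(weights)
    return result


def top_features(weights, k):
    """Return the k highest-weighted words (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]
```

The coefficient lets a noun with the same term frequency outrank a verb, and a word that appears in every document still gets zero weight through the IDF factor.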
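The third contribution is a Bayes classifier that ranks categories by relevance. One plausible form is a textbook multinomial naive Bayes classifier with add-one (Laplace) smoothing, sketched below. The following are assumptions:

- the class name `NaiveBayesFilter`;
- the `alpha` parameter;
- the example labels;
- the choice to ignore words never seen in training.

The thesis's use of word feature coefficients inside the classifier is not reproduced here.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesFilter:
    """Multinomial naive Bayes with additive smoothing (hypothetical sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_docs = Counter()           # documents per category
        self.word_counts = defaultdict(Counter)  # word counts per category
        self.vocab = set()

    def fit(self, samples):
        """samples: iterable of (tokens, label) pairs."""
        for tokens, label in samples:
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_scores(self, tokens):
        """Log prior plus summed log likelihoods for each category."""
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                if tok in self.vocab:  # words unseen in training are skipped
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[label] = score
        return scores

    def rank(self, tokens):
        """Categories ordered from most to least relevant."""
        scores = self.log_scores(tokens)
        return sorted(scores, key=scores.get, reverse=True)

    def predict(self, tokens):
        return self.rank(tokens)[0]
```

Naive Bayes fits the conditions the abstract describes: large batches, a need for speed, and few categories. Training is a single counting pass, and classifying a document costs one lookup per token per category.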
Keywords/Search Tags: semantic filtering, XML Engine, word segmentation, text classifier