Spam Filter Research And Design Based On Natural Language And Domain Ontology

Posted on: 2008-10-06
Degree: Master
Type: Thesis
Country: China
Candidate: B Zhang
Full Text: PDF
GTID: 2178360212990385
Subject: Computer application technology
Abstract/Summary:
Spam, also known as unsolicited commercial mail (UCM), has caused heavy losses to our country's development and commercial activities. Although several spam filters have been published, an analysis of their principles shows that they share a common problem: a lack of semantics. As spam evolves further, current spam filtering algorithms will struggle to remain effective.

To address this lack of semantics in how spam filters handle email content, this thesis introduces natural language understanding methods into email judgment, so that a filter can filter or classify spam at the semantic level and thereby reduce the manual work of classifying email. The thesis also designs a content analysis method based on concept analysis.

Drawing on the characteristics of advertising language, the thesis constructs an advertisement domain ontology as the knowledge base for concept analysis. The design idea is as follows. First, Chinese definitions and language instances are recorded directly in the ontology, which removes the need for a separate database layer and makes the system easier to build. Extensible Markup Language (XML) is used as the ontology construction language, which lays the basis for extending the system. Second, description logic is used as the basis for natural language understanding based on concept analysis. Because description logic supports hierarchical design, a spam domain ontology based on concept analysis and hierarchy is established.

Finally, following these ideas, the thesis designs an anti-spam filter based on natural language understanding and domain ontology, and proposes syntax analysis and semantic analysis algorithms suited to real-world conditions. Advertisement spam messages were used as test cases for the filter, and the test results were satisfactory.
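
The idea of an XML-encoded, hierarchical domain ontology used as the knowledge base for concept analysis can be illustrated with a minimal Python sketch. This is not the thesis's implementation, which is not shown in the abstract. The XML element names (`ontology`, `concept`, `instance`), the concept names, the English sample terms, the whitespace tokenizer (standing in for Chinese word segmentation), and the scoring rule are all assumptions made for illustration. The hierarchy check is a simple parent-chain walk, a rough stand-in for subsumption in description logic.

```python
import xml.etree.ElementTree as ET

# Hypothetical ontology layout: element, attribute, and concept names are
# illustrative assumptions, not taken from the thesis.
ONTOLOGY_XML = """
<ontology>
  <concept name="Advertisement">
    <concept name="Promotion">
      <instance>discount</instance>
      <instance>free</instance>
    </concept>
    <concept name="Purchase">
      <instance>buy</instance>
      <instance>order</instance>
    </concept>
  </concept>
  <concept name="Correspondence">
    <instance>meeting</instance>
  </concept>
</ontology>
"""


def load_ontology(xml_text):
    """Return two lookup tables: term -> concept and concept -> parent concept."""
    root = ET.fromstring(xml_text)
    term_to_concept, parent_of = {}, {}

    def walk(node, parent):
        name = node.get("name")
        parent_of[name] = parent
        # Language instances are stored in the ontology itself (no database).
        for inst in node.findall("instance"):
            term_to_concept[inst.text.strip().lower()] = name
        for child in node.findall("concept"):
            walk(child, name)

    for top in root.findall("concept"):
        walk(top, None)
    return term_to_concept, parent_of


def is_subsumed_by(concept, ancestor, parent_of):
    """Walk up the hierarchy to test whether `concept` falls under `ancestor`."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = parent_of[concept]
    return False


def advertisement_ratio(text, term_to_concept, parent_of):
    """Fraction of recognized terms whose concept falls under 'Advertisement'."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    concepts = [term_to_concept[t] for t in tokens if t in term_to_concept]
    if not concepts:
        return 0.0
    hits = sum(is_subsumed_by(c, "Advertisement", parent_of) for c in concepts)
    return hits / len(concepts)


TERMS, PARENTS = load_ontology(ONTOLOGY_XML)
```

Calling `advertisement_ratio(message, TERMS, PARENTS)` yields a value between 0 and 1 that a filter could compare against a threshold. Because the terms live in the XML file, extending coverage means adding `instance` or `concept` elements rather than changing code, which mirrors the extensibility argument made in the abstract.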
Keywords/Search Tags: spam filter, Chinese email, natural language understanding, semantic analysis, concept analysis, ontology