
A Semantic-based Algorithm For Information Filtering And Its Applications

Posted on: 2007-09-28 | Degree: Master | Type: Thesis
Country: China | Candidate: W G Zhou
GTID: 2178360215950754 | Subject: Computer application technology
Abstract/Summary:
Harmful information is one part of the vast and disorderly mass of information on the Internet. It takes many forms and can harm or disturb different groups of people in different ways, so necessary and effective content filtering of Web access is important for building a healthy and secure network environment. Traditional text-filtering algorithms, however, cannot recognize the semantics of a text because they judge it only at the level of structural matching, so their filtering effect falls short of the requirements of intelligent filtering.
This thesis proposes a Semantic-based Algorithm for Information Filtering (SAIF), whose key steps include Chinese word segmentation, semantic frame formation, and calculation of the similarity between two semantic frames. First, based on a word library that records part of speech, context, and related information, the Maximal Match Binary Search Fast technique for Chinese word segmentation divides the target sentence into a set of words. Second, the key words (subject, predicate, and object) are identified using a rule library of syntax together with each word's part of speech and position in the sentence, and they are used to fill the semantic frame. Third, a long-distance match function and a formula for the similarity of two semantic frames produce a value representing how similar the two frames are, and this value determines whether the text is filtered.
SAIF thus turns the comparison of two semantic frames into a mathematical calculation. Experimental results show that SAIF filters more effectively than traditional algorithms at the level of semantic matching. The proxy server plays an important role in intranet management.
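The segmentation step described above can be illustrated with a minimal sketch. It reads "Maximal Match Binary Search Fast" as forward maximum matching over a sorted lexicon searched with binary search; that reading, the plain word list (the thesis's library also stores part of speech and context), and the names `make_lexicon`, `in_lexicon`, `forward_max_match`, and `max_len` are all assumptions for illustration.

```python
from bisect import bisect_left
from typing import List


def make_lexicon(words: List[str]) -> List[str]:
    """Sort the (hypothetical) word list once so lookups can use binary search."""
    return sorted(set(words))


def in_lexicon(lexicon: List[str], word: str) -> bool:
    """Binary-search the sorted lexicon for an exact match."""
    i = bisect_left(lexicon, word)
    return i < len(lexicon) and lexicon[i] == word


def forward_max_match(sentence: str, lexicon: List[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon
    word of at most max_len characters; fall back to a single character."""
    tokens: List[str] = []
    i = 0
    while i < len(sentence):
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + length]
            if length == 1 or in_lexicon(lexicon, candidate):
                tokens.append(candidate)
                i += length
                break
    return tokens
```

With the lexicon `["网络", "内容", "过滤", "内容过滤"]`, the sentence `"网络内容过滤"` segments as `["网络", "内容过滤"]`: the longer entry wins over `"内容"`. Greedy longest matching also has a known weakness: `"研究生命起源"` with the lexicon `["研究", "研究生", "生命", "起源"]` yields `["研究生", "命", "起源"]`, which is why dictionary-based segmenters often add a backward pass or disambiguation rules.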
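The frame-filling step described above (picking the subject, predicate, and object from part-of-speech and position) can be sketched with a few toy positional rules. The thesis's rule library is not given here, so the rules below (first verb is the predicate, nearest nominal before it is the subject, first nominal after it is the object), the tag set (`n`, `nr`, `ns`, `r`, `v`), and the names `SemanticFrame` and `fill_frame` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

NOMINAL_TAGS = {"n", "nr", "ns", "r"}  # assumed tags for nouns and pronouns
VERB_TAGS = {"v"}                      # assumed tag for verbs


@dataclass
class SemanticFrame:
    subject: Optional[str] = None
    predicate: Optional[str] = None
    obj: Optional[str] = None


def fill_frame(tagged: List[Tuple[str, str]]) -> SemanticFrame:
    """Fill a subject-predicate-object frame from (word, tag) pairs
    using simple positional rules (a stand-in for a real rule library)."""
    frame = SemanticFrame()
    verb_idx = next((i for i, (_, pos) in enumerate(tagged) if pos in VERB_TAGS), None)
    if verb_idx is None:
        return frame  # no predicate found: leave the frame empty
    frame.predicate = tagged[verb_idx][0]
    # Subject: the nominal closest to the predicate on its left.
    for word, pos in reversed(tagged[:verb_idx]):
        if pos in NOMINAL_TAGS:
            frame.subject = word
            break
    # Object: the first nominal to the right of the predicate.
    for word, pos in tagged[verb_idx + 1:]:
        if pos in NOMINAL_TAGS:
            frame.obj = word
            break
    return frame
```

For example, `[("学生", "n"), ("认真", "a"), ("学习", "v"), ("英语", "n")]` fills the frame as subject `学生`, predicate `学习`, object `英语`; the adverbial `认真` is skipped because its tag is not nominal.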
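The final SAIF step, which turns frame comparison into a number and thresholds it, might look like the sketch below. The thesis's long-distance match function and similarity formula are not reproduced in this abstract, so this swaps in a plain weighted slot match: identical words score 1.0, words listed as synonyms score 0.8, and anything else scores 0.0. The slot weights, the 0.6 threshold, the synonym table, and the names `word_match`, `frame_similarity`, and `should_filter` are all assumptions.

```python
from typing import Dict, List, Optional, Set

Frame = Dict[str, Optional[str]]  # keys: "subject", "predicate", "obj"

# Hypothetical slot weights (they sum to 1.0, so similarity lies in [0, 1]).
WEIGHTS = {"subject": 0.3, "predicate": 0.4, "obj": 0.3}


def word_match(a: Optional[str], b: Optional[str], synonyms: Dict[str, Set[str]]) -> float:
    """Stand-in word-level match: 1.0 identical, 0.8 synonym, 0.0 otherwise
    (an empty slot on either side also scores 0.0)."""
    if a is None or b is None:
        return 0.0
    if a == b:
        return 1.0
    if b in synonyms.get(a, set()) or a in synonyms.get(b, set()):
        return 0.8
    return 0.0


def frame_similarity(f1: Frame, f2: Frame, synonyms: Dict[str, Set[str]]) -> float:
    """Weighted sum of per-slot matches."""
    return sum(w * word_match(f1.get(s), f2.get(s), synonyms) for s, w in WEIGHTS.items())


def should_filter(frame: Frame, rule_frames: List[Frame],
                  synonyms: Dict[str, Set[str]], threshold: float = 0.6) -> bool:
    """Filter when the frame is close enough to any rule frame."""
    return any(frame_similarity(frame, r, synonyms) >= threshold for r in rule_frames)
```

Comparing `{"subject": "他", "predicate": "散布", "obj": "谣言"}` with a rule frame `{"subject": None, "predicate": "传播", "obj": "谣言"}`, where `散布` and `传播` are listed as synonyms, gives 0.4 × 0.8 + 0.3 × 1.0 = 0.62, which passes the assumed 0.6 threshold even though the predicates differ as strings. That case is exactly the one a purely structural matcher would miss.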
This thesis implements an HTTP proxy server and, on top of it, semantic-based content filtering of Web pages at the application layer. It also implements a content recurrence function by saving the content of Web pages that client PCs have successfully visited to the proxy server's hard disk.
To improve filtering efficiency, the thesis adopts a staged filtering approach. At the network layer, packets flowing through the proxy server are filtered by MAC address using the NDIS-HOOK technique. At the application layer, the HTTP proxy first filters response messages by keyword and then applies semantic-based content filtering only to messages that contain a keyword, which reduces the computational load of semantic filtering.
The result is SemanticFR, a Proxy Server with Content Recurrence and Semantic-based Content Filtering. Its functions include network traffic monitoring, packet filtering at the network layer, semantic-based filtering at the application layer, and content recurrence.
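The staged filtering idea described above can be shown as a decision function: a cheap check runs first, and the expensive semantic check runs only on the messages that survive it. This is a sketch under stated assumptions. The MAC stage stands in for the thesis's NDIS-HOOK filter with a plain set lookup (assumed to hold lowercase client MAC addresses), the semantic check is passed in as a callable, and the name `staged_filter` and its `"drop"`/`"block"`/`"pass"` results are hypothetical.

```python
from typing import Callable, Iterable, Set


def staged_filter(
    src_mac: str,
    body: str,
    blocked_macs: Set[str],
    keywords: Iterable[str],
    semantic_check: Callable[[str], bool],
) -> str:
    """Return "drop", "block", or "pass" for one response message.

    Stage 1: stand-in for the network-layer MAC filter (a set lookup here;
             blocked_macs is assumed to hold lowercase addresses).
    Stage 2: cheap keyword scan of the response body.
    Stage 3: expensive semantic check, run only on keyword hits.
    """
    if src_mac.lower() in blocked_macs:
        return "drop"
    if not any(k in body for k in keywords):
        return "pass"  # no keyword: the semantic stage is skipped entirely
    return "block" if semantic_check(body) else "pass"
```

The design choice is the same as in the text: most traffic contains no keyword, so the semantic stage (segmentation, frame filling, similarity) is paid only for the small fraction of responses that could plausibly be harmful.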
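The content recurrence function, which saves successfully visited pages to the proxy's disk so they can be shown again later, could be organized roughly as below. The thesis does not describe its storage layout here, so the class name `ContentArchive`, the SHA-256-of-URL file naming, the JSON metadata file, and treating only status 200 as "successful" are illustrative assumptions.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Optional


class ContentArchive:
    """Hypothetical on-disk store for visited page content."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _key(self, url: str) -> str:
        # Hash the URL so every URL maps to a safe, fixed-length file name.
        return hashlib.sha256(url.encode("utf-8")).hexdigest()

    def save(self, client_ip: str, url: str, status: int, body: bytes) -> bool:
        """Archive the body of a successful (HTTP 200) response."""
        if status != 200:
            return False  # only successful visits are archived
        key = self._key(url)
        (self.root / f"{key}.body").write_bytes(body)
        meta = {"url": url, "client": client_ip, "saved_at": time.time()}
        (self.root / f"{key}.json").write_text(json.dumps(meta), encoding="utf-8")
        return True

    def replay(self, url: str) -> Optional[bytes]:
        """Return the archived body for a URL, or None if it was never saved."""
        path = self.root / f"{self._key(url)}.body"
        return path.read_bytes() if path.exists() else None
```

Because files are keyed by URL alone, a later visit to the same URL overwrites the earlier copy; an audit-oriented store would likely add the client address and timestamp to the key.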
Keywords/Search Tags: Text Information Filtering, Semantic Frame, Chinese Word Segmentation, Proxy Server, Content Recurrence