Research Of Adaptive Text Filtering System Based On Vector Space Model

Posted on: 2007-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhu
GTID: 2178360182997587
Subject: Computer software and theory
Abstract/Summary:
In recent years, information has been created at an explosive rate as communication networks have rapidly developed and spread. Information resources have become a new kind of wealth. But while the growth of information brings much convenience, it also raises more and more problems: some information, such as illegal content, pornography, violence, superstition, and cult material, is harmful to people's minds, and information overload is increasingly common. Furthermore, useless or harmful information far outweighs the information we actually need, which causes much inconvenience. To use information resources effectively, it has therefore become an important problem in the research and development of communication networks to express users' requirements accurately, to select automatically the information that satisfies them, and to filter illegal and useless information out of a large-scale information stream.

To overcome these problems, research on information filtering has drawn much attention. Information filtering has been studied for a long time alongside information retrieval. It is the process of finding the information that satisfies users while filtering out illegal and useless information in a large-scale information stream. According to the content being processed, information filtering can be divided into two parts: text information filtering and non-text information filtering.

Text information filtering, or text filtering, is the process of finding, according to users' requirements, the texts that satisfy them in a large-scale text stream. TREC divides text filtering into two types: filtering based on the content of the text, and filtering based on collaboration (collaborative filtering).
As a branch of information filtering, text filtering draws on a wide range of knowledge, combining results from natural language understanding, artificial intelligence, knowledge theory, and other fields. Its key techniques mainly include word segmentation, dimensionality reduction of text feature vectors, feature extraction, initialization of the user profile and filtering threshold, and machine learning.

Shandong Normal University Master's Thesis

This dissertation mainly studies text filtering, especially text filtering on the Internet. It focuses on the key techniques of an adaptive text filtering system and mainly discusses the following aspects:

1. This dissertation summarizes the evaluation measures for text filtering, the evaluation functions for the importance of text feature terms, and the user profile learning methods that are widely used at present.

2. This dissertation proposes a method that combines two functions to evaluate the importance of feature terms. The method is based on an analysis of expected cross entropy and mutual information, two evaluation functions for feature-term importance, and makes use of their different roles. Experimental results demonstrate the feasibility of this method.

3. This dissertation proposes a method for constructing the filtering profile. The method adapts the find-maximally-specific-hypothesis (Find-S) algorithm from concept learning to the handling of text feature terms, and uses the new algorithm to construct the filtering profile from a small number of training texts. Experiments show that, compared with using the topic description directly as the filtering profile, this method markedly improves filtering precision and achieves satisfactory results.

4. This dissertation designs a detailed flow chart for implementing the adaptive text filtering system and analyzes several key processes of the implementation.
Furthermore, it presents, in theory, several methods for resolving related problems such as word segmentation and dimensionality reduction of text feature vectors.
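
As an illustration of the two feature-importance functions named in item 2, the following minimal Python sketch computes expected cross entropy and mutual information from document counts, using their commonly cited textbook forms: ECE(t) = P(t) Σ_c P(c|t) log(P(c|t)/P(c)) and MI(t, c) = log(P(t, c)/(P(t) P(c))). The function name `term_scores`, the toy documents, and the choice to report the maximum MI over classes are assumptions made for this example. The abstract does not say how the dissertation combines the two functions, so the sketch only computes them side by side.

```python
import math
from collections import Counter, defaultdict


def term_scores(docs):
    """Score terms by expected cross entropy (ECE) and mutual information (MI).

    docs: list of (set_of_terms, class_label) pairs (hypothetical input format).
    Returns {term: (ece, max_mi)}, where max_mi is the largest MI(t, c) over
    the classes in which the term occurs at least once.
    """
    n = len(docs)
    class_count = Counter(label for _, label in docs)
    term_count = Counter()
    joint = defaultdict(Counter)  # joint[term][label] = number of documents

    for terms, label in docs:
        for t in set(terms):
            term_count[t] += 1
            joint[t][label] += 1

    scores = {}
    for t, n_t in term_count.items():
        p_t = n_t / n
        ece_sum = 0.0
        max_mi = float("-inf")
        for c, n_c in class_count.items():
            n_tc = joint[t][c]
            if n_tc == 0:
                # A zero count adds nothing to ECE, and MI would be log(0),
                # so classes without the term are skipped.
                continue
            p_c = n_c / n
            p_c_given_t = n_tc / n_t
            ratio = p_c_given_t / p_c  # equals P(t, c) / (P(t) * P(c))
            ece_sum += p_c_given_t * math.log(ratio)
            max_mi = max(max_mi, math.log(ratio))
        scores[t] = (p_t * ece_sum, max_mi)
    return scores


# Toy training set: two relevant and two irrelevant documents (made up).
docs = [
    ({"filter", "text"}, "relevant"),
    ({"filter", "profile"}, "relevant"),
    ({"text", "sports"}, "irrelevant"),
    ({"sports"}, "irrelevant"),
]
scores = term_scores(docs)
for term in sorted(scores):
    ece, mi = scores[term]
    print(f"{term}: ECE={ece:.4f} MI={mi:.4f}")
```

On this toy data, "profile" and "filter" receive the same mutual information even though "profile" occurs in only one document, whereas expected cross entropy ranks "filter" higher because it weights by P(t). This difference in behavior on rare terms is one plausible reason to use the two functions together.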
Keywords/Search Tags: Vector Space Model, Adaptive Text Filtering, Text Feature Extraction, User Profile, Filtering Threshold