Information Filtering Systems Based On Web Text Content And Design

Posted on: 2005-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: Q Liu
Full Text: PDF
GTID: 2208360125454212
Subject: Computer application technology
Abstract/Summary:
In recent years, the Internet has grown very rapidly. While it supplies useful information, it also brings growing problems: information overload, information loss, pornographic content, and violent content. To address these problems, research on information filtering has drawn much attention. Chinese text filtering is a branch of Chinese information processing research. It selects useful information from a dynamic data stream and eliminates useless or irrelevant information according to the user's request.

In this thesis we design a content-based Chinese text filtering system that integrates a statistical approach with a text pattern matching approach. The work includes the following.

Text feature extraction is the essential operation in text filtering. We extract keywords from the training text collection to form a dictionary, and then use this dictionary to extract words from a test text, which speeds up extraction.

The representation of texts is a major difficulty in text filtering. We use the vector space model to represent each text. We then build a new matching mechanism based on the k-nearest neighbor algorithm and use it to block illegal texts, such as pornography. Finally, we apply relevance feedback to improve the filtering results.
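
The pipeline described in the abstract (a keyword dictionary built from training texts, vector space representation, and k-nearest neighbor matching) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the labels "block" and "allow", the use of raw term frequency with cosine similarity, and k = 3 are all assumptions made for illustration, and English tokens stand in for already-segmented Chinese words.

```python
import math
from collections import Counter

def build_dictionary(train_docs, max_terms=1000):
    """Map the most frequent training tokens to vector indices (assumed scheme)."""
    counts = Counter(tok for tokens, _ in train_docs for tok in tokens)
    return {term: i for i, (term, _) in enumerate(counts.most_common(max_terms))}

def vectorize(tokens, dictionary):
    """Sparse term-frequency vector; tokens outside the dictionary are ignored."""
    vec = Counter()
    for tok in tokens:
        idx = dictionary.get(tok)
        if idx is not None:
            vec[idx] += 1
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(i, 0) for i, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_label(tokens, train_docs, dictionary, k=3):
    """Majority label among the k training texts most similar to the query."""
    query = vectorize(tokens, dictionary)
    scored = sorted(
        ((cosine(query, vectorize(t, dictionary)), label) for t, label in train_docs),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy training set: (tokens, label) pairs.
TRAIN = [
    (["casino", "bonus", "win", "cash"], "block"),
    (["win", "cash", "prize", "now"], "block"),
    (["free", "casino", "prize"], "block"),
    (["thesis", "vector", "space", "model"], "allow"),
    (["text", "filtering", "model", "research"], "allow"),
    (["research", "thesis", "text"], "allow"),
]
DICTIONARY = build_dictionary(TRAIN)
```

Calling `knn_label(["win", "casino", "cash"], TRAIN, DICTIONARY)` classifies a new text by the labels of its nearest training texts. Restricting the vector to dictionary terms mirrors the abstract's point that a prebuilt dictionary speeds up extraction from test texts.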
Keywords/Search Tags: Information Filtering, Text Filtering, Text Feature Extraction, Vector Space Model, Text Classification, K-Nearest Neighbor, Relevance Feedback