Research Of Chinese Text Filtering Based On Combinational Model

Posted on: 2007-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Xu
GTID: 2178360182989257
Subject: Computer software and theory
Abstract/Summary:
Information filtering is an important research issue in natural language processing. In recent years, filtering systems have been widely used in many kinds of applications. They vary in technology, but all share the goal of automatically directing the most valuable information to users according to their user models, and of helping them make the best use of limited reading time.

Text filtering has become the focus of information filtering research, because most online information is in the form of text. There are currently two typical approaches to text filtering: content-based and collaborative. Content-based filtering characterizes the contents of documents and the information needs of potential recipients, and then uses these representations to match messages to recipients by content. Collaborative filtering automates the process of human recommendation: a data item is recommended to a user because it is relevant to other users with similar tastes. The two approaches differ in their characteristics. Content-based filtering is easy to implement, but it has difficulty judging the quality of a text and discovering a user's new interests. Collaborative filtering overcomes these shortcomings, but suffers from the cold-start problem.

In this thesis, we propose a text filtering method based on a combination model, with particular attention to the content-based matching algorithm and the collaborative recommendation algorithm. The main contributions are as follows:

1. Discuss text representation methods based on varying window sizes. Compare a range of weighting schemes and matching algorithms, with semantic analysis used to identify the different subjects within a single text.
2. Improve the traditional nearest neighbor algorithm. The improved algorithm first predicts ratings for the items a user has not rated from the similarity between items, and then uses a new similarity measure to find the target user's neighbors. It remains effective even when the user rating data are extremely sparse.
3. Propose a text filtering strategy based on a combination model. It uses the content-based matching method to select a candidate set for recommendation, from which the TopN recommendation is finally produced.
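
The strategy outlined in points 2 and 3 can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the weighting scheme, the "new similarity measure", or any thresholds, so plain cosine similarity stands in for all similarity computations, and every function name and parameter below (content_candidates, fill_missing, top_n, threshold, k, n) is a hypothetical choice made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def content_candidates(profile, doc_vectors, threshold):
    """Content-based step: keep documents whose term vector matches the user profile."""
    return [d for d, vec in doc_vectors.items() if cosine(profile, vec) >= threshold]

def item_similarity(ratings, i, j):
    """Similarity of items i and j over the users who rated both (cosine is an assumption)."""
    common = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    return cosine([ratings[u][i] for u in common], [ratings[u][j] for u in common])

def fill_missing(ratings, items):
    """Predict each unrated item from the user's own ratings, weighted by item similarity."""
    filled = {}
    for u, rated in ratings.items():
        filled[u] = dict(rated)  # observed ratings are never overwritten
        for i in items:
            if i in rated:
                continue
            sims = [(item_similarity(ratings, i, j), r) for j, r in rated.items()]
            total = sum(s for s, _ in sims)
            if total > 0:
                filled[u][i] = sum(s * r for s, r in sims) / total
    return filled

def top_n(user, ratings, items, candidates, k=2, n=1):
    """Collaborative step: rank unseen candidates by the ratings of the k nearest users."""
    filled = fill_missing(ratings, items)

    def user_sim(a, b):
        # Users are compared on the filled (denser) matrix rather than the sparse one.
        shared = [i for i in items if i in filled[a] and i in filled[b]]
        return cosine([filled[a][i] for i in shared], [filled[b][i] for i in shared])

    others = [v for v in filled if v != user]
    neighbours = sorted(others, key=lambda v: user_sim(user, v), reverse=True)[:k]
    scores = {}
    for i in candidates:
        if i in ratings[user]:
            continue  # already read and rated by the target user
        pairs = [(user_sim(user, v), filled[v][i]) for v in neighbours if i in filled[v]]
        weight = sum(s for s, _ in pairs)
        if weight > 0:
            scores[i] = sum(s * r for s, r in pairs) / weight
    return sorted(scores, key=scores.get, reverse=True)[:n]
```

In use, content_candidates would be called first with a user's profile vector and the document term vectors, and its result passed to top_n as the candidates argument, so that the collaborative ranking only ever considers texts that already passed the content match. Filling the rating matrix before searching for neighbours is what lets sparse users be compared at all; how well that works in practice depends on the similarity measure, which is left open here.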
Keywords/Search Tags: Text Filtering, Content-based Filtering, Collaborative Filtering, Text Feature Representation, Nearest Neighborhood, TopN recommendation