The Application Of Rough-Set-Model Based Text Clustering Algorithm In The Text Filtering

Posted on:2005-11-23

Degree:Master

Type:Thesis

Country:China

Candidate:B Gu

Full Text:PDF

GTID:2168360122488697

Subject:Computer application technology

Abstract/Summary:

Information filtering is a distinct, proactive information service mechanism and a useful supplement to traditional information retrieval. Clustering partitions a set of objects in a problem space into categories of similar objects, so that the average distance between objects within a category is as small as possible and the distance between clusters is as large as possible. Applying clustering to information filtering improves the system's filtering efficiency to a certain degree and plays a positive role with respect to the precision and recall of the text results.

The uncertainty and vagueness of natural language make natural language processing (NLP) difficult. Rough sets can describe vague concepts and measure the degree of vagueness, so they are well suited to describing natural language. Against the background of rough set theory, this thesis studies in depth a rough set representation model and the clustering based on that model. Its main contributions are as follows.

(1) It proposes a new text representation model derived from the equivalence partitioning of rough set theory, defines similarity under this model, and proposes a method for computing text similarity under it.

(2) It applies text clustering techniques to information filtering in practice. After the documents are clustered, the query terms supplied by the user are compared with the cluster centers of the documents at retrieval time, which yields the cluster most similar to the query terms. Similarity is then computed between the query terms and the documents of the selected cluster only, so the retrieval range is narrowed, retrieval efficiency is increased, and retrieval deviation is reduced to a certain extent.

(3) It likewise applies text clustering techniques to information filtering on the user side. Drawing on collaborative filtering theory, the thesis no longer treats each user as isolated, but as a member of a group of people whose interests coincide in some respect. It clusters the user profiles so that, when documents are distributed, the unit of computation is no longer the individual user profile but the users grouped by interest. These groups serve as the recommendation targets, which improves filtering efficiency and precision.

The experimental results show that the information filtering system based on rough set clustering runs more efficiently than previous systems.
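The cluster-then-retrieve step described in contribution (2) can be sketched roughly as below. This is only an illustration under stated assumptions: the abstract does not give the rough set similarity measure, so plain bag-of-words cosine similarity is used as a stand-in, the clusters are assumed to be already computed, and the names `vectorize`, `cosine`, `centroid`, and `search` are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (a stand-in, not the thesis's rough set model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Cluster center: the mean term weight over the cluster's documents."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}

def search(query, clusters):
    """Pick the cluster whose center is closest to the query,
    then rank only that cluster's documents against the query."""
    q = vectorize(query)
    centers = [centroid([vectorize(d) for d in docs]) for docs in clusters]
    best = max(range(len(clusters)), key=lambda i: cosine(q, centers[i]))
    ranked = sorted(clusters[best],
                    key=lambda d: cosine(q, vectorize(d)),
                    reverse=True)
    return best, ranked

# Assumed toy data: two precomputed clusters of short documents.
clusters = [
    ["rough set attribute reduction method", "rough set theory"],
    ["football match score", "football league season"],
]
best, ranked = search("rough set clustering", clusters)
```

Because the query is compared against one center per cluster and then only against the documents of the chosen cluster, the number of similarity computations shrinks from the size of the whole collection to roughly the number of clusters plus the size of one cluster.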

Keywords/Search Tags:

Information Retrieval, User Profile, Text Clustering, Vector Space Model, Rough Set

Related items

1	Application And Research Of Information Retrieval Algorithm In Web
2	Research On Intelligent Information Retrieval Based On Rough Set Theory
3	Research On Web Text Clustering And Retrieval Technology
4	Data Mining Research In Web Information Retrieval And Classification
5	Research Of Text Mining Based On Rough Set Theory
6	Research On Rough Set Theory In Knowledge Discovery
7	Research On Text Mining Based Web Information Retrieval
8	Application Of Rough Set Theory In Chinese Text Categorization
9	Research Of Web-Based Personalized Information Search System
10	The Research Of Personalized User's Profile Based On Web Mining