
The Application Of Rough-Set-Model Based Text Clustering Algorithm In The Text Filtering

Posted on: 2005-11-23
Degree: Master
Type: Thesis
Country: China
Candidate: B Gu
GTID: 2168360122488697
Subject: Computer application technology
Abstract/Summary:
Information filtering is a distinct, active information service mechanism and a useful supplement to traditional information retrieval. Clustering partitions a set of objects in the problem space into categories, placing similar objects in the same category so that the average distance between objects within a category is as small as possible and the distance between clusters is as large as possible. Applying clustering to information filtering improves the filtering efficiency of the system to a certain degree and has a positive effect on the precision and recall obtained on the texts.

The indeterminacy and vagueness of natural language make natural language processing (NLP) difficult. Rough sets can describe vague concepts and measure their degree of vagueness, so they are well suited to describing natural language. With rough set theory as its background, this thesis studies in depth the rough set representation model and clustering based on that model. Its main innovations and contributions are as follows.

(1) Proposes a new text representation model derived from the equivalence partition of rough set theory, defines similarity under this model, and gives a method for computing the similarity between texts.

(2) Applies text clustering techniques to information filtering. After the documents are clustered, the query terms supplied by the user are compared with the cluster centers at retrieval time, which yields the cluster most similar to the query terms. Similarity is then computed only between the query terms and the documents in the selected cluster, which narrows the retrieval range, improves retrieval efficiency, and reduces retrieval deviation to a certain extent.

(3) Applies text clustering techniques to information filtering. Drawing on collaborative filtering theory, the thesis no longer treats each user in isolation but as a member of a group of people who share interests in some respect. It also clusters the user profiles, so that when documents are distributed the unit of computation is no longer the individual user profile but a group of users classified by interest. The group serves as the recommendation target, which improves filtering efficiency and precision.

The experimental results show that the information filtering system based on rough set clustering runs more efficiently than the previous ones.
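The retrieval step described in contribution (2) can be illustrated with a minimal sketch. It assumes the documents have already been clustered, and it uses a plain bag-of-words vector with cosine similarity as a stand-in for the rough-set-based similarity, which the abstract does not specify. The function names (`vectorize`, `cosine`, `centroid`, `search`) and the sample data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors.

    Assumption: this replaces the thesis's rough-set similarity,
    whose definition is not given in the abstract.
    """
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Average the vectors of one cluster to obtain its center."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return Counter({term: weight / len(vectors) for term, weight in total.items()})


def search(query, clusters):
    """Select the cluster whose center best matches the query, then
    rank only the documents inside that cluster."""
    q = vectorize(query)
    centers = {
        name: centroid([vectorize(doc) for doc in docs])
        for name, docs in clusters.items()
    }
    best = max(centers, key=lambda name: cosine(q, centers[name]))
    ranked = sorted(
        clusters[best],
        key=lambda doc: cosine(q, vectorize(doc)),
        reverse=True,
    )
    return best, ranked


# Hypothetical, already-clustered documents.
clusters = {
    "sports": ["football match score goal", "tennis match final score"],
    "finance": ["stock market price index", "bank interest rate market"],
}
best, ranked = search("market price", clusters)
print(best, ranked)
```

Because the query is compared against one center per cluster first, per-document similarity is computed only inside the selected cluster, which is the reduction in retrieval range that the abstract describes.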
Keywords/Search Tags: Information Retrieval, User Profile, Text Clustering, Vector Space Model, Rough Set