
Research Of Text Mining Based On Rough Set Theory

Posted on: 2004-07-09
Degree: Master
Type: Thesis
Country: China
Candidate: D Li
Full Text: PDF
GTID: 2168360095953807
Subject: Computer application technology
Abstract/Summary:
Rough Set theory, proposed by Z. Pawlak in the 1980s, is a soft computing tool for handling imprecise and uncertain information. With the rapid development of the Internet and electronic books, text mining has become an important research field. This thesis studies text mining based on Rough Set theory in depth.

In text classification, we propose an automatic text classification method based on clustering and Rough Set theory. Clustering is well suited to grouping existing documents, while Rough Set theory, through data reduction, yields a small number of useful rules that improve the efficiency of classifying new documents. The two are combined: the training documents are classified by unsupervised learning, and we discuss how rules that apply to new, unclassified documents can be formed once the training documents have been classified.

In text retrieval, we introduce an optimized retrieval method based on Rough Set theory and Fuzzy Set theory. Specifically, the method combines the two theories to optimize users' queries with respect to synonyms and near-synonyms, and returns the results in descending order of similarity between documents and query. Users obtain the most relevant results first, provided they define their queries according to their interests and specify in detail the interest weight of each keyword in the query. If they have more time, they can go on to the less relevant documents.

We report experiments to demonstrate the validity of both methods when applied to text classification and text retrieval.
Keywords/Search Tags: text mining, text classification, text retrieval, Rough Set Theory, Fuzzy Set Theory, clustering, textual feature extraction, user's interest, query optimization
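
The retrieval step described in the abstract (weighted query keywords, synonym-aware query optimization, results in descending order of similarity) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's method: the abstract does not give the similarity measure, so a fuzzy Jaccard ratio is swapped in, and the Rough Set part of the method is not shown. The synonym table, the `discount` factor, and the names `SYNONYMS`, `expand_query`, `fuzzy_similarity`, and `rank_documents` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical synonym table; the abstract does not publish one.
SYNONYMS: Dict[str, List[str]] = {
    "classification": ["categorization"],
}


def expand_query(query: Dict[str, float], discount: float = 0.8) -> Dict[str, float]:
    """Add synonyms of each query keyword at a reduced interest weight.

    `discount` is an assumed parameter, not taken from the thesis.
    """
    expanded = dict(query)  # copy, so the caller's query is not modified
    for term, weight in query.items():
        for synonym in SYNONYMS.get(term, []):
            expanded[synonym] = max(expanded.get(synonym, 0.0), weight * discount)
    return expanded


def fuzzy_similarity(query: Dict[str, float], doc: Dict[str, float]) -> float:
    """Fuzzy Jaccard similarity: sum of minima over sum of maxima of term weights."""
    terms = set(query) | set(doc)
    intersection = sum(min(query.get(t, 0.0), doc.get(t, 0.0)) for t in terms)
    union = sum(max(query.get(t, 0.0), doc.get(t, 0.0)) for t in terms)
    return intersection / union if union else 0.0


def rank_documents(
    query: Dict[str, float], docs: Dict[str, Dict[str, float]]
) -> List[Tuple[str, float]]:
    """Return (document name, similarity) pairs, most similar first."""
    expanded = expand_query(query)
    scored = [(name, fuzzy_similarity(expanded, doc)) for name, doc in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: term weights in [0, 1] act as fuzzy membership degrees.
docs = {
    "d1": {"classification": 0.9, "rough": 0.5},
    "d2": {"categorization": 0.6},
    "d3": {"weather": 1.0},
}
query = {"classification": 1.0, "rough": 0.5}  # user-assigned interest weights
ranked = rank_documents(query, docs)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

In this toy run, `d2` contains only the synonym "categorization" yet still ranks above the unrelated `d3`, which is the kind of effect the synonym optimization is meant to achieve.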