
The Application Of RS Theory In Categorization Algorithms Of Texts Mining

Posted on: 2004-02-18
Degree: Master
Type: Thesis
Country: China
Candidate: H P Wang
Full Text: PDF
GTID: 2168360092496683
Subject: Signal and Information Processing
Abstract/Summary:
Traditional information retrieval is poorly suited to handling large amounts of text data. Users need more effective retrieval algorithms that rank documents by importance or relevance, compare their classes, or find patterns and trends across multiple documents. Text data mining (text DM) has therefore become a popular and important research area.

Common text DM techniques include automatic text categorization, automatic text clustering, text summarization, and text relevance analysis. Among them, automatic text categorization is an important technique. It can sort web documents and assign each document to a category, which narrows the range of documents to be searched. It can also be used to organize the results returned by search engines: dividing the results into classes on specific topics sharply reduces the amount of text a user must examine, so users need to attend only to the relevant classes.

This thesis studies algorithms for automatic text categorization. The main work is as follows:
1. Describing the Vector Space Model of texts, and analysing and comparing several text categorization algorithms;
2. Studying in depth how to extract text categorization rules through the knowledge reduction of rough set (RS) theory;
3. Developing a new design for search engines based on rule-based text categorization.

For text classification based on RS theory, a decision table is created in which the discretized weights of the text feature terms serve as the condition attributes of the rules and the text classes serve as the decision attributes. The categorization rules are then extracted by RS knowledge reduction. The rules extracted by this method are easy to understand, and both accuracy and speed are high.

A design for a search engine based on automatic text categorization is presented. Because the list of retrieved documents is usually very long, it is inconvenient for users to check documents one by one. A text classifier is placed between the retrieval interface and the retrieval engine. It automatically sorts the retrieval results online into classes on specific topics, which makes it easier for users to find the documents that match their query.

Finally, the contents of the thesis are summarized and directions for future work on text data mining are proposed.
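
The decision-table construction and rule extraction described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the cut points used for discretization, the toy documents and term weights, the function names, and the exhaustive search for a reduct are all assumptions made for illustration. The abstract does not say which reduction algorithm was used, and exhaustive search is only practical for a handful of attributes.

```python
from itertools import combinations

def discretize(weight, cuts=(0.1, 0.3)):
    """Map a term weight to a level: how many cut points it meets or exceeds.
    The cut points are illustrative assumptions."""
    return sum(weight >= c for c in cuts)

def build_decision_table(docs, terms, cuts=(0.1, 0.3)):
    """Each row pairs the discretized term weights (condition attributes)
    with the document's class (decision attribute)."""
    return [
        (tuple(discretize(weights.get(t, 0.0), cuts) for t in terms), label)
        for weights, label in docs
    ]

def is_consistent(table, attrs):
    """True if rows that agree on the chosen attributes always share a class."""
    seen = {}
    for cond, label in table:
        key = tuple(cond[i] for i in attrs)
        if seen.setdefault(key, label) != label:
            return False
    return True

def find_reduct(table, n_attrs):
    """Smallest attribute subset that still separates the classes.
    Exhaustive search: exponential in n_attrs, for illustration only."""
    for size in range(n_attrs + 1):
        for attrs in combinations(range(n_attrs), size):
            if is_consistent(table, attrs):
                return attrs
    return None  # the table is inconsistent even with every attribute

def extract_rules(table, attrs, terms):
    """One rule per distinct value combination of the reduct attributes."""
    return {
        tuple((terms[i], cond[i]) for i in attrs): label
        for cond, label in table
    }

# Hypothetical toy data: term weights per document and a class label.
terms = ["goal", "match", "stock", "market"]
docs = [
    ({"goal": 0.5, "match": 0.4}, "sports"),
    ({"goal": 0.35, "match": 0.2, "market": 0.05}, "sports"),
    ({"stock": 0.45, "market": 0.4}, "finance"),
    ({"stock": 0.2, "market": 0.35, "match": 0.05}, "finance"),
]

table = build_decision_table(docs, terms)
reduct = find_reduct(table, len(terms))
rules = extract_rules(table, reduct, terms)
for condition, label in rules.items():
    print(condition, "->", label)
```

On this toy table the single term "goal" is enough to separate the two classes, so the extracted rules test only that attribute. This shows why rules obtained through reduction tend to be short and easy to read.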
Keywords/Search Tags: Texts Mining, Text Categorization, Rough Sets, Support Vector Machine, K-Nearest Neighbour