Research Of Text Mining Based On Semantic Analysis

Posted on: 2013-02-11 | Degree: Master | Type: Thesis
Country: China | Candidate: Y Lu | Full Text: PDF
GTID: 2218330371961764 | Subject: Computer application technology
Abstract/Summary:
As information technology continues to develop, electronically stored text has spread everywhere, and the information people need is often drowned out by a large volume of spam. Automatically mining text to extract useful information has therefore become an important problem. Because Chinese is one of the most widely used languages, mining Chinese text is especially important.

Within text mining, text classification generally proceeds in several steps: training data selection, text representation, feature extraction, classifier generation, and classifier performance validation. Traditional statistics-based text mining treats word frequency as the main factor in classification and ignores word order, word meaning, and other semantic information. Traditional term weighting is also an unsupervised method, so it does not make full use of training data that carries class labels.

Compared with statistical methods, text mining based on semantic analysis can take more of the information in a text into account and can therefore improve text mining performance considerably. Semantic analysis mainly targets the feature extraction step, where dictionary methods built on external semantic knowledge are of great help. The work has two parts.

1. Text feature selection based on Tong Yi Ci Yu Lin. Feature extraction based on Tong Yi Ci Yu Lin is a text processing method grounded in semantic analysis. Chinese corpus resources are small and messy; only Tong Yi Ci Yu Lin is available for study, and no systematic method exists for using it in text categorization. To address this, the thesis proposes an integrated text categorization method that exploits the characteristics of Tong Yi Ci Yu Lin and applies polysemy disambiguation, synonym replacement, and collocation merging step by step.

2. An improved supervised method for text term weighting. Traditional unsupervised methods cannot take full advantage of the characteristics of the training data set, so they cannot reflect how a term relates to the different classes. The thesis analyzes a newer supervised term weighting method; this algorithm overcomes the shortcomings of the traditional one, but it does not consider the term's relationship to the document collection as a whole. To address this, the thesis proposes an improved supervised term weighting algorithm that considers the term's relationship both to the individual categories and to the overall document collection.

Experimental results for the two methods show that feature selection based on Tong Yi Ci Yu Lin effectively reduces the dimensionality of the text terms, and that both methods improve text classification accuracy.
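The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's algorithm, whose formulas the abstract does not give. The thesaurus below is a hypothetical English toy stand-in for Tong Yi Ci Yu Lin (the group codes `G001` and `G002` are invented), and the supervised weight is the separately published relevance frequency (tf.rf) scheme, swapped in only to show how class labels can enter a term weight. Polysemy disambiguation and collocation merging are left out.

```python
import math
from collections import Counter

# Hypothetical toy thesaurus: word -> synonym-group code.
# These entries and codes are invented; they are not Tong Yi Ci Yu Lin data.
TOY_THESAURUS = {
    "car": "G001", "automobile": "G001",
    "fast": "G002", "quick": "G002", "rapid": "G002",
}

def fold_synonyms(tokens, thesaurus=TOY_THESAURUS):
    """Replace each token by its synonym-group code; unknown tokens pass through."""
    return [thesaurus.get(tok, tok) for tok in tokens]

def relevance_frequency(docs, labels, positive):
    """Return rf = log2(2 + a / max(1, c)) for every term, for one class.

    a = number of documents of the positive class that contain the term
    c = number of all other documents that contain the term
    """
    pos_df, neg_df = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        target = pos_df if label == positive else neg_df
        target.update(set(tokens))  # count each term once per document
    vocab = set(pos_df) | set(neg_df)
    return {t: math.log2(2 + pos_df[t] / max(1, neg_df[t])) for t in vocab}

def tf_rf(tokens, rf):
    """Weight each term of one document by raw term frequency times rf."""
    tf = Counter(tokens)
    return {t: n * rf.get(t, 1.0) for t, n in tf.items()}

# Tiny labeled training set (invented for illustration).
raw_docs = [
    ["fast", "car"],
    ["quick", "automobile"],
    ["rapid", "growth"],
    ["quick", "growth"],
]
labels = ["vehicles", "vehicles", "economy", "economy"]

folded = [fold_synonyms(d) for d in raw_docs]
raw_vocab = {t for d in raw_docs for t in d}
folded_vocab = {t for d in folded for t in d}

rf = relevance_frequency(folded, labels, positive="vehicles")
weights = tf_rf(folded[0], rf)

print(len(raw_vocab), "terms before folding,", len(folded_vocab), "after")
print(weights)
```

Folding synonyms into group codes shrinks the vocabulary, which is the dimension-reduction effect the abstract reports. In the weighting step, the code for "car" and "automobile" occurs only in the `vehicles` class and so receives a higher weight than the code for "fast", "quick", and "rapid", which occurs in both classes. A purely unsupervised weight such as idf would not see that difference, because it ignores the labels.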
Keywords/Search Tags: text classification, semantic analysis, feature extraction, supervised learning method, term weighting