
Topic and Sentiment Classification of Chinese Text Based on Machine Learning

Posted on: 2015-01-23 | Degree: Master | Type: Thesis
Country: China | Candidate: X C Fan
GTID: 2268330425988345 | Subject: Computer application technology
Abstract/Summary:
With the rapid development and widespread application of computer, network, and database technologies, online information has grown explosively, and most of it takes the form of text. Retrieving useful data quickly and efficiently from such massive volumes has become an urgent problem in information processing. Automatic text classification, a key technology for processing and organizing large amounts of text, emerged in response and has developed rapidly ever since.

Topic-based text classification assigns each text to one of a set of predefined classes according to its content. Because machine learning is flexible and achieves good classification performance, it has been widely used for text categorization. The machine learning process comprises several steps: text preprocessing, feature selection, feature weighting, classifier training, and classification. Feature weighting is an important part of this procedure and directly affects classification performance. Examining the traditional feature selection functions, we find that mutual information stands out when applied to feature weighting. To improve its performance in this role, we propose a mutual-information-based feature weighting method that additionally incorporates term frequency, document frequency, and a category correlation factor. Experiments show that this method achieves better classification performance than traditional feature weighting methods.

Text sentiment classification is an important branch of text classification and has gradually become a hotspot in information retrieval and natural language processing. Machine learning methods have also been applied to sentiment classification, but their effectiveness differs from that in topic-based text classification.
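For the topic-classification part, the mutual-information feature weighting can be illustrated with a short sketch. The abstract does not give the exact form of the proposed weighting, so this only computes the standard mutual information score between a term and a category from document counts, MI(t, c) = log(A·N / ((A + C)(A + B))), and then forms a hypothetical weight tf(t, d) × max_c MI(t, c). The function names, the toy data, and this particular way of combining term frequency with MI are illustrative assumptions; the thesis's method also folds in document frequency and a category correlation factor whose definitions are not stated here.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term, category):
    """Mutual information score MI(t, c) = log(A*N / ((A+C)*(A+B))).

    docs: list of token lists (Chinese text assumed already word-segmented)
    A: documents of `category` that contain `term`
    B: documents of other categories that contain `term`
    C: documents of `category` that do not contain `term`
    """
    n = len(docs)
    a = b = c = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc
        if label == category:
            if has_term:
                a += 1
            else:
                c += 1
        elif has_term:
            b += 1
    if a == 0:
        return float("-inf")  # term never appears in this category
    return math.log(a * n / ((a + c) * (a + b)))


def mi_tf_weights(doc, docs, labels):
    """Hypothetical weighting: tf(t, d) * max over categories of MI(t, c)."""
    categories = set(labels)
    weights = {}
    for term, tf in Counter(doc).items():
        best = max(mutual_information(docs, labels, term, cat) for cat in categories)
        # Terms unseen in training carry no class evidence, so give them weight 0.
        weights[term] = 0.0 if best == float("-inf") else tf * best
    return weights


# Toy training data (hypothetical).
train_docs = [["stock", "rise"], ["stock", "fall"], ["match", "goal"], ["team", "match"]]
train_labels = ["finance", "finance", "sports", "sports"]

print(mi_tf_weights(["stock", "stock", "rise", "unknown"], train_docs, train_labels))
```

With this toy data, "stock" and "rise" both have MI equal to log 2 with "finance", so "stock" receives twice the weight of "rise" purely because it occurs twice, and the unseen term "unknown" receives 0.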
We selected Chinese sentiment classification datasets that are widely used online and ran experiments with machine learning methods, comparing and analyzing the performance of the methods commonly used for text sentiment classification. Because sentiment corpora are more complex and variable, traditional machine learning methods struggle to achieve high performance. By analyzing review texts and combining dictionary- and rule-based methods, we divide each whole text into a sentiment sentence set and a detail sentence set, and further extract a key sentence set from them. We then use the whole texts, the sentiment sentence sets, and the key sentence sets as three training sets, train one classifier on each, and combine the three classifiers with a voting strategy to obtain the final result. Experiments show that this method effectively improves the performance of text sentiment classification.
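The division of a review into sentiment, detail, and key sentence sets can be sketched with a toy lexicon and rules. The abstract does not list the dictionary or the rules, so the word lists, the cue words, the fallback to the last sentiment sentence, and the function names below are all hypothetical; plain substring matching also stands in for proper word segmentation.

```python
import re

# Hypothetical toy lexicons; the dictionary and rules used in the thesis are not given in the abstract.
SENTIMENT_WORDS = {"满意", "喜欢", "不错", "失望", "糟糕", "差"}
KEY_CUES = {"但是", "总之", "总的来说"}  # "but", "in short", "overall"


def split_sentences(text):
    """Split on Chinese and ASCII sentence-ending punctuation, dropping empty pieces."""
    return [s.strip() for s in re.split(r"[。！？!?；;]", text) if s.strip()]


def has_sentiment(sentence):
    # Plain substring matching stands in for word segmentation plus lexicon lookup.
    return any(word in sentence for word in SENTIMENT_WORDS)


def partition(text):
    """Return (sentiment sentences, detail sentences, key sentences) for one review."""
    sentences = split_sentences(text)
    sentiment = [s for s in sentences if has_sentiment(s)]
    detail = [s for s in sentences if not has_sentiment(s)]
    # Hypothetical key-sentence rule: sentiment sentences containing a contrast or
    # summary cue, falling back to the last sentiment sentence when no cue is present.
    key = [s for s in sentiment if any(cue in s for cue in KEY_CUES)]
    if not key and sentiment:
        key = [sentiment[-1]]
    return sentiment, detail, key


review = "酒店位置在市中心。房间很干净，服务不错。但是早餐很差！"
sentiment, detail, key = partition(review)
print(sentiment)
print(detail)
print(key)
```

Here the factual sentence about the hotel's location goes to the detail set, and the sentence introduced by "但是" ("but") is taken as the key sentence, reflecting the common intuition that a contrastive clause often carries a review's overall verdict.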
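The abstract states that the three classifiers are combined by voting but does not say how ties are resolved. A minimal majority-vote sketch, assuming each classifier outputs a single label and that ties fall back to the whole-text classifier (both the tie rule and the name `vote` are assumptions):

```python
from collections import Counter


def vote(predictions, fallback_index=0):
    """Majority vote over one label per classifier.

    On a tie (e.g. three classifiers, three different labels) fall back to the
    prediction at `fallback_index`, here assumed to be the whole-text classifier.
    """
    ranked = Counter(predictions).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return predictions[fallback_index]
    return top_label


# Order: whole-text, sentiment-sentence, key-sentence classifier (hypothetical labels).
print(vote(["pos", "neg", "neg"]))
print(vote(["pos", "neg", "neutral"]))
```

With binary positive/negative labels, three voters can never tie, so the fallback only matters when there are three or more sentiment classes.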
Keywords/Search Tags: text classification, mutual information, sentiment sentence, key sentence, classifier combination