| Computer and network technologies have developed rapidly in recent years, and network platforms have become the main channels through which people acquire, publish, share and disseminate information. This information has important social and commercial value in fields such as government and e-commerce. This paper takes the texts that users publish on the Sina Weibo platform as its research corpus and focuses on two approaches to sentiment analysis: one based on sentiment dictionaries and one based on machine learning. The main research contents are as follows. 1. Existing sentiment dictionaries cover new Internet slang poorly in the Sina Weibo domain. To address this, this paper collects several existing basic sentiment dictionaries, network sentiment dictionaries and emoticon libraries and, after removing duplicate words, constructs a basic comprehensive sentiment dictionary. Because the co-occurrence window size and the corpus size adversely affect the performance of the SO-PMI algorithm, this paper proposes optimizing SO-PMI with distance mutual information and Good-Turing smoothing, and uses the improved algorithm to extend the sentiment dictionary for the Weibo domain. Experiments compare the resulting Chinese Weibo comprehensive sentiment dictionary with three others: the basic comprehensive sentiment dictionary, a dictionary extended by the traditional SO-PMI algorithm, and a dictionary extended by SO-PMI with Laplace smoothing. The dictionary constructed in this paper yields better sentiment-analysis results than all three. 2. This research analyzes commonly used feature selection algorithms, focusing on information gain. The traditional information gain algorithm does not consider how feature items are distributed within and between classes, and it suffers from an imbalanced ratio of positively to negatively correlated features. To address these problems, Concentration Degree and Distribution Degree are proposed to improve how well feature items discriminate between categories. Following the chi-square statistic method, the maximum of the values computed for the two classes is taken so that Concentration Degree and Distribution Degree can be applied to the entire corpus, and a scaling factor is introduced to reduce the adverse effect of negatively correlated features. This increases the proportion of positively correlated feature items. Experiments comparing the traditional and improved information gain algorithms show that the improved algorithm performs better in Weibo sentiment analysis. 3. This paper combines the Chinese Weibo comprehensive sentiment dictionary with the improved information gain algorithm to optimize feature selection. The combined method draws on the advantages of both, and its dimensionality reduction of feature items is clearly better than that of either method alone. Figures: 16; tables: 40; references: 53. |
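
SO-PMI scores a candidate word by how much more strongly it co-occurs with positive seed words than with negative ones: SO-PMI(w) = Σ_p PMI(w, p) − Σ_n PMI(w, n). The sketch below shows the baseline algorithm with add-α (Laplace) smoothing, which corresponds to one of the comparison dictionaries above. It is not the thesis's Good-Turing and distance-mutual-information variant, whose details the abstract does not give. Assumptions, all hypothetical: each tokenized post counts as one co-occurrence window, the smoothing formula is one simple variant among several, and the function names, seed words and toy posts are illustrative only.

```python
import math
from collections import Counter
from itertools import combinations


def count_cooccurrences(docs):
    """Document frequencies of words and of unordered word pairs.

    Assumption: each tokenized post is treated as one co-occurrence window.
    """
    word_df = Counter()
    pair_df = Counter()
    for doc in docs:
        vocab = sorted(set(doc))
        word_df.update(vocab)
        pair_df.update(combinations(vocab, 2))  # pairs are stored in sorted order
    return word_df, pair_df


def smoothed_pmi(w1, w2, word_df, pair_df, n_docs, alpha=1.0):
    """PMI in bits with add-alpha smoothing on the raw counts (one simple variant)."""
    joint = pair_df[tuple(sorted((w1, w2)))] + alpha
    c1 = word_df[w1] + alpha
    c2 = word_df[w2] + alpha
    # log2( P(w1,w2) / (P(w1) P(w2)) ), each probability = smoothed count / n_docs
    return math.log2(joint * n_docs / (c1 * c2))


def so_pmi(word, pos_seeds, neg_seeds, docs, alpha=1.0):
    """Semantic orientation: summed PMI with positive seeds minus summed PMI with negative seeds."""
    word_df, pair_df = count_cooccurrences(docs)
    n = len(docs)
    pos = sum(smoothed_pmi(word, p, word_df, pair_df, n, alpha) for p in pos_seeds)
    neg = sum(smoothed_pmi(word, q, word_df, pair_df, n, alpha) for q in neg_seeds)
    return pos - neg


# Toy corpus of tokenized posts (hypothetical data).
posts = [
    ["great", "good", "movie"],
    ["great", "good"],
    ["awful", "bad"],
    ["awful", "bad", "movie"],
]
print(so_pmi("great", ["good"], ["bad"], posts))  # ≈ 1.585, positive orientation
print(so_pmi("awful", ["good"], ["bad"], posts))  # ≈ -1.585, negative orientation
```

A candidate with a positive score would join the positive lexicon and one with a negative score the negative lexicon. A word such as "movie", which co-occurs equally with both seeds, scores zero. Smoothing matters because an unsmoothed joint count of zero would make PMI undefined, and that weakness is what the Laplace and Good-Turing variants target.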
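
Information gain measures how much knowing whether a term occurs in a post reduces uncertainty about the post's sentiment class: IG(t) = H(C) − P(t)·H(C|t) − P(¬t)·H(C|¬t). The sketch below computes only this traditional score, the baseline that the thesis improves on. The Concentration Degree, Distribution Degree and scaling factor are not defined in the abstract and are not reproduced here. The function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(term, docs, labels):
    """Traditional IG: H(C) - P(t) H(C|t) - P(not t) H(C|not t)."""
    n = len(docs)
    with_term = Counter(y for doc, y in zip(docs, labels) if term in doc)
    without_term = Counter(y for doc, y in zip(docs, labels) if term not in doc)
    n_with = sum(with_term.values())
    n_without = n - n_with

    # Expected class entropy after splitting the posts on presence of the term.
    conditional = 0.0
    if n_with:
        conditional += (n_with / n) * entropy(list(with_term.values()))
    if n_without:
        conditional += (n_without / n) * entropy(list(without_term.values()))
    return entropy(list(Counter(labels).values())) - conditional


# Toy labelled posts (hypothetical data).
posts = [["good", "film"], ["good"], ["bad", "film"], ["bad"]]
labels = ["pos", "pos", "neg", "neg"]
for term in ["good", "film"]:
    print(term, information_gain(term, posts, labels))
# prints "good 1.0" then "film 0.0"
```

"good" perfectly separates the classes and gets the maximum gain of 1 bit, while "film" appears equally in both classes and gets none. Feature selection would then keep the highest-scoring terms. How many to keep is a tuning choice that the abstract does not specify.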