With the rapid development of e-commerce on the Internet, a massive volume of subjective text expressing commenters' emotions has been generated. Processing these texts effectively has become increasingly important, and sentiment analysis has emerged in this context. Sentiment classification, an important subtask of sentiment analysis, has attracted considerable research attention. However, existing studies on sentiment classification focus mainly on English, with relatively little research on Chinese. In addition, with the growth of international trade and economic globalization in recent years, information on the Internet has become linguistically diverse. Multinational companies, governments, and even individuals must therefore deal with documents in different languages, which has made multilingual sentiment classification an important research topic. Currently, however, document-level sentiment classification methods, whether monolingual or multilingual, fail to adequately consider the unique characteristics of emotional knowledge, so their classification accuracy is lower than that of ordinary text classification. To address this issue, this paper presents an in-depth study of Chinese and cross-language sentiment classification, mining emotional semantic features and combining them with machine learning methods. The main work includes:

(1) To address the lower precision of document-level sentiment classification relative to ordinary text classification, this paper proposes a semantic weight-based Naive Bayes algorithm for text sentiment classification. First, we score and weight the words in an emotion dictionary by means of a feature selection method. Then, based on the correlation between the polarity distribution of dictionary words and document-level sentiment classification, we merge the semantic weight feature into Naive Bayes classification to obtain a new algorithm. Finally, we perform extensive experiments on standard Chinese data sets. The results show that our algorithm outperforms several existing algorithms in precision, recall, and F-measure.

(2) Translating one language into another often introduces many errors that affect later processing, and document-level sentiment classification methods often consider only the distribution information of emotion while ignoring semantic emotion knowledge. To address these two problems, this paper proposes a cross-language sentiment classification algorithm based on dependency analysis and property probability weights. First, we obtain dependency relations through dependency parsing before translation. Then, based on the correlation between the polarity distribution of dictionary words and document-level sentiment classification, the property probability weight feature is merged into Naive Bayes classification to improve classification performance. Finally, we conduct extensive experiments using English data sets for training and standard Chinese data sets for testing. The results show that the proposed algorithm outperforms other existing algorithms.
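The idea in (1) of merging a lexicon-derived weight into Naive Bayes can be illustrated with a minimal sketch. The abstract does not specify the feature selection method or the exact way the weights enter the classifier, so everything below is an assumption made for illustration: the function names `train_nb` and `classify`, the hand-set `lexicon_weight` values (standing in for scores that a feature selection method would produce), and the choice to scale each word's log-likelihood by its weight.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing."""
    vocab = {w for d in docs for w in d}
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        word_counts[y].update(d)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {w: math.log((word_counts[c][w] + alpha) / total)
                       for w in vocab}
    return log_prior, log_like

def classify(doc, log_prior, log_like, lexicon_weight):
    """Score each class; lexicon words count more (assumed weighting scheme)."""
    scores = {}
    for c in log_prior:
        s = log_prior[c]
        for w in doc:
            if w in log_like[c]:  # skip out-of-vocabulary words
                # Hypothetical merge: scale the word's log-likelihood by its
                # lexicon weight; words outside the lexicon keep weight 1.0.
                s += lexicon_weight.get(w, 1.0) * log_like[c][w]
        scores[c] = s
    return max(scores, key=scores.get)

# Toy tokenized training data (illustrative only, not from the paper).
train_docs = [["good", "great", "phone"], ["bad", "poor", "phone"],
              ["great", "screen"], ["poor", "battery"]]
train_labels = ["pos", "neg", "pos", "neg"]
# Hand-set weights standing in for feature-selection scores.
lexicon_weight = {"good": 2.0, "great": 2.0, "bad": 2.0, "poor": 2.0}

log_prior, log_like = train_nb(train_docs, train_labels)
test_doc = ["great", "battery", "battery"]
plain = classify(test_doc, log_prior, log_like, {})
weighted = classify(test_doc, log_prior, log_like, lexicon_weight)
```

On this toy data, the unweighted classifier is swayed by the repeated neutral word "battery", whereas the weighted variant lets the sentiment word "great" dominate. The same weighting hook could, in principle, carry the property probability weights mentioned in (2), though the abstract gives no detail on how those are computed.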