Sentiment Analysis Of Chinese Short Text Based On Statistical Methods

Posted on: 2017-02-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Lv
GTID: 2308330503458879
Subject: Control Science and Engineering
Abstract/Summary:
In recent years, sentiment analysis of text has attracted growing attention as an important research topic in natural language processing. With the development of the Internet, sentiment analysis technology plays an important role in public opinion monitoring and event prediction. This thesis designs sentiment analysis methods for Chinese short text based on statistical methods, using product reviews as the data set. It uses the support vector machine (SVM) and the convolutional neural network (CNN) as classifiers, and then improves the text modeling with an extended sentiment dictionary. Experimental results show that precision is improved. Finally, the thesis compares the SVM with the CNN to explain the advantages of each method.

The main research contents of the thesis are as follows:

Firstly, a method is designed for extending the sentiment dictionary, which is the basis of the sentiment analysis. The method consists of the following steps: gathering text data from the Internet, discovering new words, word segmentation, training a word2vec word embedding model, using the embedding model to generate words similar to the existing sentiment words, and filtering those similar words semantically. This enlarges the vocabulary of the sentiment dictionary.

Secondly, the thesis applies the SVM to sentiment analysis. Text is converted into vectors according to the vector space model. The extended sentiment dictionary is used to adjust the weights of some features. The text vectors are then classified by the SVM, and the results show that the improvement is effective.

Thirdly, the thesis applies the CNN to sentiment analysis. Text is converted into matrices using the word2vec word embedding model. A new dimension is then added to the text matrices to express emotion information. The CNN is used to estimate the semantic orientation of the text matrices. The results show that the improvement is effective to some degree. Finally, the thesis compares the SVM with the CNN and explains the advantages of the two methods.
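
The three text-modeling ideas summarized above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the similarity threshold, the feature-weighting rule, or the content of the added dimension, so the function names, the `threshold` and `boost` parameters, the polarity column, and the toy two-dimensional embeddings are all assumptions made for illustration. The semantic filtering step and the classifiers themselves are omitted.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def extend_dictionary(seed_words, embeddings, threshold=0.8):
    """Add every word whose embedding is close to some seed sentiment word.

    `threshold` is a hypothetical cut-off, not a value from the thesis.
    """
    seeds = [s for s in seed_words if s in embeddings]
    extended = set(seed_words)
    for word, vec in embeddings.items():
        if word in extended:
            continue
        if any(cosine(vec, embeddings[s]) >= threshold for s in seeds):
            extended.add(word)
    return extended

def weighted_bow(tokens, vocabulary, sentiment_words, boost=2.0):
    """Term-frequency vector in which dictionary words count `boost` times.

    The multiplicative boost is an assumed weighting rule.
    """
    index = {w: i for i, w in enumerate(vocabulary)}
    vec = [0.0] * len(vocabulary)
    for t in tokens:
        if t in index:
            vec[index[t]] += boost if t in sentiment_words else 1.0
    return vec

def sentence_matrix(tokens, embeddings, polarity):
    """One row per known token: its embedding plus one extra polarity column."""
    return [list(embeddings[t]) + [polarity.get(t, 0.0)]
            for t in tokens if t in embeddings]

# Toy data standing in for a trained word2vec model (assumption).
embeddings = {"good": [1.0, 0.0], "great": [0.9, 0.1], "phone": [0.0, 1.0]}
extended = extend_dictionary({"good"}, embeddings)
svm_features = weighted_bow(["great", "phone", "great"],
                            ["good", "great", "phone"], extended)
cnn_input = sentence_matrix(["great", "phone"], embeddings,
                            {w: 1.0 for w in extended})
```

In this sketch, `svm_features` would be the input to an SVM and `cnn_input` the input to a CNN. A real pipeline would take the embeddings from a word2vec model trained on the collected corpus.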
Keywords/Search Tags: natural language processing, sentiment analysis, sentiment dictionary, SVM, CNN