
Research On Sentiment Classification For Chinese Online Comment Texts Based On Word2vec And SVMperf

Posted on: 2016-07-08   Degree: Master   Type: Thesis
Country: China   Candidate: Z C Su
GTID: 2308330461457422   Subject: Computer technology
Abstract/Summary:
In recent years, Web 2.0 technology has emerged and developed rapidly, bringing many new Internet platforms such as e-commerce sites, blogs, forums and Weibo. More and more users have become accustomed to expressing their views and feelings on these platforms. As the number of Internet users grows, the number of user comments is exploding, and it is impractical for users to extract valuable information by reading these comments themselves. As a result, a new research field has emerged to help users analyze and filter huge volumes of comments and extract the valuable information they contain: sentiment classification.

Within sentiment classification, the most commonly used and effective approach is based on machine learning, and in machine-learning-based sentiment classification the most important task is extracting effective features. Most existing research focuses on extracting lexical and syntactic features and ignores the semantic relationships among words. To address this problem, this thesis studies three main problems:

1) An approach to clustering similar features in a corpus is proposed. Words that describe the same product feature are domain synonyms, and they need to be clustered together in order to produce a useful comment summary. The experimental results show that similar features can be extracted from the corpus effectively and grouped into the corresponding product feature groups.

2) A sentiment classification method based on word2vec and SVMperf is proposed. Using word2vec, the words in a text are represented as high-dimensional vectors in a vector space. The semantic similarity between words is measured by cosine similarity, and these vectors are treated as semantic features. The SVMperf classification model is then trained on the semantic features to produce the final classification results. The experimental results show that this method achieves satisfactory classification performance.

3) To further improve the accuracy of sentiment classification, context structure features such as negation words, degree words and transition words are taken into account during feature extraction. Experimental results show that the sentiment classification method that incorporates context structure features achieves better classification results.

Finally, a stock analysis system that applies the sentiment classification algorithm in practice has been developed. The system computes a sentiment index from investors' comments and compares it with the current stock price to estimate whether the two are correlated.
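The clustering of domain synonyms in item 1 can be pictured with the sketch below. It assumes each candidate feature word already has a vector; with a trained gensim `Word2Vec` model these could come from `model.wv[word]`. The toy 3-dimensional vectors, the greedy seed-based grouping and the 0.8 threshold are all illustrative assumptions, not the thesis's actual clustering algorithm.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def cluster_features(vectors, threshold=0.8):
    """Greedy clustering: each word joins the first group whose seed word is
    at least `threshold`-similar to it; otherwise it starts a new group."""
    groups = []  # list of (seed_word, member_words)
    for word, vec in vectors.items():
        for seed, members in groups:
            if cosine(vectors[seed], vec) >= threshold:
                members.append(word)
                break
        else:
            groups.append((word, [word]))
    return [members for _, members in groups]

# Made-up 3-d vectors standing in for trained word2vec embeddings.
toy_vectors = {
    "screen":  [0.9, 0.1, 0.0],
    "display": [0.85, 0.2, 0.05],
    "battery": [0.1, 0.9, 0.1],
    "power":   [0.15, 0.8, 0.2],
    "price":   [0.0, 0.1, 0.95],
}

print(cluster_features(toy_vectors))
# → [['screen', 'display'], ['battery', 'power'], ['price']]
```

Here `screen`/`display` and `battery`/`power` fall into shared groups while `price` stays alone. A real system would tune the threshold, and possibly the clustering method itself, on its own corpus.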
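Item 2 does not state how word vectors are combined into one feature vector per comment. The sketch below therefore assumes simple averaging of the comment's word vectors, then writes each comment in the SVM-light text format that SVMperf reads: `<label> <index>:<value> ...`, with labels +1/-1 and 1-based, increasing feature indices (zero-valued features may be omitted). The vectors and comments are made up for illustration.

```python
def average_vector(tokens, vectors, dim):
    """Mean of the vectors of in-vocabulary tokens; all zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]

def to_svmlight_line(label, features):
    """Format one example as '<+1|-1> idx:value ...' with 1-based indices,
    omitting zero-valued features as the SVM-light format allows."""
    parts = ["+1" if label > 0 else "-1"]
    for i, value in enumerate(features, start=1):
        if value != 0.0:
            parts.append(f"{i}:{value:.4f}")
    return " ".join(parts)

# Made-up 2-d "word2vec" vectors and labeled, tokenized comments.
toy_vectors = {"good": [0.8, 0.1], "great": [0.9, 0.2], "bad": [-0.7, 0.1], "screen": [0.0, 0.5]}
comments = [(["great", "screen"], 1), (["bad", "screen"], -1)]

for tokens, label in comments:
    print(to_svmlight_line(label, average_vector(tokens, toy_vectors, dim=2)))
# → +1 1:0.4500 2:0.3500
# → -1 1:-0.3500 2:0.3000
```

A file of such lines could then be given to SVMperf's `svm_perf_learn` program (training file, then model file, with options such as `-c` for the trade-off constant). SVMperf's distinguishing feature is training linear SVMs that optimize multivariate measures such as F1 or ROC area, so its loss-function options are worth checking in its documentation.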
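The context structure features in item 3 could be extracted roughly as follows. The tiny English lexicons are hypothetical stand-ins for the Chinese negation, degree and transition word lists a real system would use. The scoring rules are also assumptions chosen for illustration: a negation word flips the next sentiment word, a degree word scales it, and a transition word halves the score accumulated before it.

```python
# Hypothetical mini-lexicons; a real system would use Chinese word lists.
SENTIMENT = {"good": 1.0, "fast": 1.0, "bad": -1.0, "slow": -1.0}
NEGATIONS = {"not", "never"}
DEGREES = {"very": 2.0, "slightly": 0.5}
TRANSITIONS = {"but", "however"}

def context_features(tokens):
    """Count negation/degree/transition words in one tokenized comment and
    compute a context-adjusted sentiment score."""
    feats = {"negations": 0, "degrees": 0, "transitions": 0, "score": 0.0}
    modifier = 1.0  # pending effect of negation/degree words on the next sentiment word
    for tok in tokens:
        if tok in NEGATIONS:
            feats["negations"] += 1
            modifier *= -1.0
        elif tok in DEGREES:
            feats["degrees"] += 1
            modifier *= DEGREES[tok]
        elif tok in TRANSITIONS:
            feats["transitions"] += 1
            feats["score"] *= 0.5  # assumption: the clause after the transition dominates
        elif tok in SENTIMENT:
            feats["score"] += modifier * SENTIMENT[tok]
            modifier = 1.0  # modifiers apply only to the next sentiment word
    return feats

print(context_features("the price is good but the screen is very bad".split()))
# → {'negations': 0, 'degrees': 1, 'transitions': 1, 'score': -1.5}
```

In a combined model, these counts and the adjusted score could be appended to each comment's semantic feature vector before training the classifier.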
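For the stock analysis system, one plausible way to compare investor sentiment with prices is sketched below. The abstract does not give the index formula or the correlation measure, so the daily index (positive minus negative over total classified comments), the use of Pearson correlation against closing prices, and all the numbers are assumptions.

```python
import math

def sentiment_index(n_pos, n_neg):
    """Assumed daily index in [-1, 1]: (pos - neg) / (pos + neg); 0.0 with no comments."""
    total = n_pos + n_neg
    return (n_pos - n_neg) / total if total else 0.0

def pearson(xs, ys):
    """Pearson correlation of two equal-length series (undefined for a constant series)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up daily counts of (positive, negative) comments and closing prices.
daily_counts = [(30, 10), (25, 15), (10, 30), (20, 20), (35, 5)]
closing_prices = [10.2, 10.1, 9.6, 9.9, 10.5]

index = [sentiment_index(p, n) for p, n in daily_counts]
print(index)  # → [0.5, 0.25, -0.5, 0.0, 0.75]
print(round(pearson(index, closing_prices), 3))
```

A high coefficient on a few days of toy data says nothing about causation. Comparing the index against daily price changes, or against prices on the following day, are alternatives the abstract leaves open.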
Keywords/Search Tags: Similar features clustering, Sentiment classification, Word2vec, SVMperf, Semantic features, Context structure features