The Research And Implementation Of Multi-Level Sentiment Analysis System On Chinese Comments

Posted on: 2016-10-29
Degree: Master
Type: Thesis
Country: China
Candidate: X R Quan
GTID: 2298330467992527
Subject: Software engineering

Abstract/Summary:
In modern life, the Internet is tightly woven into our daily activities. Thanks to the evolution of mobile communication networks and the spread of Wi-Fi, we can easily stay online all day long. On the one hand, people use spare moments to browse news, contact friends, and share experiences with them, which makes daily life more efficient. On the other hand, we spend more time online than ever and create massive amounts of data every day, such as billions of micro-blog posts around the globe and countless product comments on e-commerce websites, along with uploaded videos and pictures. If this information can be analyzed quickly and reliably, governments and social media can learn the public's attitude, corporations can survey the market, and consumers can base their decisions on the experience of earlier buyers.

This paper focuses on analyzing short Chinese comments by partitioning the analysis into several levels. Comments are usually short. For example, a micro-blog post in China can contain only 140 characters, and according to our study most comments on products and services are no longer than 100 characters. As a result, such comments have comparatively clear targets and a pronounced emotional tendency. In natural language processing, research is commonly organized at three levels: word, sentence, and passage. This paper follows the same convention. Our main tasks are as follows:

Firstly, this paper analyzes the syntactic structure of both Chinese and English, draws on the latest domestic and foreign results, and constructs vocabularies of nouns, adjectives, adverbs of degree, and negative adverbs for calculating a sentence's polarity and intensity.

Secondly, the vast majority of comment targets are nouns. We consider nouns the most valuable and the most changeable of the four word types, so a new algorithm is needed to weigh and rank them.
To that end, this paper proposes a model named LPCE (LDA + PageRank + Conditional Entropy), which considers a word's frequency and the influence of its co-occurrence with other nouns or adjectives. Experimental results show that ranking precision improves noticeably and that the resulting scores become more reasonable.

Thirdly, because quaternary groups must be calculated, this paper introduces a new sentence segmentation algorithm named SOW (Sequence Of Words). It uses the relationships among words, together with decision methods such as regular expressions, to segment sentences quickly and extract quaternary groups from them. Compared with the traditional time-window method, the SOW segmentation algorithm is more flexible. Compared with syntactic analysis, it runs much faster.

Fourthly, to keep the quaternary-group calculator from failing on incomplete sentences, this paper adds a naive Bayes classifier to the multi-level sentiment analysis system to help judge a sentence's polarity. In addition, in our BQMSAS (Bayes-Quaternary Multi-level Sentiment Analysis System), intensity is divided into five levels in each of the positive and negative directions, so in total intensity is ranked on eleven levels, including a neutral level.

Based on these research results, this paper completes the related requirement analysis and system design, and finally implements all algorithms and interfaces.
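As a rough illustration of how lexicon-based scoring on an eleven-level scale could work, the sketch below scores a single quaternary group. It is not the thesis's implementation. It assumes that a quaternary group is a (noun, adjective, degree adverb, negative adverb) tuple mirroring the four vocabularies, that intensity is an integer from -5 to 5 with 0 as neutral, and that the score is the adjective's base polarity scaled by a degree weight and flipped by a negation. The lexicon entries, weights, and the names `Quad` and `intensity_level` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy lexicons; the thesis's actual vocabularies are not reproduced here.
ADJECTIVE_POLARITY = {"好": 2, "差": -2, "漂亮": 3}    # good, bad, pretty
DEGREE_WEIGHT = {"很": 1.5, "非常": 2.0, "有点": 0.5}  # very, extremely, a bit
NEGATIONS = {"不", "没"}                               # not, have not


@dataclass
class Quad:
    """Assumed shape of a quaternary group: target noun plus three modifiers."""
    noun: str
    adjective: str
    degree: Optional[str] = None
    negation: Optional[str] = None


def intensity_level(quad: Quad) -> int:
    """Map a quaternary group to one of eleven levels, -5..5 (0 is neutral)."""
    # Adjectives missing from the lexicon are treated as neutral.
    base = ADJECTIVE_POLARITY.get(quad.adjective, 0)
    # A missing or unknown degree adverb leaves the base score unchanged.
    weight = DEGREE_WEIGHT.get(quad.degree, 1.0)
    # A known negative adverb flips the polarity.
    sign = -1 if quad.negation in NEGATIONS else 1
    level = round(sign * base * weight)
    # Clamp to five levels on each side of neutral.
    return max(-5, min(5, level))


print(intensity_level(Quad("屏幕", "好", degree="很")))    # screen, very good
print(intensity_level(Quad("服务", "差", negation="不")))  # service, not bad
```

In this sketch, a group whose adjective is missing from the lexicon falls back to level 0. A full system along the lines described above would presumably hand such cases to the naive Bayes classifier instead.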
Keywords/Search Tags:Comment target, LDA, Conditional entropy, Multi-level sentiment analysis, NLP