
The Study Of Chinese Microblog Sentiment Orientation Basing On Combination Of Multi-method

Posted on: 2017-05-27
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Zou
GTID: 2308330482488692
Subject: Software engineering
Abstract/Summary:
Sentiment analysis of Microblog posts is currently a hotspot in the study of public opinion on social networks, and it offers considerable social and commercial value. Microblog text is characterized by its brevity and by a large number of new Internet words. Because Chinese word segmentation systems identify these new words poorly, segmentation accuracy is not high enough, which in turn degrades the subsequent sentiment analysis. In addition, Microblog sentiment classification tends to ignore topic relevance, because the breadth of Microblog content makes topic clustering difficult. Furthermore, traditional dictionaries cannot judge the varied and rapidly changing expressions of emotion. At present, sentiment classification generally stays at a coarse-grained level without further subdivision, and ordinary machine learning methods cannot classify fine-grained sentiment efficiently.

To solve these problems, this thesis proposes a novel blended strategy of two phases, four steps, and several methods. The research comprises two phases: building sentiment features and processing them. In the building phase, Chinese word segmentation is optimized and a Microblog sentiment dictionary is established. In the processing phase, topic clustering and fine-grained sentiment classification are designed. To obtain better results in the study of Microblog sentiment orientation, a variety of methods are employed in each step.

Innovations and main strategies:
1) To address the linguistic features of Microblog text, a new-word discovery method that combines statistics with rules is proposed, and on this basis the Chinese word segmentation system for Web text is further optimized.
2) To establish the Microblog sentiment dictionary, the HowNet algorithm is used to compute semantic similarity to sentiment seed words. In addition, sentiment values are assigned to new Internet words using the Pointwise Mutual Information algorithm, and Microblog emoticons are recorded.
3) To uncover the hidden topics in massive Microblog data, a topic-emotion model based on LDA is proposed to solve the high-dimensionality problem caused by short, sparse text, and the local emotion degree of each topic is captured at the same time.
4) To obtain good results for fine-grained sentiment orientation analysis, decision tree and random forest models are trained on the sentiment features.

The experimental results confirm that the proposed strategy can analyze topic-related, fine-grained sentiment orientation efficiently and accurately in massive Chinese Microblog data.
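
The abstract does not give the exact formula used to label new Internet words with Pointwise Mutual Information. A common formulation of this idea is SO-PMI, which scores a word by its PMI with positive seed words minus its PMI with negative seed words. The sketch below is only an illustration of that general technique, not the thesis's implementation: the function name `so_pmi`, the toy posts, the seed words, and the additive smoothing are all assumptions made for the example.

```python
import math
from collections import Counter

def so_pmi(posts, word, pos_seeds, neg_seeds, smoothing=1.0):
    """Score `word` by document-level co-occurrence with seed words.

    posts: list of already-segmented posts (lists of tokens).
    Returns a positive value if `word` leans towards the positive seeds
    and a negative value if it leans towards the negative seeds.
    """
    token_sets = [set(p) for p in posts]
    n_posts = len(token_sets)
    doc_freq = Counter()  # number of posts containing each token
    for tokens in token_sets:
        doc_freq.update(tokens)

    def pmi(a, b):
        # Ad hoc additive smoothing (an assumption of this sketch) keeps
        # the logarithm finite when a pair never co-occurs.
        both = sum(1 for t in token_sets if a in t and b in t)
        p_ab = (both + smoothing) / (n_posts + smoothing)
        p_a = (doc_freq[a] + smoothing) / (n_posts + smoothing)
        p_b = (doc_freq[b] + smoothing) / (n_posts + smoothing)
        return math.log2(p_ab / (p_a * p_b))

    positive = sum(pmi(word, s) for s in pos_seeds)
    negative = sum(pmi(word, s) for s in neg_seeds)
    return positive - negative

# Hypothetical toy corpus of segmented posts; "给力" and "坑爹" stand in
# for new Internet words that a traditional dictionary would not cover.
POSTS = [
    ["手机", "给力", "喜欢"],
    ["电影", "给力", "好"],
    ["服务", "坑爹", "讨厌"],
    ["快递", "坑爹", "差"],
    ["天气", "好"],
]
POS_SEEDS = ["好", "喜欢"]   # hypothetical positive seed words
NEG_SEEDS = ["差", "讨厌"]   # hypothetical negative seed words

if __name__ == "__main__":
    for new_word in ("给力", "坑爹"):
        print(new_word, round(so_pmi(POSTS, new_word, POS_SEEDS, NEG_SEEDS), 3))
```

On this toy corpus the score for "给力" comes out positive and the score for "坑爹" negative. In practice the counts would come from a large segmented Microblog corpus, and the choice of seed words and smoothing would need tuning.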
Keywords/Search Tags: Microblog data mining, new word discovery, sentiment dictionary, topic clustering, fine-grained sentiment