Research On Event-oriented Sentiment Orientation Analysis Of Micro-blog

Posted on: 2015-12-27
Degree: Master
Type: Thesis
Country: China
Candidate: H H Tang
Full Text: PDF
GTID: 2308330482979163
Subject: Signal and Information Processing
Abstract/Summary:
The rise of Web 2.0 has brought people into a new era of social networking. Micro-blog, as a typical representative of social networks, offers convenient publishing, rapid propagation, and strong interactivity. It has become an important platform for netizens to follow current events, share information, and express emotion. A large volume of event-related text is produced on micro-blog every day. Most of these texts are descriptions of or comments on current hot events and usually carry netizens' subjective emotions, so they reflect public opinion on current hot issues. Quickly and efficiently tracking micro-blog events and properly analyzing the sentiment orientation of the associated comments can therefore help the relevant departments understand public opinion on the Internet and promptly guide its trend, which is of great significance for maintaining social harmony and stability. This thesis studies in depth the technologies of event-oriented sentiment analysis of micro-blog, including micro-blog event tracking, semantic orientation identification of micro-blog words, and sentiment orientation analysis of micro-blog. The main contributions are as follows:

(1) Traditional methods for micro-blog event tracking usually calculate the surface-form similarity between features and ignore semantic information, which leads to unsatisfactory tracking results. To solve this problem, a method for micro-blog event tracking based on Wikipedia knowledge is proposed, drawing on Wikipedia's high accuracy, wide coverage, and explicit semantic structure. Firstly, the strong entities and alias-entities of Wikipedia entries are defined, and Wikipedia knowledge is represented as five-tuples. Then, a map from the word space to the Wikipedia entry space is constructed to represent the initial event vector and the subsequent micro-blog vectors.
Finally, the semantic similarities between the micro-blog vectors and the initial event vector are calculated, and a micro-blog is assigned to the event if its similarity exceeds a preset threshold. Experiments are conducted on ten hot micro-blog events selected from 2012, and the overall F1-Measure reaches 89.90%, which shows that the Wikipedia knowledge-based event tracking method makes full use of the semantic information in Wikipedia and effectively improves tracking performance compared with traditional methods.

(2) Traditional corpus-based algorithms are sensitive to the choice of seed words and cannot effectively identify the semantic orientation of low-frequency words. To address this, the thesis combines the abundance of emoticons in micro-blog text with the ability of the Word Affinity Measure to make full use of context information, and puts forward an algorithm based on the Word Affinity Measure to identify the semantic orientation of words in Chinese micro-blogs. Firstly, candidate words are extracted using part-of-speech combination patterns. Secondly, micro-blog emoticons are selected as seed words, and word affinity networks are built. Then, low-frequency words are expanded with a synonym dictionary while the semantic orientation similarity between candidate words and seed words is calculated. Finally, the semantic orientation is determined according to a threshold. Experiments are conducted on a corpus of 2 million micro-blogs, and the average identification precision reaches 84.58%, which shows that the proposed algorithm is less sensitive to the seed words and effectively identifies the semantic orientation of low-frequency words, giving it superior performance compared with other typical algorithms.
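As a rough illustration of the seed-word procedure described in (2), the sketch below scores a candidate word by how often it co-occurs with positive versus negative emoticon seeds. It does not implement the Word Affinity Measure itself, whose definition is not given in this abstract; plain post-level co-occurrence counts are swapped in as a stand-in. All function names, the emoticon tokens, the synonym table, and the threshold value are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(posts):
    """Count how many posts each unordered pair of distinct tokens shares."""
    pair_counts = Counter()
    for tokens in posts:
        for a, b in combinations(sorted(set(tokens)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def affinity(pair_counts, word, seed):
    """Stand-in affinity: raw co-occurrence count (NOT the thesis's measure)."""
    return pair_counts[tuple(sorted((word, seed)))]


def orientation(pair_counts, word, pos_seeds, neg_seeds,
                synonyms=None, threshold=0.2):
    """Return (score, label); score lies in [-1, 1] and its size hints at strength."""
    # Expand the word with dictionary synonyms so rare words borrow evidence.
    variants = {word} | set((synonyms or {}).get(word, ()))
    pos = sum(affinity(pair_counts, v, s) for v in variants for s in pos_seeds)
    neg = sum(affinity(pair_counts, v, s) for v in variants for s in neg_seeds)
    total = pos + neg
    if total == 0:
        return 0.0, "unknown"  # no co-occurrence evidence with any seed
    score = (pos - neg) / total
    if score > threshold:
        return score, "positive"
    if score < -threshold:
        return score, "negative"
    return score, "neutral"


# Toy tokenized posts; "[smile]" and "[cry]" are hypothetical emoticon tokens.
posts = [
    ["great", "[smile]"],
    ["great", "[smile]", "movie"],
    ["awful", "[cry]"],
    ["movie", "[cry]"],
]
counts = build_cooccurrence(posts)
print(orientation(counts, "great", ["[smile]"], ["[cry]"]))
print(orientation(counts, "dreadful", ["[smile]"], ["[cry]"],
                  synonyms={"dreadful": ["awful"]}))
```

In this toy setup the magnitude of the score plays the role of orientation strength, and the synonym lookup lets a word that never appears in the corpus inherit evidence from a more frequent synonym.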
Furthermore, the proposed algorithm captures the semantic orientation strength of micro-blog words while identifying their semantic orientation, which lays a foundation for research on fine-grained sentiment analysis.

(3) Among the methods for sentiment orientation analysis of micro-blog, supervised methods need a high-quality labeled corpus, which is difficult to obtain, and a classifier trained in one domain often cannot be applied to other domains. Therefore, an unsupervised method based on hierarchical Dirichlet processes (HDP) is presented for sentiment orientation analysis of micro-blog, since HDP can automatically identify the hidden topics as well as their number. Firstly, the method uses the HDP model to mine the hidden topics in the documents. Secondly, a sentiment dictionary is used to calculate the sentiment distribution of each topic. Finally, the sentiment orientation of a micro-blog is obtained from the sentiment distributions of its topics. Experiments are conducted on the NLP&CC2012 corpus, with the average F1-Measure reaching 73.61%, which shows that this method can accurately identify the sentiment orientation of micro-blogs in an unsupervised setting and avoids the difficulty of obtaining a high-quality labeled corpus for supervised methods.
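The last two steps of (3) can be sketched as follows, under stated assumptions. HDP inference is not shown: the topic-word probabilities and the per-post topic mixture are taken as already inferred by some HDP implementation. The aggregation rule (lexicon-weighted probability mass per topic, then a mixture-weighted sum per post) is one plausible reading of the abstract, not the thesis's confirmed formula, and all names and numbers are hypothetical.

```python
def topic_sentiment(topic_word_probs, lexicon):
    """Return (p_pos, p_neg) for one topic from its lexicon-covered word mass."""
    pos = sum(p for w, p in topic_word_probs.items() if lexicon.get(w) == 1)
    neg = sum(p for w, p in topic_word_probs.items() if lexicon.get(w) == -1)
    total = pos + neg
    if total == 0:
        return 0.5, 0.5  # topic contains no sentiment words: treat as balanced
    return pos / total, neg / total


def post_sentiment(topic_mixture, topics, lexicon):
    """Combine topic sentiment distributions using the post's topic weights."""
    pos = neg = 0.0
    for topic_id, weight in topic_mixture.items():
        t_pos, t_neg = topic_sentiment(topics[topic_id], lexicon)
        pos += weight * t_pos
        neg += weight * t_neg
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


# Assumed inputs: two topics (as an HDP model might produce) and a tiny lexicon
# mapping words to +1 (positive) or -1 (negative).
topics = {
    0: {"happy": 0.6, "news": 0.3, "sad": 0.1},
    1: {"angry": 0.5, "sad": 0.3, "happy": 0.2},
}
lexicon = {"happy": 1, "sad": -1, "angry": -1}

print(post_sentiment({0: 0.8, 1: 0.2}, topics, lexicon))
print(post_sentiment({0: 0.1, 1: 0.9}, topics, lexicon))
```

Because the number of topics is just the number of keys in `topics`, the same aggregation works however many topics the HDP model settles on, and no labeled training data is involved.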
Keywords/Search Tags: Micro-blog, Wikipedia Knowledge, Event Tracking, Word Affinity Measure, Sentiment Lexicon, Hierarchical Dirichlet Processes, Sentiment Orientation Analysis