
The Research On Chinese Microblog Sentiment Analysis Based On Rules And Machine Learning Methods

Posted on: 2016-09-10
Degree: Master
Type: Thesis
Country: China
Candidate: L Shen
Full Text: PDF
GTID: 2308330461991824
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, social networking, and mobile terminal technology, a number of social media platforms and social networking sites have emerged, and microblogs are among the most popular of them. As a major form of social media, microblog posts are short, lean, and fast, and they are becoming more and more popular. People tend to get news, commentary, entertainment, online learning material, and other information from microblogs. Microblog networks therefore exert an increasingly important influence on the dissemination of Internet public opinion. Since users are keen to express their views, attitudes, and emotions on microblogs, mining these emotions is of great significance for government monitoring of public opinion, business market analysis, user review analysis, decision-making, and so on.

Traditional microblog sentiment analysis only judges the emotional tendency of a short text, and its final result is limited to a commendatory, neutral, or derogatory tendency. Microblog short text differs from ordinary text: it is often very brief, colloquial, and grammatically non-standard, and new network words appear frequently. A more fine-grained study of microblog language is therefore particularly important. This thesis divides microblog emotion into seven categories (anger, disgust, fear, happiness, love, sadness, and surprise) and uses a semantic dictionary with rule-based weighting, together with a supervised machine learning method, to perform fine-grained sentiment classification of microblog text. The sentiment analysis of microblog text covers both topic-related and topic-irrelevant sentiment analysis. The main contents of this thesis are as follows:

(1) The characteristic language phenomena of microblog text are studied, and from these phenomena we analyze the text features that are useful for classification. The thesis also describes the current state of sentiment analysis research, the domestic and international evaluation conferences on sentiment analysis, and related research on sentiment analysis applications.

(2) We design and build a Chinese microblog emotion dictionary whose entries fall into seven categories: anger, disgust, fear, happiness, love, sadness, and surprise. In addition, we build a new network word dictionary, a degree adverb dictionary, a conjunction dictionary, and an expanded set of emoticons. Our emotion dictionary is based on the Dalian University emotional vocabulary ontology, which we have expanded. We then compare our emotion dictionary experimentally against the Dalian University emotional vocabulary ontology. The experimental results show that the dictionary we built gives better results.

(3) Based on the dictionaries we built, we use a text-weighting algorithm based on dictionaries and rules to compute the weighted emotional value of a microblog. This dictionary-and-rule-based approach is later compared with the machine learning based approach.

(4) This thesis presents a sentiment analysis method based on multi-feature fusion with class sequential rules. Two methods, the dictionary-based method and a traditional SVM, are used to obtain two sets of microblog labels. The microblog text is represented as a sequence that includes conjunctions. After mining the sequence rules, we use these rules as features and then begin classifier training. Finally, we train on the extracted emotional word features, punctuation features, sentence structure features, and class sequential rule features. After adjusting the parameters, we obtain a better classifier.

(5) For topic-related microblog data, we also take into account the impact of the extracted topic features. The experimental results show that using the probability distribution obtained from a topic model as a feature can also improve the final classification results.

(6) The data set used in this thesis is microblog data provided by the COAE conference. The training data contains 4000 microblogs and 14,526 sentences, and the test set contains 5000 microblogs and 16,785 sentences. Each microblog is labeled with its emotion type, and all of the texts are from Sina Weibo. Finally, we complete an experimental comparison of the various methods.
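
The dictionary-and-rule weighting described in item (3) can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions, not the thesis's actual algorithm: the tiny English dictionaries, the weights, the two rules (a degree adverb scales the next emotion word; an adversative conjunction raises the weight of what follows it), and the function names `score_emotions` and `classify` are all hypothetical, and the input is assumed to be already tokenized.

```python
from collections import defaultdict

# Hypothetical toy dictionaries; the thesis's real dictionaries and weights
# are not given in the abstract.
EMOTION_WORDS = {
    "happy": ("happiness", 1.0),
    "furious": ("anger", 1.5),
    "sad": ("sadness", 1.0),
}
DEGREE_ADVERBS = {"very": 2.0, "slightly": 0.5}
ADVERSATIVE_CONJUNCTIONS = {"but": 1.5}


def score_emotions(tokens):
    """Accumulate a weighted score per emotion category for one token list."""
    scores = defaultdict(float)
    clause_weight = 1.0  # assumed rule: text after "but" counts more
    degree = 1.0         # assumed rule: adverb scales the next emotion word
    for tok in tokens:
        if tok in ADVERSATIVE_CONJUNCTIONS:
            clause_weight = ADVERSATIVE_CONJUNCTIONS[tok]
            degree = 1.0
        elif tok in DEGREE_ADVERBS:
            degree *= DEGREE_ADVERBS[tok]
        elif tok in EMOTION_WORDS:
            category, weight = EMOTION_WORDS[tok]
            scores[category] += weight * degree * clause_weight
            degree = 1.0
        else:
            # Any other word breaks the adverb's scope.
            degree = 1.0
    return dict(scores)


def classify(tokens):
    """Return the highest-scoring category, or "none" if no emotion word is found."""
    scores = score_emotions(tokens)
    return max(scores, key=scores.get) if scores else "none"


if __name__ == "__main__":
    example = ["i", "was", "happy", "but", "now", "very", "sad"]
    print(score_emotions(example))
    print(classify(example))
```

In this sketch the per-category scores could also be passed on as features to a supervised classifier, which is one plausible way to combine the rule-based and machine learning approaches that the abstract compares.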
Keywords/Search Tags:Chinese microblog, emotion dictionary, sentiment analysis, multi-feature fusion, class sequential rule, Topic Model