
Research On Chinese Micro-blog Sentiment Classification Based On Machine Learning Technology

Posted on: 2020-11-14 Degree: Master Type: Thesis
Country: China Candidate: C Y Chang Full Text: PDF
GTID: 2428330590951076 Subject: Computer Science and Technology
Abstract/Summary:
In the 21st century, the rapid development of Internet technology has opened a new chapter in social progress. A large number of social applications characterized by knowledge interaction, including blogs, Internet forums, micro-blogs and other social media, have come into public view. Along with these platforms, a huge amount of textual data has been generated, and making full use of such large data sets and extracting their value remains a demanding, long-term research task. As a new type of social network platform, micro-blog is favored by a growing number of scholars studying text sentiment classification because it is simple and easy to operate. Through a computer or mobile terminal, users can log in and publish subjective, sentiment-bearing texts about events, people or objects. These subjective texts can be identified by the computer and classified as expressing positive or negative attitudes, so the research has broad prospects in many practical application domains.

This paper takes Sina micro-blog as the research object and collects the data set with a web crawler. It first discusses Chinese text classification based on an emotional dictionary. On this basis, it then conducts experiments with a second approach, in which Chinese text is classified by machine learning. Finally, the experimental results verify the feasibility of the improved algorithm proposed in the paper. The research covers the following three aspects:

(1) Extension and self-construction of a sentimental dictionary. Three open-source emotion dictionaries are relabeled and fused into a basic affective lexicon ontology. In addition, a dictionary expansion algorithm based on a corpus and SO-PMI is proposed to handle unregistered words and new words that appear in actual micro-blog text. The experimental results show that the average F-measure of the extended affective lexicon increased by 1.11%.

(2) A new weighting algorithm based on the affective polarity unit. Considering the factors that influence micro-blog emotion, and combining semantic rules and sentence structure, this paper adds expression features as an important element to derive and optimize the emotion calculation formula. It builds a three-category emotion classification model that improves, to some extent, the accuracy of positive, negative and neutral polarity classification.

(3) A machine learning approach incorporating semantic rules. This paper uses the libsvm toolkit to build emotion classifiers with a support vector machine (SVM), and a comparison with Naive Bayes verifies the advantage of SVM. After Chinese word segmentation and feature term extraction, an improvement to the TF-IDF algorithm is made. Drawing on sentiment dictionaries and semantic rules, the improvement accounts for the differing capacity of emotional and non-emotional words to affect emotional intensity, and also takes degree adverbs and specific symbols into account. These factors are used to weight terms on top of the actual frequency with which emotional words occur. The results show that the accuracy and recall of STF-IDF, the algorithm derived from TF-IDF, improved by 5.97%.
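
The abstract does not give the details of the SO-PMI dictionary expansion in aspect (1). As a rough illustration only, the general SO-PMI idea (score a candidate word by its pointwise mutual information with positive seed words minus its pointwise mutual information with negative seed words) could be sketched as below. The function name so_pmi, the document-level co-occurrence window, the smoothing constant, and the toy English tokens are all assumptions made for this sketch, not the thesis's implementation; real input would be segmented Chinese micro-blog text.

```python
import math
from collections import Counter


def so_pmi(docs, pos_seeds, neg_seeds, smoothing=1e-6):
    """Score non-seed words by SO-PMI using document-level co-occurrence.

    docs: list of token lists (hypothetical, already segmented posts).
    pos_seeds / neg_seeds: small sets of words with known polarity.
    smoothing: assumed constant that avoids log(0) when a word never
    co-occurs with a seed.
    """
    pos_seeds, neg_seeds = set(pos_seeds), set(neg_seeds)
    seeds = pos_seeds | neg_seeds
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing (word, seed)

    for tokens in docs:
        vocab = set(tokens)
        word_df.update(vocab)
        for seed in vocab & seeds:
            for word in vocab - seeds:
                pair_df[(word, seed)] += 1

    def pmi(word, seed):
        p_joint = pair_df[(word, seed)] / n_docs
        p_word = word_df[word] / n_docs
        p_seed = word_df[seed] / n_docs
        return math.log2((p_joint + smoothing) / (p_word * p_seed))

    scores = {}
    for word in word_df:
        if word in seeds:
            continue
        # Seeds that never appear in the corpus are skipped.
        pos = sum(pmi(word, s) for s in pos_seeds if word_df[s])
        neg = sum(pmi(word, s) for s in neg_seeds if word_df[s])
        scores[word] = pos - neg
    return scores


# Toy corpus: each inner list stands for one segmented post.
docs = [
    ["movie", "great", "love"],
    ["service", "awful", "hate"],
    ["movie", "wonderful", "love"],
    ["service", "terrible", "hate"],
]
scores = so_pmi(docs, pos_seeds={"love"}, neg_seeds={"hate"})
```

In this sketch a word with a positive score would be a candidate for the positive side of the lexicon and a word with a negative score a candidate for the negative side; a practical expansion step would also need a threshold on the absolute score, which is not specified here.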
Keywords/Search Tags:Emotional classification, Sentimental dictionary, Machine Learning, Semantic rule, Emotion weighting algorithm, Feature weighting