Analysis Of Sentiment Tendency Based On Sentiment Dictionary And Semantic Orientation Pointwise Mutual Information Algorithm

Posted on: 2021-04-03 | Degree: Master | Type: Thesis
Country: China | Candidate: C F Yuan | Full Text: PDF
GTID: 2428330647461950 | Subject: Computer technology
Abstract/Summary:
With the widespread use of the Internet and the exponential growth in the number of network users, Internet platforms now hold a large amount of corpus data that can be used for text sentiment analysis. If these data can be fully mined and utilized, related industries stand to benefit greatly. How to acquire these corpora effectively, analyze their emotional tendencies accurately, and then conduct research in related fields based on them, so as to provide data support and precise assistance for the development of those fields, is an important research topic in natural language processing. Sentiment analysis is defined as analyzing the subjective or objective emotional expressions that a person makes about a specific event or a derivative of an event, usually conveyed through text, video, voice, and expression. Sentiment analysis technology is widely used in recommendation systems, public opinion analysis, film and television evaluation, and similar applications. Sentiment analysis includes emotional information extraction and emotional tendency analysis. This thesis studies in depth the management of sentiment dictionaries, the extraction of new sentiment words, and the classification of sentiment tendencies, and proposes a method for analyzing text sentiment tendency based on sentiment dictionaries and the SO-PMI algorithm. The main research contents are as follows:

(1) A pattern matching optimization algorithm based on the principle of the finite state machine is proposed. Effective management of the sentiment dictionary saves memory and improves the running speed of the system. Building on the finite-state-machine pattern matching algorithm, this thesis optimizes the storage structure of nodes and the pattern matching process, and reduces the running time of pattern search. Under different variable settings, the optimized algorithm is compared with multi-pattern matching algorithms such as the AC-BM algorithm, the AC double-array algorithm, and AC-BMH; the experimental results verify that it manages emotion words effectively and also improves the efficiency of pattern matching.

(2) A method for extracting candidate emotion words based on syntactic dependency and the N-Gram algorithm is proposed. Based on syntactic dependency relations, this thesis uses the dependency rules of words and N-Grams as features to extract candidate sentiment word units. It then calculates the PMI values between the candidate sentiment words and the reference sentiment dictionary to determine the polarity of the candidate sentiment words. Finally, the newly extracted words whose polarity has been determined are added to the overall sentiment lexicon to increase the coverage of the sentiment dictionary. According to the evaluation results, the average accuracy of obtaining candidate emotion words from the features extracted through dependency syntax is 83%.

(3) A sentiment tendency classification method based on pattern matching and the sentiment dictionary, supplemented by the PMI algorithm, is proposed. First, known sentiment words are extracted by running the pattern matching algorithm against the existing sentiment dictionary, and the sentiment category of these known words is determined from the word polarity recorded in the dictionary. Second, for words in the corpus that are not included in the sentiment dictionary, syntactic dependencies and the N-Gram algorithm are used as the feature extraction method to extract candidate emotion words, so that new emotion words are acquired continuously. Finally, the two classification methods are combined to obtain the sentiment tendency of the entire sentence. The experimental results verify that the sentiment tendency accuracy of the proposed method is 84.37%, which is 7.23% higher than the method based on a traditional sentiment dictionary and 6.05% higher than the method based on SVM classification.
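
The polarity step in (2) can be illustrated with a minimal sketch of the standard SO-PMI definition: PMI(w1, w2) = log2(p(w1, w2) / (p(w1) p(w2))), and SO-PMI(w) is the sum of PMI(w, s) over positive seed words minus the same sum over negative seed words. The abstract does not give the thesis's exact formulation, so everything here is an assumption made for illustration: the function names, the toy corpus, the seed words, the use of document-level co-occurrence, and the choice to treat a pair that never co-occurs as contributing 0.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, per word and per unordered word pair, how many documents contain it."""
    word_df = Counter()
    pair_df = Counter()
    for doc in docs:
        vocab = set(doc)
        word_df.update(vocab)
        pair_df.update(frozenset(p) for p in combinations(sorted(vocab), 2))
    return word_df, pair_df


def pmi(w1, w2, word_df, pair_df, n_docs):
    """Pointwise mutual information (base 2) from document-level co-occurrence.

    Assumption: a pair that never co-occurs contributes 0 instead of -infinity.
    """
    joint = pair_df[frozenset((w1, w2))] / n_docs
    if joint == 0:
        return 0.0
    p1 = word_df[w1] / n_docs
    p2 = word_df[w2] / n_docs
    return math.log2(joint / (p1 * p2))


def so_pmi(word, pos_seeds, neg_seeds, word_df, pair_df, n_docs):
    """Semantic orientation: association with positive seeds minus negative seeds."""
    pos = sum(pmi(word, s, word_df, pair_df, n_docs) for s in pos_seeds)
    neg = sum(pmi(word, s, word_df, pair_df, n_docs) for s in neg_seeds)
    return pos - neg


# Hypothetical toy corpus: each document is a list of already-segmented words.
docs = [
    ["service", "great", "good"],
    ["food", "great", "good"],
    ["service", "awful", "bad"],
    ["room", "awful", "bad"],
]
pos_seeds = ["good"]  # stand-in for the reference dictionary's positive words
neg_seeds = ["bad"]   # stand-in for the reference dictionary's negative words

word_df, pair_df = cooccurrence_counts(docs)
scores = {
    w: so_pmi(w, pos_seeds, neg_seeds, word_df, pair_df, len(docs))
    for w in ("great", "awful", "service")
}
```

In this toy corpus, "great" receives a positive score, "awful" a negative one, and "service", which co-occurs equally with both seeds, scores zero. A candidate word would be assigned positive or negative polarity according to the sign of its score, possibly with a threshold around zero for neutral words.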
Keywords/Search Tags: Sentiment Dictionary, Pattern Matching, Dependency Syntax, N-Gram Feature, Pointwise Mutual Information, Extraction of New Emotional Words