
The Research Of Text Sentiment Analysis Based On The Fusion Of Lexicon And Doc2vec

Posted on: 2019-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Hu
Full Text: PDF
GTID: 2348330542955573
Subject: Communication and Information System
Abstract/Summary:
The rapid development of the Internet and the rise of social networking have enabled a large number of Internet users to share their emotions and ideas online. Studying sentiment tendencies on the Internet is of great significance for grasping the current state of society and the dynamics of events, and it helps governments, businesses, and other actors make decisions. Traditional sentiment analysis methods, such as sentiment dictionaries and the bag-of-words model, still have many problems. Sentiment dictionaries are incomplete, and sentiment lexicons differ greatly between domains, which makes it difficult to obtain the desired effect. Traditional feature selection methods produce high-dimensional features and do not consider word order, so their results are not very satisfactory. To address these problems, this thesis draws on the advantages of the traditional methods and combines them with Doc2vec to analyze text sentiment. The main contents are as follows:

(1) Expansion of the sentiment dictionary. New sentiment words must be mined to solve the problem of the dictionary's low completeness. Two approaches are explored: one based on rule templates and one based on an English sentiment dictionary. The former consists of three stages: manually collecting rules, obtaining candidate sentiment words, and determining sentiment polarity. It mainly uses rules and pointwise mutual information (PMI) to expand the sentiment dictionary, and it is convenient and fast. The latter uses English sentiment dictionaries and the word alignment information of a Chinese-English parallel corpus to expand the Chinese sentiment dictionary and obtain as many new sentiment words as possible.

(2) Feature selection and representation for a Doc2vec-based machine learning method, studied in order to solve the problems of high feature dimensionality and ignored semantic and word order information in traditional methods. It includes two aspects. The first is a method that combines word vectors with sentiment information: the word vectors obtained after word segmentation are used together with the sentiment words from the sentiment dictionary as features. The second is a sentence-vector feature selection and representation method: the text is trained as a whole, and the resulting vectors are used to train the classifier, which reduces the overall dimensionality, incorporates sentiment information, and takes word order into account.

(3) Experimental validation. To verify the validity of these methods, this thesis runs two comparative experiments on review data. In the first, the control group is a traditional sentiment dictionary method, and the experimental group performs sentiment analysis using the sentiment dictionary construction and expansion method proposed in this thesis. In the second, a traditional machine learning method is compared with the Doc2vec-based method to verify the validity of the Doc2vec-based feature selection and representation. From the experimental results, the author concludes that both the method of constructing and expanding the sentiment dictionary and the Doc2vec-based feature selection and representation methods are effective.
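The abstract says that the polarity of candidate sentiment words is determined with pointwise mutual information, but it does not give the formula. As an illustration only, the sketch below uses the common SO-PMI formulation: a candidate's score is the sum of its PMI with positive seed words minus the sum of its PMI with negative seed words. The function name `so_pmi`, the document-level co-occurrence counting, the additive smoothing constant, and the toy English corpus are all assumptions made for this example, not details taken from the thesis.

```python
import math
from collections import Counter
from itertools import combinations


def so_pmi(docs, candidates, pos_seeds, neg_seeds, smoothing=0.01):
    """Score candidate words by SO-PMI (hypothetical helper, not from the thesis).

    docs is a list of token lists. Two words co-occur when they appear in
    the same document. A positive score suggests positive polarity and a
    negative score suggests negative polarity.
    """
    n_docs = len(docs)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing each unordered pair
    for tokens in docs:
        vocab = set(tokens)
        word_df.update(vocab)
        pair_df.update(frozenset(p) for p in combinations(sorted(vocab), 2))

    def pmi(a, b):
        # Additive smoothing (an assumption) keeps the logarithm finite
        # when two words never co-occur.
        joint = (pair_df[frozenset((a, b))] + smoothing) / n_docs
        p_a = (word_df[a] + smoothing) / n_docs
        p_b = (word_df[b] + smoothing) / n_docs
        return math.log2(joint / (p_a * p_b))

    return {
        w: sum(pmi(w, s) for s in pos_seeds) - sum(pmi(w, s) for s in neg_seeds)
        for w in candidates
    }


# Toy corpus, already tokenized; real input would be segmented Chinese text.
docs = [
    ["screen", "stunning", "good"],
    ["battery", "stunning", "good"],
    ["service", "sluggish", "bad"],
    ["app", "sluggish", "bad"],
    ["price", "good"],
    ["delivery", "bad"],
]
scores = so_pmi(docs, ["stunning", "sluggish"], ["good"], ["bad"])
for word, score in scores.items():
    print(word, "positive" if score > 0 else "negative")
```

In this toy corpus "stunning" only co-occurs with the positive seed and "sluggish" only with the negative seed, so their scores have opposite signs. A real pipeline would likely use a larger seed set and a threshold around zero to leave ambiguous candidates unlabeled.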
Keywords/Search Tags:Sentiment analysis, Lexicon, Doc2vec, Natural Language Processing