Research On Vector Representation Of Text Sentiment Analysis Based On Word Vectors

Posted on: 2020-11-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W T Zhou
GTID: 1368330620452320
Subject: Intelligent Environment Analysis and Planning
Abstract/Summary:
With the advancement of industry and information technology, the impact of humans on the natural environment has gradually increased, and environmental changes have in turn affected the activities of human society. Understanding what people think about environmental change makes it possible to identify, more accurately and more quickly, the problems that need to be solved in related work. Moreover, with the development of self-media, people can express their opinions on various platforms, and these opinions are valuable to others. Since many opinions are published as text and cover different fields, automatic processing of these comments by computers has become an active research topic. This type of task is called sentiment analysis. It can be divided into document-level sentiment analysis, which analyzes entire documents, and fine-grained sentiment analysis, which analyzes sentences and words.

Text representation has always been a critical part of natural language processing. Classic feature extraction and feature weighting methods often use the Bag-of-Words model, which loses semantic information and suffers from high dimensionality and high sparsity. In this thesis, feature weighting and the word embedding method Word2Vec are combined with the topic model LDA (Latent Dirichlet Allocation), and new text representation methods are proposed for both document-level and fine-grained sentiment analysis. The new methods have low model dimensions and cover more semantic information.

For document-level representation, an unsupervised text representation method based on feature probability embedding vectors is proposed. It consists of three models, FTW, FTC and FT2, which are aimed mainly at short texts. The method adds semantic information from the perspective of words, increases the expressive ability of the vector space model from the perspective of space, and reduces the dimension of the document vectors, which addresses the lack of semantic information in the Bag-of-Words model. To verify the validity of the method, it is tested on two Chinese and English data sets.

For fine-grained sentiment analysis, this thesis proposes a word vector representation method based on Frequently Co-occurring Entropy (FCE) and the Fuzzy Bag-of-Words Model (FBoW), named Frequently Co-occurring Entropy and Fuzzy Bag-of-Words Model (FCE-FBW). The method clusters the words in corpora from different fields and groups similar words together; the resulting clusters can be useful for tasks such as building domain knowledge bases. The FCE approach is used to find common words. The FBoW model, which is based on word embedding techniques, supports describing the same word along multiple dimensions. The two algorithms are combined to obtain a new word vector representation method that is suitable for word clustering tasks. Experiments show that the method is effective.

Another important requirement in fine-grained sentiment analysis is to extract key information from texts. Traditional feature extraction methods and various topic models can be used for this purpose. In this thesis, feature weights are computed from word vectors, and the weighted features are then combined with the LDA topic model to give a text representation method based on similar-feature frequency, named Similarity Term Frequency and Latent Dirichlet Allocation (STF-LDA). Experiments show that this method places semantically related words under the same topic.
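
As a rough illustration of the sparsity problem and the fuzzy Bag-of-Words idea mentioned in the abstract, the sketch below compares a plain count vector with a fuzzy one for the same short text. It is not the thesis's implementation: the toy vocabulary, the 2-dimensional vectors and the function names are hypothetical, and the clipped-cosine membership used here is only one common way to formulate a fuzzy Bag-of-Words, which may differ from the FBoW variant used in FCE-FBW.

```python
import math

# Hypothetical toy embeddings; in practice these would come from a trained
# Word2Vec model rather than being written by hand.
EMBEDDINGS = {
    "good": [1.0, 0.0],
    "great": [0.8, 0.6],
    "bad": [-1.0, 0.0],
    "polluted": [-0.6, 0.8],
}
VOCAB = sorted(EMBEDDINGS)  # fixed term order: bad, good, great, polluted


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def bow_vector(tokens):
    """Plain Bag-of-Words: one exact-match count per vocabulary term."""
    return [float(tokens.count(term)) for term in VOCAB]


def fuzzy_bow_vector(tokens):
    """Fuzzy Bag-of-Words (assumed formulation): each token contributes to
    every vocabulary term in proportion to their cosine similarity, with
    negative similarities clipped to zero."""
    vec = []
    for term in VOCAB:
        total = 0.0
        for tok in tokens:
            if tok in EMBEDDINGS:  # tokens without an embedding are skipped
                total += max(0.0, cosine(EMBEDDINGS[term], EMBEDDINGS[tok]))
        vec.append(total)
    return vec


doc = ["great", "great", "polluted"]
bow = bow_vector(doc)
fbow = fuzzy_bow_vector(doc)
print(dict(zip(VOCAB, bow)))
print(dict(zip(VOCAB, fbow)))
```

In the count vector, "good" and "bad" are zero because neither word occurs in the text. In the fuzzy vector both receive a positive weight, because "great" is close to "good" and "polluted" is somewhat close to "bad" in the toy embedding space. This is the sense in which an embedding-based representation carries semantic information that exact-match counts miss.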
Keywords/Search Tags: Text Representation, Sentiment Analysis, Feature Probability Embedding, Frequently Co-occurring Entropy, Feature Similarity