
Research On The Construction Method Of Domain Sentiment Lexicon In The Field Of Chinese Social Media Comments: Based On Conditional Random Fields And Ensemble Learning Rules

Posted on: 2022-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: X F Wang
Full Text: PDF
GTID: 2518306746462784
Subject: Applied Statistics
Abstract/Summary:
In recent years, the online retail industry has developed vigorously, especially during the COVID-19 pandemic. The sales and user numbers of some Internet retail applications have increased rapidly, which has promoted the sustained and steady development of the consumer market. A large number of consumers comment on the products they purchase on Chinese social media platforms, resulting in an exponential increase in the volume of product review text. These reviews contain consumer opinions and emotional information on all aspects of the product. How to mine the genuinely useful information in Chinese social media reviews and perform sentiment analysis on product reviews has attracted the attention of more and more experts and scholars.

The accuracy of the sentiment lexicon is the key to comment mining technology and an important basis for comment sentiment analysis. In Chinese social media comments, users express their opinions freely according to their own habits, and a large number of new online words and domain-specific nouns have emerged. Existing general sentiment dictionaries have narrow coverage and low domain coverage, so they can no longer cover all the sentiment words of the domain a product belongs to, which places higher demands on the domain specificity of the sentiment lexicon. Product review text is large in volume and spans a wide range of fields, so it is difficult to build a sentiment lexicon by relying only on manual creation by domain professionals. Therefore, the core of current research is how to automatically or semi-automatically build a domain sentiment lexicon that covers the sentiment words in the domain, on the basis of manually labeling part of the sentiment information.

In response to these problems, the purpose of this thesis is to start from the user-generated comment text of a certain product field in the Chinese social media environment, accurately discover unknown sentiment words (new Internet words carrying emotional information, domain-specific terms, and other unknown sentiment words), use them to construct a sentiment lexicon for that product field, and then apply the lexicon to sentiment analysis in the field. The research method is as follows: preprocess the actual product review text data; use the CRF (conditional random fields) model to extract sentiment words from the product reviews as the input corpus; select the HowNet Chinese sentiment lexicon as the seed lexicon; use PMI and Word2Vec to judge the sentiment polarity of the candidate words; and, according to the ensemble rules, use the models built by the two algorithms as base classifiers whose recognition results are combined to construct the domain sentiment lexicon. To evaluate the constructed lexicon, it is applied to a binary sentiment classification experiment on text in this domain, and the accuracy of the experimental results is compared to verify that the proposed construction method is effective.

The work mainly covers the following two aspects:

1. Investigating methods for finding unknown sentiment words, and proposing a method for extracting sentiment words from Chinese social media comment texts based on the CRF model.
2. Investigating methods for establishing a domain sentiment lexicon, and proposing a construction method for a sentiment lexicon in the domain of Chinese social media comments based on ensemble learning rules.

The sentiment lexicon constructed with this method is better suited to sentiment analysis in the field of the experimental products. The numbers of both positive and negative sentiment words in the constructed domain lexicon have increased compared with the HowNet Chinese sentiment lexicon, and both increases are an effective supplement of neologisms and proper nouns within this field. At the same time, the sentiment lexicon constructed with the proposed ensemble rules reaches an accuracy of 86.1% in the text sentiment classification experiment, which is 3.6% and 2.0% higher than the accuracy of the lexicons constructed using the PMI algorithm and the Word2Vec algorithm alone, respectively. The lexicon can therefore be applied effectively to sentiment analysis of the product field in the Chinese social media environment.

The limitation of this thesis is that its data set comes from a single source, so the effect of applying the method in other fields is unknown. Future work can apply it to more fields and use a wider data set. In terms of construction methods, when extracting candidate sentiment words, a neural-network-based method trained on a large number of data sets could be tried in place of CRF to extract more unknown sentiment words. When constructing the domain sentiment lexicon with the ensemble learning method, algorithms such as adjacency entropy could be introduced to build more base classifiers for the combined judgment and so improve accuracy.
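To make the polarity-judgment step concrete, the following is a minimal sketch of how a PMI-based score and an embedding-based score might be combined for one candidate word. It is not the thesis's implementation: the abstract does not state the exact PMI formulation, the Word2Vec scoring, or the ensemble rules. The sketch assumes the common SO-PMI form (sum of PMI with positive seeds minus sum of PMI with negative seeds), cosine similarity to seed vectors, and a hypothetical agreement rule that accepts a label only when both base classifiers agree. The toy documents, seed words, two-dimensional vectors, and all function names are illustrative assumptions; in practice the vectors would come from a trained Word2Vec model and the seeds from HowNet.

```python
import math
from collections import Counter
from itertools import combinations


def build_counts(docs):
    """Count document frequency of each word and of each unordered word pair."""
    df, co = Counter(), Counter()
    for doc in docs:
        vocab = sorted(set(doc))
        df.update(vocab)
        co.update(frozenset(pair) for pair in combinations(vocab, 2))
    return df, co


def pmi(w, s, df, co, n, eps=0.5):
    """Smoothed PMI from document-level co-occurrence (eps avoids log of zero)."""
    joint = co[frozenset((w, s))] + eps
    return math.log2(joint * n / ((df[w] + eps) * (df[s] + eps)))


def so_pmi(word, pos_seeds, neg_seeds, df, co, n):
    """Assumed SO-PMI form: PMI with positive seeds minus PMI with negative seeds."""
    pos = sum(pmi(word, s, df, co, n) for s in pos_seeds)
    neg = sum(pmi(word, s, df, co, n) for s in neg_seeds)
    return pos - neg


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def so_vec(word, pos_seeds, neg_seeds, vectors):
    """Mean cosine similarity to positive seeds minus that to negative seeds."""
    pos = sum(cosine(vectors[word], vectors[s]) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(vectors[word], vectors[s]) for s in neg_seeds) / len(neg_seeds)
    return pos - neg


def sign_label(score):
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return None


def ensemble_label(word, pos_seeds, neg_seeds, df, co, n, vectors):
    """Hypothetical rule: keep a label only if both base classifiers agree."""
    a = sign_label(so_pmi(word, pos_seeds, neg_seeds, df, co, n))
    b = sign_label(so_vec(word, pos_seeds, neg_seeds, vectors))
    return a if a == b else None  # None: leave the word for manual review


# Toy, already-tokenized reviews (illustrative data, not from the thesis).
docs = [
    ["battery", "durable", "good"],
    ["screen", "durable", "good"],
    ["battery", "laggy", "bad"],
    ["screen", "laggy", "bad"],
    ["durable", "good"],
    ["laggy", "bad"],
]
pos_seeds, neg_seeds = ["good"], ["bad"]
# Stand-in 2-d vectors; a real run would look these up in a Word2Vec model.
vectors = {
    "good": [1.0, 0.0], "bad": [0.0, 1.0],
    "durable": [0.9, 0.1], "laggy": [0.2, 0.8],
    "battery": [0.5, 0.5], "screen": [0.5, 0.5],
}

df, co = build_counts(docs)
n = len(docs)
lexicon = {
    w: ensemble_label(w, pos_seeds, neg_seeds, df, co, n, vectors)
    for w in ["durable", "laggy", "battery"]
}
print(lexicon)
```

In this toy run, "durable" co-occurs only with the positive seed and "laggy" only with the negative seed, so both scores agree on their polarity, while "battery" appears equally often with both seeds and is left unlabeled. Returning no label on disagreement is one design choice that fits a semi-automatic workflow; weighted voting over more base classifiers (for example one based on adjacency entropy, as the abstract suggests for future work) would be another.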
Keywords/Search Tags: Domain Sentiment Lexicon, CRF, PMI, Word2Vec, Ensemble learning