
Research On Feature Selection And Weight In Emotional Analysis Of Chinese Text

Posted on: 2018-10-07  Degree: Master  Type: Thesis
Country: China  Candidate: Y Y Su  Full Text: PDF
GTID: 2348330536979680  Subject: Detection Technology and Automation
Abstract/Summary:
With the rapid development of China's film industry and the continuous improvement of living standards, watching movies has become one of the main leisure activities. Before watching a film, people often face the dilemma of choosing which film to watch and where to watch it. Several of the country's well-known film community websites and apps contain a wealth of film reviews, which give users an important basis for decision-making. Sentiment analysis of user reviews, including users' responses to films, plays a significant role in theoretical analysis, design and application. At present, little research has been done on sentiment analysis in the film domain in China, and film reviews are not fully exploited. Taking Douban as the corpus source, this thesis conducts sentiment analysis of Chinese movie reviews. It improves sentiment analysis by refining the feature selection and term weighting algorithms, and it constructs an integrated film-scoring model based on sentiment analysis. The main results are as follows:

(1) The thesis summarizes the state of research in sentiment analysis and introduces its main steps: text preprocessing, text representation, feature selection, feature weighting, text classification, and the common evaluation metrics. The SVM, naive Bayes and kNN algorithms are introduced in detail, and the advantages and disadvantages of the three are analyzed.

(2) The data of the Douban movie website is analyzed. A web crawler based on Scrapy is designed to account for the particularities of the site. The functions of the components of the Scrapy framework are described in detail, and the crawler is used to collect Douban film data for the experimental analysis.

(3) The classical information gain (IG) feature selection algorithm does not consider how a feature term is distributed within a class and between classes. The thesis therefore proposes an information gain algorithm based on inter-class concentration and intra-class dispersion. This DWIG algorithm, which takes the position distribution of features into account, can rank feature importance effectively. To address the unreasonable weight assignment caused by IDF in the TF-IDF algorithm, a TF-IDF-DW algorithm based on position-distribution weights is proposed. Experiments on the film review data compare the proposed algorithms with the classical ones, and their validity is verified in terms of accuracy, recall and F value.

(4) The thesis proposes FRRSA, a rating algorithm based on sentiment analysis that produces a movie rating more consistent with users' sentiment, because the sentiment expressed in a user's comment can contradict the score that user gave. The CRMDM film-scoring model is then constructed by jointly considering review time, the number of users, user comments and user ratings. Simulation experiments on Douban film review data show that the proposed CRMDM model can help users decide which film to watch.
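
For orientation, the classical baselines that the abstract improves on can be sketched in a few lines of Python. This is only an illustrative sketch of textbook information gain and TF-IDF, not the DWIG or TF-IDF-DW algorithms, whose formulas are not given in the abstract. The function names, the toy English tokens (stand-ins for segmented Chinese review text) and the choice of IDF as log(N/df) are all assumptions made for the example.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(docs, labels, term):
    """Classical IG: H(C) - [P(t) H(C|t) + P(not t) H(C|not t)].

    docs is a list of token lists; labels holds one class label per document.
    """
    n = len(docs)
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)
    n_t = sum(with_term.values())
    conditional = (n_t / n) * entropy(with_term.values()) \
        + ((n - n_t) / n) * entropy(without_term.values())
    return entropy(Counter(labels).values()) - conditional

def tf_idf(docs, term, doc_index):
    """Textbook TF-IDF with tf = relative frequency and idf = ln(N / df)."""
    doc = docs[doc_index]
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)
    if df == 0:
        return 0.0  # term never occurs, so it carries no weight
    return tf * math.log(len(docs) / df)

# Hypothetical toy corpus: two positive and two negative reviews.
docs = [["great", "plot"], ["great", "acting"],
        ["boring", "plot"], ["boring", "slow"]]
labels = ["pos", "pos", "neg", "neg"]

for term in ("great", "plot"):
    print(term, information_gain(docs, labels, term), tf_idf(docs, term, 0))
```

On this toy corpus, "great" appears only in positive reviews and receives an information gain of 1 bit, while "plot" is split evenly across the two classes and receives 0. Neither baseline looks at how a term is spread inside a class, which is the gap the distribution-based variants in item (3) aim to close.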
Keywords/Search Tags:Sentiment analysis, IG algorithm, TF-IDF algorithm, Movie score model