
Research And Improvement Of Chinese Short Text Sentiment Analysis Feature Selection Algorithm

Posted on: 2020-12-30
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Shen
GTID: 2428330605966650
Subject: Computer Science and Technology
Abstract/Summary:
With the development of network technology and the arrival of the information age, the amount of text on the Internet is growing rapidly. Social platforms such as Weibo and WeChat reach into every corner of people's daily lives, and short texts, the mainstream form of content on these platforms, are being produced in explosive volumes. These texts carry public opinion and users' viewpoints, so effective sentiment analysis of short texts, and mining the information latent in them, can bring considerable social and economic benefits. To analyze the sentiment of short Internet texts at this scale, researchers rely on automated methods such as machine learning and deep learning.

Feature selection is an important step in machine-learning-based sentiment analysis, but existing feature selection algorithms have shortcomings when applied to Chinese short-text sentiment analysis. This paper studies the shortcomings of term frequency-inverse document frequency (TF-IDF), information gain (IG), and mutual information (MI), and proposes improvements to each.

The traditional TF-IDF algorithm ignores how features are distributed across classes, and the traditional information gain algorithm ignores word frequency. To address these deficiencies, this paper improves the traditional TF-IDF and information gain algorithms by introducing class distribution information and word frequency information of features. The improved algorithms distinguish between features more effectively, which strengthens feature selection and ultimately improves the algorithms' performance.

The traditional mutual information algorithm has problems related to word frequency information and category relevance, and it gives excessive weight to low-frequency words. To address these problems, this paper improves the algorithm for sentiment analysis by taking both word frequency and document frequency information into account. The improved algorithm overcomes these deficiencies more effectively: it selects feature words with greater classification value, filters out interfering features, and improves classification performance.

Experimental results show that, in Chinese short-text sentiment analysis, the improved algorithms achieve better classification performance than both the traditional algorithms and related improved algorithms.
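The abstract names three classic feature-scoring measures and a specific weakness of each. As a minimal sketch (not the thesis's code, and not its improved algorithms, whose formulas the abstract does not give), the Python below computes textbook TF-IDF, document-level information gain, and pointwise mutual information on a small hypothetical corpus of pre-segmented Chinese short texts, then reproduces each weakness. The corpus, labels, and function names are illustrative assumptions, and the MI shown is the common log2(P(t,c) / (P(t)P(c))) form estimated from document counts.

```python
import math
from collections import Counter

# Hypothetical toy corpus of pre-segmented Chinese short texts (tokens, label).
# Segmentation is assumed to have been done upstream, e.g. with a tool like jieba.
CORPUS = [
    (["手机", "很", "好", "喜欢"], "pos"),
    (["屏幕", "好", "喜欢"], "pos"),
    (["价格", "好", "惊喜"], "pos"),
    (["手机", "差", "失望"], "neg"),
    (["屏幕", "不", "好"], "neg"),
    (["价格", "差", "失望"], "neg"),
]


def doc_freq(term, corpus):
    """Number of documents that contain `term` at least once."""
    return sum(1 for tokens, _ in corpus if term in tokens)


def tf_idf(term, tokens, corpus):
    """Textbook TF-IDF: tf = count / doc length, idf = ln(N / df).

    Class labels are never consulted. `term` must occur somewhere in `corpus`.
    """
    tf = Counter(tokens)[term] / len(tokens)
    idf = math.log(len(corpus) / doc_freq(term, corpus))
    return tf * idf


def entropy(labels):
    """Shannon entropy in bits of a list of class labels (0.0 if empty)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(term, corpus):
    """Document-level IG: H(C) - P(t) H(C|t) - P(~t) H(C|~t).

    Only presence/absence of `term` in each document is used.
    """
    n = len(corpus)
    with_t = [label for tokens, label in corpus if term in tokens]
    without_t = [label for tokens, label in corpus if term not in tokens]
    all_labels = [label for _, label in corpus]
    return (entropy(all_labels)
            - len(with_t) / n * entropy(with_t)
            - len(without_t) / n * entropy(without_t))


def mutual_information(term, cls, corpus):
    """Pointwise MI in bits: log2(P(t, c) / (P(t) P(c))), from document counts."""
    n = len(corpus)
    a = sum(1 for tokens, label in corpus if term in tokens and label == cls)
    if a == 0:
        return float("-inf")  # term never appears in a document of this class
    n_c = sum(1 for _, label in corpus if label == cls)
    return math.log2(a * n / (doc_freq(term, corpus) * n_c))


if __name__ == "__main__":
    doc0 = CORPUS[0][0]
    # 1) TF-IDF ignores class distribution: 手机 (one doc per class) and
    #    喜欢 (positive docs only) have equal tf and df, so equal scores.
    print(tf_idf("手机", doc0, CORPUS) == tf_idf("喜欢", doc0, CORPUS))  # True
    print(round(information_gain("手机", CORPUS), 3),
          round(information_gain("喜欢", CORPUS), 3))
    # 2) Document-level IG ignores within-document frequency:
    #    repeating 好 five more times in doc 0 leaves its IG unchanged.
    repeated = [(tokens + ["好"] * 5, label) if i == 0 else (tokens, label)
                for i, (tokens, label) in enumerate(CORPUS)]
    print(information_gain("好", CORPUS) == information_gain("好", repeated))  # True
    # 3) MI over-weights rare words: 惊喜 (1 positive doc) outscores
    #    好 (all 3 positive docs plus 1 negative doc).
    print(mutual_information("惊喜", "pos", CORPUS),
          round(mutual_information("好", "pos", CORPUS), 3))
```

In this toy data, IG separates the two words TF-IDF treats identically (about 0.0 bits for 手机 versus about 0.459 for 喜欢), while MI gives the single-occurrence word 惊喜 1.0 bit against about 0.585 for 好, even though 好 appears in every positive document. These are the kinds of cases the abstract says the improved algorithms are designed to handle.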
Keywords/Search Tags: Short Text, Sentiment Analysis, Feature Selection, TF-IDF, Information Gain, Mutual Information