Research On Weibo Emotion Classification Algorithm

Posted on: 2020-11-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ma
Full Text: PDF
GTID: 2428330599953770
Subject: Engineering
Abstract/Summary:
Nowadays, people often rely on news published on social networking sites to follow hot events, public opinion, and so on. With the rapid development of technology, Weibo has gradually become popular, and more and more people follow current events through it, making Weibo the most popular social platform. On Weibo, people often publish microblog posts that express emotions or opinions, and classifying these emotions has potential value for e-commerce, information prediction, and similar applications. It is therefore important to classify Weibo emotions.

The selection of feature items and the calculation of feature weights are two important steps in the text classification process, and both play a key role in the classification results. The traditional CHI statistical method has two weaknesses: it does not handle negative correlation between a feature item and a category, and it considers only the probability that a feature item appears in a text. To overcome them, this thesis improves the traditional CHI statistical method by introducing a negative-correlation check, frequency, and other important factors, and it optimizes the TF-IDF algorithm with a semantic similarity calculation. KNN (K-Nearest Neighbor) and support vector machine (SVM) classifiers in the WEKA software are each used to classify a Weibo emotion corpus. The experimental results show that the new method clearly improves the accuracy of text classification.

The time complexity of the traditional KNN classification algorithm used in these experiments is proportional to the size of the training sample set, which leads to a large amount of computation and long running times. To reduce this cost, the training sample set is pruned with the K-medoids clustering algorithm, which removes samples with low similarity, and the traditional KNN classification algorithm is then combined with the MapReduce framework of the Hadoop platform. The K-medoids-based improved KNN classification algorithm and the algorithm proposed in this thesis are compared on running time, using parallel computation over test sample sets of different sizes. The experimental results show that the running time of the proposed algorithm is 68% to 82% shorter than that of the traditional KNN classification algorithm, and that the running time decreases as the number of nodes increases. The improved KNN classification algorithm on the Hadoop platform clearly shortens the computation time and improves the classification efficiency of the algorithm.
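
The negative-correlation issue in the traditional CHI statistical method mentioned above can be illustrated with a minimal Python sketch. It uses the standard textbook form of the CHI statistic on a 2x2 term/category contingency table; the function names `chi_square` and `correlation_sign` and the document counts are hypothetical, and the sketch does not reproduce the improved formula of the thesis, which the abstract does not give.

```python
def chi_square(a, b, c, d):
    """Standard CHI statistic for one feature item and one category.

    a: documents in the category that contain the feature item
    b: documents outside the category that contain the feature item
    c: documents in the category that do not contain the feature item
    d: documents outside the category that do not contain the feature item
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def correlation_sign(a, b, c, d):
    """Return 1 for positive, -1 for negative, 0 for no correlation."""
    diff = a * d - b * c
    return (diff > 0) - (diff < 0)


# Hypothetical counts: the feature item is concentrated in the category...
positive = (40, 10, 10, 40)
# ...versus concentrated outside the category.
negative = (10, 40, 40, 10)

print(chi_square(*positive), correlation_sign(*positive))
print(chi_square(*negative), correlation_sign(*negative))
```

Because the difference `a * d - b * c` is squared, both tables receive the same CHI score even though the second feature item mostly signals that a text does not belong to the category. Checking the sign separately is one simple way to tell the two cases apart.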
Keywords/Search Tags:Sentiment classification, Feature extraction, Feature weight, CHI statistics, K-nearest neighbor