
Research And Implementation Of Emotional Classification Of Microblog Text Based On Topic Relevance

Posted on: 2021-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: K L Liu
GTID: 2428330611962821
Subject: Computer technology
Abstract/Summary:
With the popularity of Internet social networking, social platforms such as Sina micro-blog have developed rapidly. By the end of 2019, Sina micro-blog was receiving as many as 150 million text posts per day on average, the highest daily volume of published text among such platforms. These texts contain a large amount of subjective information. If this information can be analyzed effectively, users' emotional tendencies can be understood and their views extracted, which helps in grasping trends in public opinion and benefits both the government and the public. Many researchers have therefore studied sentiment analysis of micro-blog text.

Most previous methods for micro-blog sentiment classification used coarse-grained binary classification with hand-crafted features, which consumed a lot of human effort and produced relatively simple results. They also did not go on to extract and visualize users' viewpoints, which made the classification results hard for users to understand. In addition, a micro-blog post combines a topic with text, and in many cases the two are unrelated. Classifying sentiment directly in such cases wastes resources and interferes with the classifier. To address these shortcomings, this thesis incorporates topic relevance analysis and implements a micro-blog text sentiment classification system. The main work is as follows:

1. A micro-blog text sentiment classification model based on topic relevance is constructed. The model includes five modules: a data acquisition module, a data preprocessing module, a topic relevance analysis module, a fine-grained micro-blog text sentiment classification module, and an opinion extraction module.

2. A method for acquiring micro-blog text in batches without triggering the anti-crawler mechanism is presented, together with a data preprocessing method covering word segmentation, denoising, and word vectorization. The acquisition method simulates a user browsing micro-blog, so that the crawler can fetch the corresponding micro-blog text and store it by topic in a local MySQL database. Word segmentation uses a segmentation tool with a model trained on a micro-blog corpus, which yields high-quality segmentation. Regular expressions are used to identify and remove noise in the data. Finally, all words are converted into word vectors with the Gensim tool. The word-vector parameters are set as high as the machine's resources allow in order to obtain the best word vectors, which in turn improve the classifier when used as neural network input.

3. A classification-based method of topic relevance analysis is presented. The method first computes the TF-IDF similarity and the Jaccard similarity between the micro-blog topic and the micro-blog text, the length of the topic words, and other measures, to build 8 basic features. FeatureTools is then used to expand these 8 basic features into 146-dimensional compound features, and PCA is used to reduce the dimensionality of the compound features. Finally, a random forest classifier is applied to the reduced data to determine whether the topic and the micro-blog text are related.

4. A deep-learning-based BI-SRU-Attention neural network classifier is designed. It captures bidirectional sequence information in the text and uses a granularity-specific attention mechanism to attend to different word vectors in the text, so that topic-related micro-blog text can be classified by sentiment at different granularities. Fine-tuning is used to deal with overfitting. During training, TensorBoard is used to visualize the process in real time, so that convergence and the classifier's performance metrics can be observed directly. A Web page is also implemented for predicting and visualizing fine-grained sentiment classification. It shows the overall sentiment distribution at each granularity in a histogram with switchable granularity, as well as the sentiment classification of each micro-blog.

5. A neural network based on sequence labeling is proposed to extract opinion words. Sequence labeling treats every word in a micro-blog as a target to be classified, and words labeled 0 or 1 are taken as the extracted opinion words. To address the class imbalance in sequence labeling, this thesis uses Focal_Loss as the loss function and assigns weights of [0.45, 0.45, 0.1] to words labeled [0, 1, 2], so that the classifier focuses on learning the opinion words that need to be extracted. TensorBoard is again used to visualize training. Finally, the opinion words predicted on new data are visualized with a word cloud and, after frequency counting, with a histogram, so that users can grasp the opinion information as a whole.

6. The micro-blog text sentiment classification system is implemented and tested. The development environment is introduced first, mainly TensorFlow, Keras, and the PyCharm development tool. The main modules of the system are then implemented, and classifier tests and functional tests are carried out separately. For classifier testing, we use the labeled Chinese dataset provided by AI data and the open English dataset SemEval-14-Restaurant, run comparative experiments against current typical algorithms, and analyze the results. For functional testing, we mainly use the saved classifier to predict crawled unlabeled data and visualize the predictions on the Web page.

The comparative experiments show that the algorithm combined with topic relevance analysis can effectively determine whether a micro-blog topic and its text are related. When the topic and the text are related, it can accurately classify the sentiment polarity of the micro-blog text at each granularity and extract opinion words that represent the views of a large number of users on the same topic. The topic relevance analysis algorithm achieves 90.1% accuracy, which is 4.9% higher than that of the latest TF-IDF-SIM algorithm in the same field. The fine-grained micro-blog text sentiment classification algorithm achieves 87.6% accuracy, which is 3% higher than that of the typical classification algorithm. The opinion word extraction algorithm achieves F1-scores of 81.1% and 82.2% on the Chinese and English datasets respectively, which are 1.5% and 1.29% higher than those of the typical algorithm. Tests of the topic-relevance-based micro-blog text sentiment classification system and its actual operation show that the system achieves more accurate and efficient fine-grained sentiment analysis. It also lets users understand the overall views of micro-blog users on a given topic more directly, so that public opinion can be grasped quickly, bringing practical benefits to enterprises and the government.
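
The class-weighted Focal_Loss described in item 5 can be illustrated with a minimal sketch in plain Python. The weights [0.45, 0.45, 0.1] for labels [0, 1, 2] come from the abstract. Everything else is an assumption made for illustration and is not taken from the thesis: the focusing parameter `GAMMA = 2.0`, the softmax output layer, and the function names.

```python
import math

# Per-class weights from the abstract: labels 0 and 1 mark opinion words,
# label 2 marks all other words.
CLASS_WEIGHTS = [0.45, 0.45, 0.1]
GAMMA = 2.0  # assumed focusing parameter; the abstract does not state it


def softmax(logits):
    """Convert raw scores into probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, label, weights=CLASS_WEIGHTS, gamma=GAMMA):
    """Class-weighted focal loss for one token: -w * (1 - p)^gamma * log(p)."""
    p = softmax(logits)[label]
    return -weights[label] * (1.0 - p) ** gamma * math.log(p)


def sequence_loss(token_logits, labels):
    """Mean focal loss over the tokens of one micro-blog (hypothetical helper)."""
    losses = [focal_loss(scores, y) for scores, y in zip(token_logits, labels)]
    return sum(losses) / len(losses)
```

With equal logits, a misclassified opinion word (label 0 or 1) contributes 4.5 times the loss of a non-opinion word (label 2), which is how the weighting pushes the classifier toward the rarer opinion labels. The `(1 - p)^gamma` factor additionally shrinks the loss of tokens that are already predicted confidently.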
Keywords/Search Tags:microblog text, sentiment classification, deep learning, topic relevance