
Classification Model And Application Research Of Internet Text Data

Posted on: 2021-03-30, Degree: Master, Type: Thesis
Country: China, Candidate: Z Y Cao, Full Text: PDF
GTID: 2428330605950670, Subject: Applied Statistics
Abstract/Summary:
Social media have become part of people's daily lives. Today they are not only social tools but also platforms for obtaining, sharing, and discussing breaking news and trending topics. Because social networks spread information quickly and efficiently, hot events easily trigger widespread discussion. If extreme or false information spreads unchecked, it not only does harm in itself but may also sway netizens' emotions and have a significant impact on society. It is therefore important to correctly identify the emotional categories of online public opinion texts about hot social events and to understand netizens' emotional tendencies in a timely manner, so as to guide the development of online public opinion effectively and maintain social stability.

This paper focuses on emotion recognition in online public opinion comments. It first introduces the background of classification technology, the steps of a text classification task, and the basic principle of each step: text segmentation, stop-word removal, feature representation, feature filtering, and commonly used text classification algorithms, the last covering algorithms commonly used in machine learning and deep learning.

For feature selection, the text classification algorithms introduced in this paper are comprehensively compared on the xbao commodity review data set. The convolutional neural network is selected as the base classifier to compare three feature selection methods (information gain, mutual information, and TF-IDF), and a simple and efficient feature dimensionality reduction method, CW, is proposed. The experimental results show that CW achieves results similar to those of the three methods while greatly reducing data preprocessing time. Furthermore, based on the deep learning classifier, a multi focal loss function is proposed by extending the focal loss function from binary classification to multi-class classification. The improved method achieves good results on the 12-category X Fox News data set, which shows that the multi focal loss can address the problem that some categories are difficult to classify in text classification tasks.

Finally, the CW method and the multi focal loss are applied to a current hot public opinion topic, the “Sino-US trade war”. Relevant Chinese comment texts were collected from Sina Weibo, Bilibili, and Voice of America and annotated according to OCC emotional rules. The experimental results show that the improved classification method outperforms the original one and that the multi focal loss can avoid the problems caused by unbalanced data sets.

The main innovations of this paper are as follows. First, a simple document capture method, CW, is proposed, which can greatly reduce the preprocessing time of a text data set; experiments show that it is suitable not only for short text data sets but also for long news text data. Second, a cross entropy loss and a multi focal loss for multi-class classification problems are proposed, which address data imbalance in multi-class data sets and the difficulty of classifying some classes. Third, this paper enriches the application research of deep learning Chinese text classification in hot news comments, improves the adaptability of deep learning models to emotion recognition in online public opinion, and verifies the strong text classification ability of MFL combined with a deep learning classifier.
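
The extension of focal loss from binary to multi-class classification mentioned above can be illustrated with a small sketch. The abstract does not give the thesis's exact formulation, so the code below is only an assumption: it applies the standard focal-loss form, -alpha_t * (1 - p_t)^gamma * log(p_t), to the softmax probability p_t of the true class. The function names, the `gamma` default, and the optional per-class `alpha` weights are illustrative choices, not the author's.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw class scores."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def multi_class_focal_loss(logits, target, gamma=2.0, alpha=None):
    """Focal loss for one sample with several classes (assumed form).

    logits: raw scores, one per class
    target: index of the true class
    gamma:  focusing parameter; gamma = 0 reduces to cross entropy
    alpha:  optional list of per-class weights (hypothetical parameter)
    """
    log_p_t = log_softmax(logits)[target]
    p_t = math.exp(log_p_t)
    weight = 1.0 if alpha is None else alpha[target]
    # (1 - p_t)^gamma shrinks the loss of samples that are already
    # classified with high confidence.
    return -weight * (1.0 - p_t) ** gamma * log_p_t


def cross_entropy(logits, target):
    """Plain cross entropy, i.e. the focal loss with gamma = 0."""
    return multi_class_focal_loss(logits, target, gamma=0.0)


if __name__ == "__main__":
    logits = [4.0, 0.0, 0.0]
    for label, name in [(0, "easy sample"), (1, "hard sample")]:
        ce = cross_entropy(logits, label)
        fl = multi_class_focal_loss(logits, label)
        print(f"{name}: cross entropy={ce:.4f}, focal loss={fl:.4f}")
```

Under these assumptions, a confidently correct sample contributes far less to the focal loss than to the cross entropy, while a misclassified sample keeps most of its loss. This is the mechanism usually credited with helping on unbalanced data sets and on categories that are difficult to classify.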
Keywords/Search Tags: Internet public opinion, text classification, deep learning, multi focal loss