
Research On Classification And Topic Evolution Of Blog Based On LDA

Posted on: 2020-09-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhang
GTID: 2518306353464464
Subject: Control Engineering
Abstract/Summary:
With the rapid development of the Internet, information spreads faster and reaches a wider audience. Weibo, as an important medium for spreading information, has become a major source of hot events and public discussion. Weibo contains a vast amount of minable information, which offers huge business opportunities but may also give rise to public opinion crises. To extract valuable information from the data and improve search quality, Weibo posts need to be classified effectively. In addition, analyzing the topic evolution of massive Weibo data is of great significance for discovering new topics in time, tracking the evolution of existing topics, monitoring online public opinion, filtering harmful information, and correctly guiding the opinion of Internet users. The main research results are as follows:

(1) To address the unsatisfactory classification performance caused by sparse features in microblog posts, a novel feature expansion method based on the Latent Dirichlet Allocation (LDA) model is proposed. Based on LDA topic modeling of the original Weibo posts, this thesis proposes a new index that judges topic quality by combining information entropy and average similarity. Some feature words of the high-quality topics are then added to the original Weibo posts, and a support vector machine is used to classify the expanded posts. The experimental results show that the high-quality topic extraction algorithm accurately separates high-quality topics from noise topics, and that the accuracy of Weibo classification improves significantly after feature expansion.

(2) Based on the characteristics of hot topic evolution on Weibo, this thesis puts forward, in the study of topic content evolution, an evolution model that accounts for topic splitting, and identifies the generation, splitting, and extinction of topics. In the study of topic intensity evolution, an improved topic intensity measurement method is proposed to correct the deviation in topic intensity measurement caused by the sparse features of Weibo, which reduces the influence of spam Weibo posts on the calculated topic intensity.
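
The feature expansion idea in result (1) can be illustrated with a minimal Python sketch. The abstract does not give the thesis's actual quality index, so everything here is an assumption made for illustration: the scoring formula (one minus normalized entropy, times one minus average cosine similarity to the other topics), the choice to measure similarity between topics, the function names `topic_quality` and `expand`, the threshold, and the toy topic-word matrix. The LDA training and the support vector machine classifier are not shown.

```python
import math

def entropy(dist):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def topic_quality(topic_word):
    """Hypothetical quality score per topic (not the thesis's formula).

    A topic scores high when its word distribution is concentrated
    (low entropy) and it is dissimilar to the other topics on average.
    """
    vocab_size = len(topic_word[0])
    max_h = math.log(vocab_size) if vocab_size > 1 else 1.0
    scores = []
    for i, topic in enumerate(topic_word):
        sims = [cosine(topic, other)
                for j, other in enumerate(topic_word) if j != i]
        avg_sim = sum(sims) / len(sims) if sims else 0.0
        scores.append((1 - entropy(topic) / max_h) * (1 - avg_sim))
    return scores

def expand(tokens, doc_topic, topic_word, vocab, scores, threshold, top_k=2):
    """Append top words of the post's dominant topic if that topic is high quality."""
    best = max(range(len(doc_topic)), key=doc_topic.__getitem__)
    if scores[best] < threshold:
        return list(tokens)  # dominant topic looks like noise: leave post unchanged
    ranked = sorted(range(len(vocab)),
                    key=lambda w: topic_word[best][w], reverse=True)
    extra = [vocab[w] for w in ranked[:top_k] if vocab[w] not in tokens]
    return list(tokens) + extra

# Toy data (assumed): two concentrated topics and one uniform "noise" topic.
vocab = ["match", "goal", "stock", "market"]
topic_word = [
    [0.48, 0.48, 0.02, 0.02],
    [0.02, 0.02, 0.48, 0.48],
    [0.25, 0.25, 0.25, 0.25],
]
scores = topic_quality(topic_word)
expanded = expand(["goal", "tonight"], [0.7, 0.1, 0.2],
                  topic_word, vocab, scores, threshold=0.1)
unchanged = expand(["hello"], [0.1, 0.1, 0.8],
                   topic_word, vocab, scores, threshold=0.1)
print(scores, expanded, unchanged)
```

In this sketch a short post gains extra tokens only when its dominant topic passes the quality threshold, so posts dominated by a noise topic are left as they are. The expanded token lists would then be vectorized and passed to a classifier.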
Keywords/Search Tags: Weibo, Latent Dirichlet Allocation model, feature expansion, classification, topic evolution