
Micro-Blog Hot Topic Discovery Method Based On Topic Model

Posted on: 2019-01-12
Degree: Master
Type: Thesis
Country: China
Candidate: L T Wang
Full Text: PDF
GTID: 2428330602952255
Subject: Information Science
Abstract/Summary:
Since the arrival of Web 2.0, social media has become an ever larger part of people's lives, and daily life is now inseparable from the mobile Internet. Social media, represented by micro-blog, has become an important channel through which the public gets current political information, discusses hot social topics, learns, and communicates. Micro-blog is popular for its convenience and simplicity, and people can post from a variety of terminals. Some netizens share their feelings and living conditions, while others offer opinions on the hot topics of the moment, so the micro-blog platform produces large amounts of disorderly data. Faced with complex information of varied structure, it is unrealistic to find hot topics manually. In the era of big data in particular, the volume, update speed, diversity and authenticity of data pose challenges for data mining. Hot topic discovery and evolution on micro-blog is therefore a valuable research subject. How to discover the hot topics of public discussion quickly and accurately across the whole micro-blog space, and how to explore the evolution laws behind these topics, has long been a goal of researchers in the field.

A review of the relevant literature reveals the following deficiencies. First, in previous studies the feature selection stage before modeling does not take the characteristics of micro-blog into account, so the extracted features are not accurate enough, which affects the efficiency of topic discovery. Second, for hot topic evolution, no existing model combines the characteristics of micro-blog topic labels to discover the evolution rules of topics in real time, and the evolution of topics is not visualized. In response to these deficiencies, this thesis makes the following improvements.

First, a micro-blog hot topic discovery framework that takes the sociality of micro-blog into account is proposed. The framework consists of three parts: data preprocessing, text representation, and hot topic discovery. Meaningful words are first extracted through data preprocessing. In the text representation phase, the sociality of micro-blog is considered: drawing on the idea of the H-index, a term H-index is proposed and used to filter and select feature words. The feature words selected by the term H-index are hot words, which improves the accuracy of modeling and reduces its dimensionality. The feature words are then modeled by VSM and BTM respectively, so that microblogs are expressed as "document-word" vectors and "document-topic" vectors. The semantic information inside the text compensates for the feature sparseness that short texts suffer from. In the hot topic discovery stage, hot topics are obtained with the K-Means clustering algorithm. Evaluation criteria and comparative experiments are then designed, and experiments verify the effectiveness of the proposed model.

Second, a hot topic evolution model for micro-blog is proposed: Label On-line Latent Dirichlet Allocation (LOLDA). By combining the ability of OLDA (On-line Latent Dirichlet Allocation) to track the evolution of hot topics automatically with micro-blog's unique topic labels, we obtain a hot topic evolution model suited to micro-blog, and we present its generation process and parameter estimation. Finally, experiments verify that the proposed model has better generalization ability than the traditional model. The specific process is: crawling official Sina Weibo data with Python, preprocessing the raw data, extracting features, modeling with the LOLDA model, and then analyzing the evolution laws of hot topics in the micro-blog data.
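The abstract does not define the term H-index, so the following is only a minimal sketch under a stated assumption: by analogy with the author-level H-index, a term is taken to have H-index h if it appears in at least h microblogs that each received at least h interactions (for example reposts plus comments). The function names, the `(tokens, interactions)` input shape, and the `min_h` threshold are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def term_h_index(posts):
    """Compute an assumed term H-index for every term.

    posts: iterable of (tokens, interactions) pairs, where tokens is a list
    of preprocessed words and interactions is a non-negative integer
    (an assumed measure of a post's social attention).
    """
    per_term = defaultdict(list)
    for tokens, interactions in posts:
        # Count each term at most once per post.
        for term in set(tokens):
            per_term[term].append(interactions)

    h_values = {}
    for term, counts in per_term.items():
        counts.sort(reverse=True)
        h = 0
        # Largest rank such that the rank-th most popular post
        # containing the term has at least `rank` interactions.
        for rank, count in enumerate(counts, start=1):
            if count >= rank:
                h = rank
            else:
                break
        h_values[term] = h
    return h_values


def select_features(posts, min_h):
    """Keep terms whose assumed H-index reaches min_h (hypothetical threshold)."""
    return {t for t, h in term_h_index(posts).items() if h >= min_h}


sample_posts = [
    (["earthquake", "rescue"], 5),
    (["earthquake"], 3),
    (["earthquake", "lunch"], 1),
    (["lunch"], 0),
]
hot_terms = select_features(sample_posts, min_h=2)
```

Under this assumed definition, a term must be both frequent and socially amplified to score highly, which is one way the filter could shrink the vocabulary before VSM or BTM modeling.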
Keywords/Search Tags:Topic Model, Micro-blog, Hot Topic Discovery, Topic Evolution, Topic Label, H Index