Topic Extraction And Topic Evolution Of Social Network

Posted on: 2018-04-25
Degree: Master
Type: Thesis
Country: China
Candidate: H W Guan
Full Text: PDF
GTID: 2348330569986455
Subject: Computer technology
Abstract/Summary:
With the rapid development of information networks, social media has gathered a large user base, and the rapid spread of hot events through it has great social influence. The vast majority of the massive amount of information appearing in social networks is in textual form, and at this scale hot topics and keywords cannot be extracted by manual processing. How to use algorithms to obtain the desired information from large amounts of data quickly and accurately has therefore become a problem to solve in topic evolution analysis. Topic evolution refers to the entire life cycle of a topic along the time axis, from generation through maturity to decline; it is a process of dynamic change. After studying the current mainstream methods for hot topic extraction and evolution analysis, this thesis extracts topics in the social network from the content and analyzes the evolution of topic keywords along the time axis. The main work is as follows:

First, to address the shortcomings of current text clustering models and the needs of topic extraction, a distance metric based on a word co-occurrence model is proposed. Clustering with this word distance yields better topic extraction results. Given the characteristics of micro-blog data and the requirements of the co-occurrence model, the original micro-blog data is preprocessed and the word distances are calculated. Preprocessing includes Chinese word segmentation and filtering of stop words and special parts of speech; the word distance calculation uses distance to express the correlation between keywords.

Second, based on the advantages and disadvantages of existing text clustering methods and the defects of the fast search and find of density peaks clustering algorithm, a hot topic extraction algorithm based on word distance is proposed, and its superiority is demonstrated with evaluation indices.

Third, to address the shortcomings of topic evolution models, inclusion degree is proposed to evaluate the relationship between two topics. At the same time, the independence of words is used to express the importance of words, so that the evolution of a topic over time can be seen intuitively.

Fourth, Matlab, Gephi and other tools are used to implement a hot topic extraction and evolution analysis system. The system implements most of the work of this thesis and has practical significance for the study of topic extraction and evolution analysis.
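
The abstract does not give formulas for the word distance or the inclusion degree, so the following is only a minimal sketch of what such measures might look like. It assumes a Jaccard-style co-occurrence similarity over documents (distance = 1 - similarity) and defines inclusion degree as the fraction of one topic's keywords that also appear in another topic. The function names `cooccurrence_distances`, `word_distance` and `inclusion_degree` are hypothetical and are not taken from the thesis.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_distances(docs):
    """Compute a distance for every word pair that co-occurs in some document.

    `docs` is a list of token lists, assumed to be already segmented and
    filtered for stop words. Keys of the result are alphabetically sorted
    (word_a, word_b) tuples.
    """
    doc_freq = Counter()   # number of documents containing each word
    pair_freq = Counter()  # number of documents containing both words
    for tokens in docs:
        vocab = sorted(set(tokens))
        doc_freq.update(vocab)
        pair_freq.update(combinations(vocab, 2))

    distances = {}
    for (a, b), n_ab in pair_freq.items():
        # Assumption: Jaccard similarity of the two words' document sets.
        similarity = n_ab / (doc_freq[a] + doc_freq[b] - n_ab)
        distances[(a, b)] = 1.0 - similarity
    return distances


def word_distance(distances, a, b):
    """Look up the distance between two words; 1.0 if they never co-occur."""
    if a == b:
        return 0.0
    return distances.get(tuple(sorted((a, b))), 1.0)


def inclusion_degree(topic_a, topic_b):
    """Fraction of topic_a's keywords that also appear in topic_b (assumed definition)."""
    a, b = set(topic_a), set(topic_b)
    return len(a & b) / len(a) if a else 0.0


if __name__ == "__main__":
    posts = [
        ["earthquake", "rescue"],
        ["earthquake", "rescue", "donation"],
        ["concert", "ticket"],
    ]
    d = cooccurrence_distances(posts)
    print(word_distance(d, "earthquake", "rescue"))
    print(inclusion_degree(["earthquake", "rescue"],
                           ["earthquake", "rescue", "donation"]))
```

Under these assumptions, words that always appear together get distance 0 and words that never co-occur get distance 1, so the resulting distances could be fed to a distance-based clustering method. The inclusion degree is asymmetric, which is one way to express that a later topic absorbs an earlier one.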
Keywords/Search Tags: Topic extraction, text clustering, topic evolution