
Research Of The Micro-blog Hot Topic Detection Based On VSM-BTM Topic Model

Posted on: 2018-09-05 | Degree: Master | Type: Thesis
Country: China | Candidate: X F Zhang
GTID: 2348330536473555 | Subject: Computer software and theory
Abstract/Summary:
With the rapid development of the Internet, micro-blogs have attracted wide attention from all sectors of society as a form of social media. However, efficiently extracting useful information from massive, irregular micro-blog data for topic discovery remains an open problem, and this has motivated methods that mine micro-blog data with topic models. Although topic models have been studied extensively, existing techniques still have two main problems. First, their computational complexity is too high, so they process micro-blog data at big-data scale inefficiently. Second, when topic models such as the traditional LDA model are used to cluster short micro-blog texts, the accuracy is low.

To address these problems, this thesis proposes a method for mining Sina micro-blog data that combines a VSM-BTM topic model with an improved K-means clustering algorithm. The goal is to improve the accuracy of micro-blog data mining while keeping the computational efficiency unchanged. The research is divided into three parts: micro-blog data preprocessing, VSM-BTM modeling, and clustering. Preprocessing consists of word segmentation followed by stop-word removal and noise removal, and the results are saved as text files. The VSM-BTM model then applies the BTM topic model to the preprocessed micro-blog data and iterates over it to obtain a "document-topic" matrix and a "topic-word" matrix. Two distance formulas, the JS (Jensen-Shannon) distance and the cosine distance, are used to compute similarity, and an improved K-means algorithm clusters the modeling results. The improvement changes both how the initial cluster centers are chosen and how the traditional K-means algorithm computes distances. Finally, the experimental results are evaluated by precision, recall, and F1 score.

Because the VSM-BTM topic model avoids the sparsity of micro-blog data without expanding the posts with external information, it reduces the dependence on information outside the text. The thesis compares the LDA, BTM, and VSM-BTM topic models on detecting hot Weibo topics in terms of precision, recall, and F1 score. The experiments show that VSM-BTM finds hot topics in Weibo data better than both LDA and BTM, which supports the validity of the proposed model: its accuracy exceeds that of the two existing micro-blog mining methods without increasing computational complexity.
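The preprocessing step and the input to BTM can be illustrated with a minimal sketch. It assumes posts have already been word-segmented, and the stop-word list and the noise rule (dropping URLs and @mentions) are hypothetical stand-ins, not the thesis's actual filters. BTM models a corpus as a pool of biterms, i.e. unordered word pairs that co-occur in the same short post, which is how it sidesteps per-document sparsity.

```python
from itertools import combinations

# Hypothetical stop-word list for illustration; a real Weibo pipeline would load
# a full Chinese stop-word list and apply it after word segmentation.
STOP_WORDS = {"the", "a", "in", "of", "and", "is"}


def is_noise(token):
    # Assumed noise rule: URLs and @user mentions carry no topical content.
    return token.startswith(("http://", "https://", "@"))


def preprocess(tokens):
    """Remove stop words and noise tokens from an already-segmented post."""
    return [t for t in tokens if t not in STOP_WORDS and not is_noise(t)]


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short post.

    Each pair is sorted so that (w1, w2) and (w2, w1) count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


post = ["@alice", "the", "rain", "http://t.cn/x", "beijing", "traffic"]
words = preprocess(post)
print(words)                   # ['rain', 'beijing', 'traffic']
print(extract_biterms(words))  # [('beijing', 'rain'), ('rain', 'traffic'), ('beijing', 'traffic')]
```

Treating the whole post as one co-occurrence window is a common choice for very short texts; longer posts might use a sliding window instead.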
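The abstract says BTM is iterated over the preprocessed data to produce a "document-topic" matrix and a "topic-word" matrix. The following is a minimal sketch of a standard collapsed Gibbs sampler for a plain biterm topic model, assuming biterms are given as integer word-id pairs; the hyperparameters `alpha`, `beta`, `iters`, and `seed` are illustrative defaults, and the VSM part of the thesis's VSM-BTM model is not shown. Per-post topic proportions are inferred as P(z|d) = sum over the post's biterms b of P(z|b) P(b|d).

```python
import random


def btm_gibbs(biterms, vocab_size, num_topics, alpha=1.0, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for a biterm topic model.

    biterms: list of (wi, wj) integer word-id pairs pooled from the whole corpus.
    Returns (theta, phi): corpus-level topic proportions and the topic-word matrix.
    """
    rng = random.Random(seed)
    K, W = num_topics, vocab_size
    n_z = [0] * K                        # number of biterms assigned to each topic
    n_wz = [[0] * W for _ in range(K)]   # word counts per topic
    z = []                               # current topic of each biterm

    def update(k, wi, wj, delta):
        n_z[k] += delta
        n_wz[k][wi] += delta
        n_wz[k][wj] += delta

    for wi, wj in biterms:               # random initial assignment
        k = rng.randrange(K)
        z.append(k)
        update(k, wi, wj, 1)

    for _ in range(iters):
        for idx, (wi, wj) in enumerate(biterms):
            update(z[idx], wi, wj, -1)   # remove the current assignment
            weights = []
            for t in range(K):
                denom = 2 * n_z[t] + W * beta   # each biterm contributes two words
                weights.append((n_z[t] + alpha)
                               * (n_wz[t][wi] + beta) * (n_wz[t][wj] + beta)
                               / (denom * denom))
            k = rng.choices(range(K), weights=weights)[0]
            z[idx] = k
            update(k, wi, wj, 1)

    total = len(biterms)
    theta = [(n_z[t] + alpha) / (total + K * alpha) for t in range(K)]
    phi = [[(n_wz[t][w] + beta) / (2 * n_z[t] + W * beta) for w in range(W)]
           for t in range(K)]
    return theta, phi


def doc_topic(doc_biterms, theta, phi):
    """Infer P(z|d) = sum_b P(z|b) P(b|d), with P(b|d) uniform over the post's biterms."""
    K = len(theta)
    if not doc_biterms:
        return [1.0 / K] * K             # assumed fallback for posts with no biterms
    acc = [0.0] * K
    for wi, wj in doc_biterms:
        w = [theta[t] * phi[t][wi] * phi[t][wj] for t in range(K)]
        s = sum(w)
        for t in range(K):
            acc[t] += w[t] / s           # normalized P(z|b)
    return [a / len(doc_biterms) for a in acc]
```

Stacking `doc_topic` over all posts yields a document-topic matrix, and `phi` is the topic-word matrix; both are what a downstream clustering step would consume.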
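The two similarity measures named in the abstract can be sketched as follows. The JS measure compares topic distributions (for example, rows of a document-topic matrix) and the cosine distance compares term-weight vectors in a vector space model. This sketch returns the Jensen-Shannon divergence in base 2; some authors call its square root the JS distance, and which variant the thesis uses, as well as how it combines the two measures, is not stated here.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions.

    Bounded in [0, 1]; 0 means identical distributions.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing (0 * log 0 is taken as 0);
        # when x_i > 0, the mixture y_i >= x_i / 2 > 0, so no division by zero.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cosine_distance(u, v):
    """1 - cosine similarity between two term-weight (VSM) vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        return 1.0   # assumed convention: a zero vector is maximally distant
    return 1.0 - dot / norm


print(js_divergence([1.0, 0.0], [0.0, 1.0]))   # 1.0 (disjoint distributions)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0 (orthogonal vectors)
```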
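The abstract states that the improved K-means changes how initial cluster centers are chosen and how distances are computed, but does not say how. As a stand-in, this sketch uses a common max-min (farthest-point) initialization and accepts a pluggable `dist` function so that a measure such as a JS divergence could replace Euclidean distance; neither choice should be read as the thesis's actual method.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_point_init(points, k, dist):
    """Max-min initialization: start from points[0], then repeatedly add the point
    whose distance to its nearest already-chosen center is largest."""
    centers = [points[0]]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(nxt)
    return [list(c) for c in centers]


def kmeans(points, k, dist=euclidean, max_iter=100):
    """K-means with a pluggable distance; centers are updated as coordinate means."""
    centers = farthest_point_init(points, k, dist)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new_labels == labels:      # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:               # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


labels, centers = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
print(labels)   # [0, 0, 1, 1]
```

Averaging keeps probability vectors valid distributions, so topic vectors can be clustered with a non-Euclidean `dist`, but the usual convergence guarantee holds only for squared Euclidean distance; `max_iter` bounds the loop otherwise.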
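The evaluation metrics can be sketched as follows, assuming each detected topic has already been matched to a manually labeled reference topic and both are represented as sets of post IDs (the matching procedure and the ID format are assumptions, not taken from the thesis). F1 is the harmonic mean of precision and recall.

```python
def precision_recall_f1(detected, relevant):
    """Score one detected topic (a set of post IDs) against its reference topic."""
    detected, relevant = set(detected), set(relevant)
    hits = len(detected & relevant)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f = precision_recall_f1({1, 2, 3, 4}, {2, 3, 5})
print(round(p, 3), round(r, 3), round(f, 3))   # 0.5 0.667 0.571
```

Here two of the four detected posts are correct (precision 1/2) and two of the three reference posts are found (recall 2/3), giving F1 = 4/7.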
Keywords/Search Tags: micro-blog, topic detection, VSM-BTM model, clustering