
Improved Algorithm for Topic Detection, Topic Trend Analysis and Prediction in Social Network

Posted on: 2018-09-01
Degree: Master
Type: Thesis
Country: China
Candidate: G Xu
GTID: 2348330533466783
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of social networks, millions of users can freely publish and consume content on the Internet. Compared with traditional media, social networks have the advantage in both the breadth and the speed of communication. As a result, research on topic detection in social networks has become increasingly popular, and detecting valuable topics is useful for public opinion analysis and hot news mining. This paper focused on improving the text representation model, the online topic detection algorithm, and the extraction of trend analysis factors.

First, to account for both term frequency and the relevance between documents and terms, this paper introduced a new text representation model, the PT weight (PMI and TF-IDF weight), based on a modified TF-IDF (Term Frequency-Inverse Document Frequency) and Pointwise Mutual Information (PMI). This model performed well on short Weibo texts. This paper then proposed a new topic detection model, NMFPT (Non-negative Matrix Factorization (NMF) based on Pointwise mutual information and TF-IDF). An L2 regularization term was also used to avoid the overfitting caused by data sparseness. Experimental results verified that NMFPT is more accurate than the original algorithm.

Next, to detect topics that are updated dynamically over time in document streams, this paper proposed a new model called HNMF_TC (Hierarchical Non-negative Matrix Factorization based on Time Window and Cluster Merging). The original HNMF considers only the number of data points when choosing which clusters to decompose. To overcome this disadvantage, the algorithm applied mNDCG (modified Normalized Discounted Cumulative Gain) to measure the cohesion of each cluster, and used a cluster merging method based on a mixed similarity algorithm to merge the topic sets of neighboring time windows. Comparative experiments verified that HNMF_TC is more accurate than the other algorithms.

Then, this paper established the goal of topic trend analysis. Through analysis, it extracted user factors, Weibo factors, and time factors that may affect the topic trend, in particular adding the opinion leaders' influence factor to the user factors. This paper also proposed an opinion leaders' influence evaluation method based on a modified KED algorithm, which added the number of users followed in common to the KED algorithm as a supplementary factor. Finally, a gradient boosting regression tree was used as the trend prediction model. The validity, accuracy, and generality of the algorithm were verified on a real-world Weibo data set.
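
To make the first two contributions concrete, the following is a minimal sketch and not the thesis's implementation. The abstract does not give the exact PT weight formula, so the code assumes the weight is a smoothed TF-IDF value multiplied by the positive part of the document-term PMI. It also assumes the L2-regularized factorization is fitted with standard multiplicative updates. The names `pt_weights` and `nmf_l2`, the parameter `lam`, and the toy documents are hypothetical.

```python
import math
import random
from collections import Counter


def pt_weights(docs):
    """Build a document-term matrix of assumed PT weights (TF-IDF * positive PMI)."""
    n_docs = len(docs)
    counts = [Counter(doc) for doc in docs]
    vocab = sorted({t for doc in docs for t in doc})
    total = sum(len(doc) for doc in docs)            # tokens in the corpus
    term_total = Counter(t for doc in docs for t in doc)
    doc_freq = Counter(t for c in counts for t in c)  # documents containing t
    matrix = []
    for doc, c in zip(docs, counts):
        row = []
        for t in vocab:
            if c[t] == 0:
                row.append(0.0)
                continue
            tf = c[t] / len(doc)
            # Smoothed IDF (an assumed variant, not taken from the thesis).
            idf = math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0
            # PMI(t, d) = log( p(t, d) / (p(t) * p(d)) ), from token counts.
            pmi = math.log((c[t] / total) /
                           ((term_total[t] / total) * (len(doc) / total)))
            row.append(tf * idf * max(pmi, 0.0))
        matrix.append(row)
    return vocab, matrix


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def nmf_l2(v, k, lam=0.01, iters=200, seed=0, eps=1e-9):
    """Approximate V ~ W H with W, H >= 0, penalizing lam * (|W|^2 + |H|^2)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H + lam * H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + lam * h[i][j] + eps)
              for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T + lam * W)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + lam * w[i][j] + eps)
              for j in range(k)] for i in range(n)]
    return w, h


# Toy corpus standing in for tokenized short texts (hypothetical data).
docs = [
    "stock market falls as stock prices drop".split(),
    "market rally lifts stock prices".split(),
    "team wins football match".split(),
    "football team loses match".split(),
]
vocab, v = pt_weights(docs)
w, h = nmf_l2(v, k=2)

# Show the three highest-weighted terms of each learned topic row in H.
for topic_id, row in enumerate(h):
    top = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)[:3]
    print(topic_id, [vocab[j] for j in top])
```

Each row of `w` gives a document's loading on the `k` topics, and each row of `h` gives a topic's weight over the vocabulary. The `lam` terms in the denominators shrink both factors, which is one common way to apply an L2 penalty when the document-term matrix is sparse.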
Keywords/Search Tags:Topic Detection, Trend Analysis, Hierarchical Non-negative Matrix Factorization, Opinion Leaders' Influence, Gradient Boosting Regression Tree