
A Topic Model Based On Community Structure

Posted on: 2018-08-14
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhao
GTID: 2348330512998648
Subject: Computer science and technology
Abstract/Summary:
In recent years, with the rapid development of Internet technology, a large number of online social platforms have appeared. Communication and interaction among people on these platforms generate massive amounts of data, and the resulting social networks have become a valuable data source for public opinion analysis and information assessment. Topic models are an effective way to mine textual information from such data. However, social data usually consists of short texts, and applying a traditional topic modeling algorithm directly yields poor topics. Our analysis shows that traditional topic models consider only the content of the text itself and ignore the social network embedded in the data.

Based on a study of the LDA topic model, short text expansion, and related community detection algorithms, this thesis proposes a topic model for data sets that contain a social network. In this model, communities are first detected and partitioned in the social data set; the texts within each community are then merged and expanded into new long texts, which replace the original data for topic modeling. The improved algorithm effectively alleviates the data sparsity problem and makes full use of the social attributes of the data to improve topic mining. The main work is as follows:

1. A short text expansion scheme based on community detection is proposed. Using the social network contained in the social data, we detect and partition communities with methods based on label propagation, spectral analysis, and exploration strategies, and then apply traditional short text processing methods (such as concatenation) to produce long texts with a rich vocabulary for topic mining.

2. For this scenario, the topic model based on the spectral analysis community detection algorithm is optimized. Because the Potts model algorithm based on an exploratory strategy converges slowly and may converge to a local minimum, this thesis introduces a Potts model with secondary cooling. The iterative process is divided into two stages: the high-temperature stage uses global perturbation, and the low-temperature stage uses limited perturbation so that the model converges faster. A temperature-return strategy is also applied, which makes the algorithm more likely to escape local optima. The results show that the algorithm outperforms direct convergence.

3. For the same scenario, the topic model based on the label propagation community detection algorithm is improved. Considering that nodes in a social network differ in influence, and that some core nodes may strongly affect multiple communities, an improved algorithm, COLPA, is proposed on the basis of COPRA. It adopts a new update strategy, introduces a label attenuation factor and an adjacent-node influence factor as controls, and proposes a new termination strategy that makes the model converge faster. Experimental results show that topic models built on data clustered by the improved algorithm are of higher quality than those built with LPA.
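
The community-based text expansion step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it uses plain label propagation (not COPRA or the improved COLPA), the function names `label_propagation` and `merge_by_community` are hypothetical, the toy graph and posts are invented for illustration, and ties are broken deterministically by taking the largest label (standard label propagation breaks ties at random) so that the result is reproducible.

```python
from collections import Counter, defaultdict


def label_propagation(edges, max_iter=100):
    """Plain asynchronous label propagation on an undirected graph.

    Assumption: ties are broken by taking the largest label, purely for
    reproducibility; the usual algorithm picks among ties at random.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    labels = {node: node for node in adj}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):  # fixed visiting order (assumption)
            counts = Counter(labels[n] for n in adj[node])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[node] in candidates:
                continue  # current label is already one of the most frequent
            labels[node] = max(candidates)
            changed = True
        if not changed:
            break
    return labels


def merge_by_community(posts, labels):
    """Concatenate the short texts of all users that share a community label."""
    merged = defaultdict(list)
    for user, text in posts:
        # Users absent from the graph form their own single-user community.
        merged[labels.get(user, user)].append(text)
    return {community: " ".join(texts) for community, texts in merged.items()}


# Toy data (invented): two triangles of users joined by one edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
posts = [("a", "new phone camera"), ("b", "camera lens review"),
         ("c", "phone battery"), ("d", "league final tonight"),
         ("e", "great goal"), ("f", "final score")]

labels = label_propagation(edges)
docs = merge_by_community(posts, labels)
for community, text in docs.items():
    print(community, "->", text)
```

Each merged string in `docs` plays the role of one expanded long text; in the scheme described above, these pseudo-documents, rather than the individual short posts, would be passed to an LDA implementation for topic modeling.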
Keywords/Search Tags:Short Text, Topic Model, Social Network, Community Detection