| With the wide use of social media in everyday life, topic modeling can uncover the hidden value in the massive data it generates. However, the sheer scale of social media data makes parameter inference in topic models difficult. In addition, the data are short texts arriving as a stream, which makes it difficult for a topic evolution model to capture their latent semantics explicitly. It is therefore of great practical significance to develop a more efficient inference algorithm for topic models and a topic evolution model that can handle short-text data streams effectively. The main work is as follows: (1) The characteristics of social media data and the new problems and challenges that topic modeling methods face on such data are summarized and analyzed; the significance of research on topic model parameter inference and on topic evolution models for short-text data streams is analyzed; and the state of topic model research in China and abroad in recent years is reviewed. (2) To improve the efficiency of topic model parameter inference when mining topics from large-scale data streams, a VR-SVI algorithm is proposed, based on stochastic variational inference, that smooths noisy gradients and reduces gradient variance. The large noise in stochastic variational inference gives the stochastic gradients a large variance, which hinders fast convergence of the algorithm. For this reason, a sliding window method is adopted to recalculate the noise terms in the stochastic gradients, new stochastic gradients are constructed, and the influence of noise on the stochastic gradients is reduced. It is also proved that the proposed algorithm reduces the stochastic gradient variance compared with the stochastic variational inference algorithm it builds on, and the influence of the window size on the algorithm and the convergence
of the algorithm is analyzed. The experimental results show that the VR-SVI algorithm reduces the stochastic gradient variance through the sliding window method, converges rapidly, and effectively improves topic mining efficiency. (3) To analyze accurately and effectively the topic evolution of social media data, which are short texts arriving as a stream, a biterm topic evolution model, ST-HDP, is proposed for short-text data streams based on a hierarchical construction method. The model reconstructs documents from explicit word co-occurrences to avoid data sparsity; based on the Chinese restaurant process, an HCRF method is designed as the construction process for the topic prior distribution, which combines historical information to obtain a good topic distribution and completes the clustering of topics and documents; a corresponding MCMC sampling method is designed, and the model parameters are derived. Experimental results on the Weibo dataset show that the ST-HDP model performs significantly better than the Online HDP, Online BTM and sdTEM models. (4) Based on the above research results and the software development tool Eclipse, a prototype system for topic mining and topic evolution on social media is developed, with social media data as the practical application background. Data collection and preprocessing, topic mining, topic evolution, visual presentation of the topic results, and other functions are designed and implemented. The system verifies the effectiveness of topic mining based on the VR-SVI inference algorithm for large-scale data streams and of the ST-HDP topic evolution model for short-text data streams. |
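
The reconstruction of documents from explicit word co-occurrences described in (3) can be illustrated with a minimal sketch. It assumes that a biterm is an unordered pair of word positions within one short text, as in the biterm topic model; the function name `extract_biterms`, the whitespace tokenization, and the example posts are hypothetical, and the preprocessing actually used by ST-HDP may differ.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs (biterms) from one short text.

    Assumption: every pair of token positions in the text forms a biterm,
    and each pair is stored in sorted order so (a, b) == (b, a).
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


# Hypothetical short posts, tokenized by whitespace for illustration only.
posts = ["topic model inference", "short text topic model"]

# Pool the biterms of all posts into one corpus-level count table.
biterm_counts = Counter()
for post in posts:
    biterm_counts.update(extract_biterms(post.split()))

for biterm, count in sorted(biterm_counts.items()):
    print(biterm, count)
```

Pooling biterms across the corpus, rather than counting words per document, is what lets very short posts contribute usable co-occurrence statistics: a three-word post yields three biterms, and pairs shared between posts accumulate counts.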