
Network Hot Topic Discovery Based On Topic Model And Clustering Algorithm

Posted on: 2020-02-05
Degree: Master
Type: Thesis
Country: China
Candidate: J Dong
Full Text: PDF
GTID: 2428330596985415
Subject: Communication and Information System
Abstract/Summary:
The rapid development of social media has made online networks far more complex, which poses great challenges for network research. Public opinion issues emerge online one after another, because users can express their views anytime and anywhere. This has produced a large volume of user-generated data, such as pictures, text, and videos. These data contain valuable information that reflects current events and the social problems that arise. However, the complexity of the network makes it difficult for people to obtain information promptly and effectively, so accurately finding hot topics in the network has become a focus of research. This thesis collects Sina Web data and analyzes in depth the Web text content, user forwarding, comments, and other information, together with attributes of user characteristics. The main work is as follows:

(1) The traditional word-to-topic model treats all words in the same way when processing short text and ignores user personalization. To address this, a topic feature extraction algorithm based on the word-to-topic model is proposed. First, user factors are introduced into topic modeling, and all text produced by the same user is regarded as one document. Second, background words and topic words are distinguished, irrelevant background words are deleted, and Gibbs sampling is used to derive the model parameters. Finally, JS divergence and cosine similarity are used jointly to decide category membership, so as to ensure the accuracy of feature extraction.

(2) The firefly algorithm easily falls into local optima, and during iteration it tends to skip over the optimal solution. To solve these problems, a firefly algorithm with a dynamic adaptive step size is proposed. In the initial stage of the iteration, a larger step quickly locates the region close to the global optimal solution. In the later stage, a smaller step refines the search near the optimal solution, which enhances the optimization ability of the algorithm.

(3) The fuzzy clustering algorithm (FCM) is sensitive to the initial center points, and clustering by distance alone cannot accurately find hot topics on the network. To address this problem, a fuzzy clustering method based on the improved firefly algorithm is proposed. Topic influence is regarded as the mutual attraction between fireflies, and the relationship between fireflies is established based on text similarity. The improved firefly algorithm is applied to FCM to optimize the fitness function, and FCM performs the clustering once the cluster centers are obtained. The topics obtained by clustering are sorted by influence value, so that hot topics can be obtained with high accuracy.

Real Sina Web data are collected, and the above work is evaluated in simulation. The performance of each algorithm is compared under different conditions. The experimental results show that the proposed algorithms perform better than the other algorithms compared.
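The dynamic adaptive step idea in item (2) can be illustrated with a minimal sketch. The abstract does not give the actual step-size rule, so the geometric decay in `step_size`, the toy `sphere` objective, and every parameter value below are assumptions chosen for illustration; this is a standard firefly update with a shrinking random step, not the thesis's algorithm.

```python
import math
import random


def sphere(x):
    """Toy objective (assumption): minimum value 0 at the origin."""
    return sum(v * v for v in x)


def step_size(t, n_iter, alpha_start=0.5, alpha_end=0.01):
    """Hypothetical schedule: geometric decay from alpha_start to alpha_end.

    Large steps early favour exploration; small steps late favour
    refinement near the best solution found so far.
    """
    frac = t / max(n_iter - 1, 1)
    return alpha_start * (alpha_end / alpha_start) ** frac


def firefly_minimize(cost, dim=2, n_fireflies=15, n_iter=100,
                     beta0=1.0, gamma=1.0, bounds=(-5.0, 5.0), seed=0):
    """Minimize `cost` with a basic firefly search and a decaying step."""
    rng = random.Random(seed)
    lo, hi = bounds
    swarm = [[rng.uniform(lo, hi) for _ in range(dim)]
             for _ in range(n_fireflies)]
    costs = [cost(x) for x in swarm]
    history = []  # best cost in the swarm after each iteration

    for t in range(n_iter):
        alpha = step_size(t, n_iter)
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # firefly j is brighter than i
                    r2 = sum((a - b) ** 2
                             for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attractiveness
                    # Move i toward j, add a random step, clip to bounds.
                    swarm[i] = [
                        min(hi, max(lo, a + beta * (b - a)
                                    + alpha * (rng.random() - 0.5)))
                        for a, b in zip(swarm[i], swarm[j])
                    ]
                    costs[i] = cost(swarm[i])
        history.append(min(costs))

    best = min(range(n_fireflies), key=costs.__getitem__)
    return swarm[best], costs[best], history
```

Calling `best, best_cost, history = firefly_minimize(sphere)` returns the best position, its cost, and the best cost per iteration. Because the brightest firefly never moves in this variant, `history` cannot increase, which makes it easy to compare a decaying step against a fixed one by swapping out `step_size`.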
Keywords/Search Tags: hot topic, firefly algorithm, word-to-topic model, fuzzy clustering, dynamic adaptive step size