
Research On Multi-Level Topic Clustering Based On Cross Degree

Posted on: 2018-07-19
Degree: Master
Type: Thesis
Country: China
Candidate: B Liu
Full Text: PDF
GTID: 2428330518958888
Subject: Computer system architecture
Abstract/Summary:
Identifying and detecting hot topics has long been a focus of research and a main method of public opinion monitoring. The development of the Internet has brought convenience to our lives; on the other hand, some criminals exploit that convenience to spread rumors freely, which harms social stability. This thesis studies hot topic discovery: algorithms cluster scattered news data so that hot events can be found, their development and changes monitored, and appropriate measures taken in time.

An earthquake disaster, for example, is accompanied by many lines of work, such as rescuing the affected people, preventing epidemics, delivering relief supplies and restoring infrastructure. Each of these is a hot topic, and together they give a complete description of the earthquake event. When a traditional topic clustering algorithm is applied to such an event, news on all of these aspects may be clustered into a single topic that amounts to a general report on the earthquake, which is not an ideal result. Topic clustering should reflect both the specific branches of a topic and the complete event that those branches make up.

This thesis proposes multi-level topic clustering, that is, re-clustering on the basis of the original (first-level) topics. First, to address dimension explosion in the topic model, a dynamic weight method is proposed that dynamically changes the weight of a feature word until it falls below a threshold. Experiments show that the method effectively reduces the dimension of the topic model while maintaining accuracy. Second, an improved single-pass algorithm clusters the data set to obtain the sub-topics. Third, cross degree is introduced to calculate the similarity between topics: for any two topic classes, the cross degree algorithm yields a similarity value that determines whether the two topics can be merged. Finally, similar sub-topics are clustered together by multi-level topic clustering based on cross degree, revealing the relationships between sub-topics. The experimental results show that the proposed algorithm is effective: the vector dimension is clearly reduced after applying the dynamic weight algorithm, the similarity calculation based on topic cross degree is more accurate, and the topic clustering results are more realistic.
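For readers unfamiliar with the single-pass step mentioned above, the following is a minimal sketch of the textbook single-pass clustering scheme, not the improved variant the thesis develops. It assumes plain term-frequency vectors, cosine similarity, and an arbitrary threshold of 0.3; the function names `cosine` and `single_pass` and the sample headlines are hypothetical and chosen only for illustration. The dynamic weight method and the cross degree measure are not defined in this abstract, so they are not reproduced here.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(docs, threshold=0.3):
    """Textbook single-pass clustering (assumed baseline, not the thesis's version).

    Each document is compared with every existing cluster and joins the most
    similar one if the similarity reaches the threshold; otherwise it opens
    a new cluster. Documents are visited exactly once, in input order.
    """
    clusters = []
    for index, text in enumerate(docs):
        vector = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vector, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            # Summed counts point in the same direction as the mean vector,
            # which is all that cosine similarity depends on.
            best["centroid"].update(vector)
            best["members"].append(index)
        else:
            clusters.append({"centroid": Counter(vector), "members": [index]})
    return clusters


# Hypothetical headlines, used only to exercise the sketch.
news = [
    "earthquake rescue teams reach affected people",
    "rescue teams search for people after earthquake",
    "epidemic prevention work starts in disaster area",
    "stock market closes higher on bank earnings",
]
for cluster in single_pass(news):
    print(cluster["members"])
```

With these inputs the two rescue headlines share a cluster while the epidemic and stock market headlines each open their own. A second clustering level of the kind the thesis describes would then compare the resulting clusters with one another to decide which sub-topics belong to the same event.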
Keywords/Search Tags: topic discovery, text clustering, single-pass algorithm, vector space model