
The Study Of Topic Aggregation Degree Based On CFDP-LDA Model

Posted on: 2021-02-09  Degree: Master  Type: Thesis
Country: China  Candidate: L J Qin  Full Text: PDF
GTID: 2428330611962306  Subject: Statistics
Abstract/Summary:
The rapid development of Internet technology facilitates the storage, extraction, and dissemination of information, so that large amounts of information are now stored and presented as electronic documents in semi-structured or unstructured form. Filtering and managing information within this mass of documents has become a primary user need. Text mining arose from this need and is now an active research area. It is applied in data mining, text categorization, text clustering, sentiment analysis, public opinion analysis, and other tasks, and across these fields topic mining is an indispensable part of text mining technology. Large volumes of semi-structured and unstructured data cannot be analyzed directly, so topic mining serves as the foundation for information retrieval, information filtering, and sentiment analysis. With the cross-disciplinary application and development of mathematics, computer languages, statistics, and other disciplines, text mining technology has matured considerably. The topic model, as a complete three-layer Bayesian generative model, has excellent statistical properties and has been widely studied and adopted in text mining research.

Although the topic model has greatly improved the accuracy of topic mining, it still has two problems. (1) The number of topics. In the traditional topic model the number of topics is set subjectively by hand; this approach lacks objectivity, and different topic numbers lead to large changes in the mining results. (2) The topic aggregation degree. In practical applications of the traditional topic model, topics overlap, which violates the model's assumption of mutual independence between topics and makes the topic information difficult to summarize and interpret.

This thesis addresses both problems. Based on the characteristics of the topic information obtained from the topic model, and from the perspective of topic aggregation degree, the topic information is clustered, and the optimal clustering result is taken as the basis for determining the number of topics. Building on the literature and on mathematical foundations, the thesis proposes a combined model based on topic density, the CFDP-LDA model, which improves the LDA model by using the theoretical framework and ideas of the CFDP (Clustering by fast search and find of density peaks) algorithm to determine the optimal number of topics. On the one hand, starting from the mathematical derivation of the CFDP clustering algorithm, the compatibility of CFDP with the LDA topic model is verified theoretically, so that the two can be combined for topic mining, providing a theoretical basis for determining the optimal number of topics and the topic information. On the other hand, empirical analysis on English and Chinese data sets yields the topic mining results under the optimal degree of topic aggregation, and the aggregation effect is visualized. A comparison using the statistical index semi-partial R shows that the aggregation effect of topic mining based on the CFDP-LDA model is better than that of the LDA model.
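
The density-peak idea behind CFDP can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `density_peaks`, the Gaussian kernel, the cutoff `d_c`, and the hand-chosen `n_centers` are all assumptions made for illustration, and the toy 2-D points stand in for whatever topic representation the thesis actually clusters (for example, rows of an LDA topic-word matrix). Each point gets a local density and a distance to the nearest denser point; points where both are large are treated as cluster centers.

```python
import numpy as np

def density_peaks(points, d_c, n_centers):
    """Illustrative density-peak clustering (hypothetical helper, not the thesis code)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Local density with a Gaussian kernel; subtract 1 to drop the self term exp(0).
    rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    order = np.argsort(-rho, kind="stable")  # indices by decreasing density
    # delta: distance to the nearest point of higher density.
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    # Centers: largest rho * delta (assumption: the count is supplied by the caller).
    gamma = rho * delta
    centers = np.argsort(-gamma, kind="stable")[:n_centers]
    labels = np.full(n, -1)
    labels[centers] = np.arange(n_centers)
    # Remaining points inherit the label of their nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return rho, delta, labels

# Toy data: two well-separated groups, each with a dense central point.
offsets = np.array([[0, 0], [0.2, 0], [-0.2, 0], [0, 0.2], [0, -0.2]])
points = np.vstack([offsets, offsets + 5.0])
rho, delta, labels = density_peaks(points, d_c=0.5, n_centers=2)
```

In this sketch the number of centers is passed in by hand; the thesis instead derives the optimal number of topics from the clustering result, a criterion the abstract does not spell out and that is not reproduced here.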
Keywords/Search Tags: Topic mining, Topic model, CFDP-LDA model, Consistency of objectives, Aggregation degree