Application Research of the CTM Topic Model in Subject Topic Recognition and Subject Document Classification

Posted on: 2020-12-18
Degree: Master
Type: Thesis
Country: China
Candidate: S N Shi
GTID: 2438330572971279
Subject: Library science
Abstract/Summary:
The rapid advancement of science and technology has promoted exchange and cooperation between disciplines and driven the development of the disciplines themselves. As the carrier of subject knowledge, the subject literature is growing rapidly. Quickly detecting a subject's research trends in this large and complex body of literature, and accurately retrieving target documents from it, requires the support of text analysis technology. In recent years, the wide application of machine learning has drawn researchers' attention to topic models. The Correlated Topic Model (CTM) can mine semantic information at topic granularity and can also use topics to reduce the dimensionality of text data. Compared with Latent Dirichlet Allocation (LDA), another well-known member of the topic model family, the CTM captures the correlation between topics and therefore offers a richer representation of text. The model has a place in natural language processing, data mining and artificial intelligence, and it can be applied not only to text but also to other data such as images and speech. Building on existing research, this thesis focuses on how to use the CTM for subject identification and subject literature classification. The details are as follows.

The first two chapters summarize the development of topic models in China and abroad, review the research on subject identification and subject literature classification, and point out the shortcomings of that research. They introduce the text mining process and the detailed steps of each stage, and explain the CTM's document generation process, posterior inference and parameter estimation in detail.

The third chapter explores the advantages of the CTM in subject identification. The CTM is used to identify the topics of two interdisciplinary subjects; the topic strength distribution is then used to calculate topic strength and the topic similarity for each time period, so that the evolution of each interdisciplinary subject is displayed dynamically. The experiments show that the CTM identifies the topics of interdisciplinary subjects more comprehensively.

Building on the CTM's topic recognition ability, the fourth chapter proposes C-KNN, a classification method that combines the model with the KNN classification algorithm. Topic information is incorporated into the classification of subject literature, which retains that information while reducing the dimensionality of the corpus. It also addresses two weaknesses of the traditional KNN classification algorithm: its high computational cost and its failure to consider semantic information when calculating text similarity. Comparison with the traditional KNN classification algorithm and with an LDA-based KNN classification algorithm shows that C-KNN performs better on multidisciplinary literature classification.

The last chapter summarizes the research as a whole, points out its shortcomings and outlines directions for future research.
Keywords: CTM, Subject identification, Literature classification
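
The abstract does not give the details of C-KNN, so the following is only a minimal sketch of the general idea it describes: represent each document by its topic proportions rather than by raw term counts, then run KNN in that low-dimensional topic space. The topic vectors, the labels, the helper names and the choice of cosine similarity are all assumptions made for illustration; in practice the vectors would come from a fitted CTM.

```python
import math
from collections import Counter


def cosine_similarity(p, q):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def knn_classify(query_topics, train_topics, train_labels, k=3):
    """Label a document by majority vote among its k most similar
    training documents, comparing topic proportions only."""
    ranked = sorted(
        zip(train_topics, train_labels),
        key=lambda pair: cosine_similarity(query_topics, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical topic proportions (3 topics) standing in for CTM output.
train_topics = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
]
train_labels = [
    "library science",
    "library science",
    "computer science",
    "computer science",
    "physics",
]

predicted = knn_classify([0.75, 0.15, 0.10], train_topics, train_labels, k=3)
print(predicted)
```

Because each document is reduced to a handful of topic proportions, every similarity computation touches only as many values as there are topics rather than one per vocabulary term, which is the dimensionality reduction the abstract refers to.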