
Research And Implementation Of Topic Model Technology

Posted on: 2021-07-17
Degree: Master
Type: Thesis
Country: China
Candidate: D P Deng
GTID: 2518306308970219
Subject: Computer Science and Technology
Abstract/Summary:
Existing topic modeling methods struggle to represent and exploit complex textual information, so they cannot extract deep semantic information. The problem is especially acute for short texts: their semantic dependencies and sparse features mean that no suitable topic modeling method exists for them. This thesis therefore approaches short text topic modeling from the perspectives of semantic fusion and semantic expansion, proposes methods based on semantic information fusion and on semantic expansion, and verifies their rationality experimentally. The specific work and contributions are as follows.
(1) A study of traditional topic modeling methods. The thesis surveys the current state of research, development trends, and open problems of traditional topic modeling methods, and analyzes in detail the strengths, weaknesses, and usage scenarios of Word2vec, LDA, and neural network models. On this basis, it proposes optimizing the topic model from the perspectives of semantic expansion and semantic fusion.
(2) A model based on semantic information fusion. The method re-expresses the LDA model by adding prior knowledge during its initialization, constructing a Dirichlet prior model based on BERT (B-LDA). Experimental results show that, in terms of PMI score, B-LDA improves on LDA and DMM by approximately 10.5% and 10.7% on average, respectively.
(3) A model based on semantic expansion. The method introduces Word2vec vector representations into the LDA model and combines them with a semantic similarity algorithm to expand the topics. Experimental results show that the PMI score improves by 16.60% over the model before expansion, confirming the rationality and effectiveness of the model.
(4) Design, implementation, and performance testing of a short text classification system. The system is built on the W-LDA and B-LDA methods and comprises text acquisition, text preprocessing, topic dictionary construction, and text classification modules. Simulation results show that the proposed short text classification algorithm achieves a precision approximately 4.46% higher than the TF-IDF model and approximately 8.7% higher than the LSTM method. A horizontal comparison of the algorithms further shows that classification accuracy on the expanded text is 12.48% higher than on the text before expansion. These results indicate that the two proposed models have comparatively high theoretical and practical value.
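The abstract does not specify the semantic similarity algorithm used for topic expansion or the exact PMI variant used for evaluation. The Python sketch below illustrates one common way to do both, under stated assumptions. Each topic word is expanded with its nearest neighbour by cosine similarity in a word-embedding space, which stands in for Word2vec vectors. Topic quality is scored as the mean pairwise PMI over document-level co-occurrence in a reference corpus. The function names, the similarity threshold, the `per_word` limit, the epsilon smoothing, and the toy vectors and documents are all hypothetical. None of them is taken from the thesis.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_topic(topic_words, embeddings, threshold=0.8, per_word=1):
    """Hypothetical expansion step: for each topic word, append up to `per_word`
    vocabulary words whose cosine similarity to it is at least `threshold`
    and that are not already in the (growing) topic."""
    expanded = list(topic_words)
    for word in topic_words:
        if word not in embeddings:
            continue  # no vector for this word, so it cannot be expanded
        candidates = [
            (cosine(embeddings[word], vec), other)
            for other, vec in embeddings.items()
            if other not in expanded
        ]
        candidates.sort(reverse=True)  # most similar first
        for score, other in candidates[:per_word]:
            if score >= threshold:
                expanded.append(other)
    return expanded


def pmi_coherence(topic_words, documents, eps=1e-12):
    """Mean pairwise PMI of the topic words, with probabilities estimated from
    document-level co-occurrence; `eps` keeps the log finite for pairs that
    never co-occur. Pairs involving a word absent from the corpus are skipped."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for wi, wj in combinations(topic_words, 2):
        d_i, d_j = doc_freq(wi), doc_freq(wj)
        if d_i == 0 or d_j == 0:
            continue
        p_ij = doc_freq(wi, wj) / n_docs + eps
        scores.append(math.log(p_ij / ((d_i / n_docs) * (d_j / n_docs))))
    return sum(scores) / len(scores) if scores else 0.0


# Toy data for illustration only; real vectors would come from a trained Word2vec model.
TOY_EMBEDDINGS = {
    "apple": [1.0, 0.1],
    "fruit": [0.9, 0.2],
    "banana": [0.95, 0.15],
    "car": [0.1, 1.0],
    "engine": [0.15, 0.9],
}
TOY_DOCS = [
    ["apple", "banana", "fruit"],
    ["apple", "fruit"],
    ["banana", "fruit"],
    ["car", "engine"],
]

if __name__ == "__main__":
    expanded = expand_topic(["apple", "fruit"], TOY_EMBEDDINGS, threshold=0.95)
    print(expanded)  # → ['apple', 'fruit', 'banana']
    print(round(pmi_coherence(expanded, TOY_DOCS), 4))  # → 0.1918
```

The threshold sets the trade-off between adding semantically related words and drifting off-topic. In practice, the vectors would come from a model trained on a large corpus, and PMI would be estimated on a separate reference corpus rather than on the handful of toy documents used here.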
Keywords/Search Tags: Topic model, LDA, Semantic fusion, Short text classification, Neural network