Sphere Topic Model Based On Word Embedding In Text Clustering Field

Posted on: 2020-05-06
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Xu
GTID: 2428330596495408
Subject: Control engineering
Abstract/Summary:
With the popularity of Internet technology, people use more and more text data when using smart devices. How to quickly and efficiently obtain the information we care about from massive amounts of text is a hot topic in text mining. As an important text mining technology, text clustering can help users obtain useful information from text more effectively. As one of the basic techniques of natural language processing, text clustering faces three main difficulties: first, matching the clustering results with human perception; second, the interpretability of the clustering results; third, how to let the computer obtain high-level semantic information from natural language text. Based on these three points, this paper proposes a text clustering method based on a word-embedding manifold topic model. The main work of this paper is as follows. First, the commonly used topic models are studied and analyzed in detail, and the pros and cons of the existing models are pointed out. Second, based on the results of this theoretical analysis, the Sphere topic model based on word embedding is introduced, and the semantic structure implicit in the text is mined in the Sphere space to improve the quality of text clustering. Third, the paper analyzes the phenomenon of topic noise in topic models and proposes a topic smoothing method based on semantic dependence analysis. Experimental results on Chinese and English corpora show that the Sphere topic model outperforms the two existing topic models on all three evaluation indicators, which suggests that clustering text in the Sphere space better captures the implicit structure of text. The experiments also show that the topic smoothing method can effectively improve the evaluation scores of the models, and that the Sphere topic model is the least affected by topic noise. Together, the experimental data verify the validity of the Sphere topic model in text clustering tasks.
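
As a rough illustration of what clustering text "in the Sphere space" can mean, the sketch below runs plain spherical k-means on unit-normalized document vectors, using cosine similarity as the affinity. This is not the Sphere topic model of the thesis, which the abstract does not specify in detail; it is a simpler stand-in technique. The function names, the toy two-dimensional vectors, and the assumption that each document is already represented by a single embedding vector (for example, an average of its word embeddings) are all illustrative choices, not taken from the thesis.

```python
import numpy as np

def normalize_rows(x):
    """Project each row onto the unit sphere (L2 norm of 1)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)

def spherical_kmeans(doc_vectors, init_centroids, n_iter=20):
    """Spherical k-means: assign by cosine similarity, re-normalize centroids.

    doc_vectors and init_centroids are hypothetical inputs; in practice the
    document vectors would be derived from word embeddings.
    """
    x = normalize_rows(np.asarray(doc_vectors, dtype=float))
    centroids = normalize_rows(np.asarray(init_centroids, dtype=float))
    for _ in range(n_iter):
        # On the unit sphere, the dot product equals the cosine similarity.
        labels = np.argmax(x @ centroids.T, axis=1)
        for j in range(centroids.shape[0]):
            members = x[labels == j]
            if len(members) > 0:
                # The new centroid is the normalized sum of its members.
                summed = members.sum(axis=0, keepdims=True)
                centroids[j] = normalize_rows(summed)[0]
    labels = np.argmax(x @ centroids.T, axis=1)
    return labels, centroids

# Toy document vectors: two point roughly along each axis.
docs = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]])
labels, centroids = spherical_kmeans(docs, init_centroids=docs[[0, 2]])
print(labels)
```

Because every vector is normalized, only direction matters and document length does not affect the assignment, which is one common motivation for working on the sphere with embedding-based representations.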
Keywords/Search Tags: Text Clustering, Topic Model, Topic Noise, Sphere Topic Model