
The Research On Text Clustering Of Gaussian LDA Model Based On Statistic Learning Methods

Posted on: 2018-07-29
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
GTID: 2348330536972676
Subject: Statistics
Abstract/Summary:
The rapid growth of online information raises the question of how to retrieve information quickly and efficiently from massive collections of unstructured documents. This question is a research focus in text mining, and text clustering is a basic technology in the area. Text clustering faces three main difficulties: matching the clustered results to the query, describing the clusters with labels, and the accuracy of the clustering results. To address these three points, this thesis proposes a text clustering method based on statistical learning, specifically the Gaussian LDA model. The main work is as follows. First, the thesis surveys existing techniques in China and abroad, including statistical-model-based methods and clustering-based recognition methods. Second, a word vector model is introduced as a form of text extension to improve the prior information of the LDA model, which leads to the Gaussian LDA model: latent knowledge-based features are integrated into the word vector space, and the semantic knowledge internal to the text is extracted to improve clustering quality. The Gaussian LDA model is implemented in Python. Third, a clustering method based on the LDA model and a method for evaluating its results are proposed; the LDA model is used to generate latent features from word models and word sets, giving a model that combines the text probability distribution with the word vector model. Compared with the traditional LDA topic model, the statistically based Gaussian LDA model greatly improves clustering quality, which indicates that the proposed text clustering method is effective and reasonable.
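As a rough illustration of the idea summarized above, the following Python sketch simulates the generative view commonly associated with Gaussian LDA: each topic is a Gaussian in word-vector space, each document has Dirichlet-distributed topic proportions, and each word is an embedding drawn from the Gaussian of its assigned topic. It is not the thesis's implementation. The function names, the identity covariance, the parameter values, and the nearest-mean clustering step (a simple stand-in for proper posterior inference such as Gibbs sampling) are all assumptions made for illustration.

```python
import numpy as np


def sample_gaussian_lda_corpus(n_docs=6, n_words=50, n_topics=3,
                               dim=2, alpha=0.5, seed=0):
    """Sample a toy corpus from a Gaussian-LDA-style generative process.

    All names and default values here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    # Assumption: topic means are spread out, covariance is the identity.
    topic_means = rng.normal(scale=10.0, size=(n_topics, dim))
    # Per-document topic proportions drawn from a symmetric Dirichlet prior.
    theta = rng.dirichlet(alpha * np.ones(n_topics), size=n_docs)
    docs = []
    for d in range(n_docs):
        # Draw a topic for every word position in document d.
        z = rng.choice(n_topics, size=n_words, p=theta[d])
        # Each word is a vector drawn around the mean of its topic.
        vectors = topic_means[z] + rng.normal(size=(n_words, dim))
        docs.append(vectors)
    return topic_means, theta, docs


def cluster_documents(docs, topic_means):
    """Assign each document to the topic that claims most of its words.

    Simplification: words go to the nearest topic mean instead of being
    sampled from a posterior, so this is not real Gaussian LDA inference.
    """
    labels = []
    for vectors in docs:
        # Squared distance from every word vector to every topic mean.
        diff = vectors[:, None, :] - topic_means[None, :, :]
        nearest = (diff ** 2).sum(axis=2).argmin(axis=1)
        counts = np.bincount(nearest, minlength=len(topic_means))
        labels.append(int(counts.argmax()))
    return labels


if __name__ == "__main__":
    means, theta, docs = sample_gaussian_lda_corpus()
    print(cluster_documents(docs, means))
```

In practice the word vectors would come from a trained embedding model and the topic parameters would be inferred from the corpus; the sketch only shows how document-level topic proportions can serve as cluster assignments.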
Keywords/Search Tags: Text Clustering, Topic Model, Term Vector, Gaussian LDA Model