The Research On Text Clustering Of Gaussian LDA Model Based On Statistic Learning Methods

Posted on:2018-07-29

Degree:Master

Type:Thesis

Country:China

Candidate:Y Wang

GTID:2348330536972676

Subject:Statistics

Abstract/Summary:

People today face an ever-growing volume of online information, which raises the question of how to extract information quickly and efficiently from massive collections of unstructured documents. This question is a central research focus in text mining, and text clustering is one of the field's basic technologies. The main difficulties in text clustering are threefold: matching clustering results to user queries, describing clusters with meaningful labels, and ensuring the accuracy of the clustering results. To address these three points, this thesis proposes a text clustering method based on statistical learning, specifically the Gaussian LDA model. The main work is as follows. First, the thesis reviews related research in China and abroad, covering statistical models and clustering-based recognition methods. Second, a word vector model is introduced as a form of text extension and used to improve the prior information of the LDA model, which yields the Gaussian LDA model. Latent knowledge-based features are integrated into the word vector space so that semantic knowledge internal to the texts can be extracted and clustering quality improved, and the Gaussian LDA model is implemented in Python. Third, an LDA-based clustering method and a method for evaluating its results are proposed: the LDA model generates latent features from word models and word sets, producing a combined model of text probability distributions and word vectors. Compared with the traditional LDA topic model, the statistically based Gaussian LDA model substantially improves clustering quality, showing that the proposed text clustering method is effective and reasonable.
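In Gaussian LDA, each topic is modeled as a multivariate Gaussian over word-embedding vectors rather than as a multinomial distribution over the vocabulary. The thesis's own Python implementation is not reproduced here. The following is only a minimal sketch of the clustering step, under several stated assumptions: topic means and variances are taken as already estimated (for example by Gibbs sampling); covariances are simplified to diagonal ones, whereas full Gaussian LDA typically uses full covariances with a Normal-Inverse-Wishart prior; each document is given as a list of pre-trained word vectors; and a document is assigned to the topic with the largest estimated proportion. The function names, the pseudo-count `alpha`, the toy vectors, and the use of purity as an evaluation metric are all hypothetical illustrations, not the thesis's evaluation method.

```python
import math
from collections import Counter

def log_gauss_diag(x, mean, var):
    """Log density of vector x under a Gaussian with diagonal covariance `var`."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - mi) ** 2 / v)
        for xi, mi, v in zip(x, mean, var)
    )

def doc_topic_proportions(doc_vectors, means, variances, alpha=0.1, n_iter=20):
    """Estimate theta_d for one document with the topic Gaussians held fixed.

    Assumption: each word's responsibility for topic k is proportional to
    theta_dk * N(v_w | mu_k, diag(var_k)); theta_d is then re-estimated from
    the summed responsibilities plus a symmetric Dirichlet pseudo-count alpha.
    """
    k = len(means)
    theta = [1.0 / k] * k
    # Word log-likelihoods do not change across iterations, so compute them once.
    loglik = [[log_gauss_diag(v, means[j], variances[j]) for j in range(k)]
              for v in doc_vectors]
    for _ in range(n_iter):
        counts = [alpha] * k
        for row in loglik:
            scores = [math.log(theta[j]) + row[j] for j in range(k)]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]  # log-sum-exp for stability
            total = sum(weights)
            for j in range(k):
                counts[j] += weights[j] / total
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return theta

def cluster_documents(docs, means, variances):
    """Hypothetical rule: assign each document to its highest-proportion topic."""
    labels = []
    for doc in docs:
        theta = doc_topic_proportions(doc, means, variances)
        labels.append(max(range(len(theta)), key=theta.__getitem__))
    return labels

def purity(pred, gold):
    """Fraction of documents whose cluster's majority gold label matches their own."""
    clusters = {}
    for p, g in zip(pred, gold):
        clusters.setdefault(p, []).append(g)
    hits = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return hits / len(gold)

# Toy example: two 2-D "topics" and three documents of word vectors (invented data).
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
docs = [
    [[0.1, -0.2], [0.3, 0.1], [-0.2, 0.0]],   # words near topic 0
    [[4.9, 5.2], [5.1, 4.8]],                 # words near topic 1
    [[0.2, 0.1], [5.0, 5.0], [-0.1, 0.3]],    # mostly topic-0 words
]
labels = cluster_documents(docs, means, variances)
print(labels)                             # [0, 1, 0]
print(purity(labels, ["a", "b", "a"]))    # 1.0
```

Folding each document in with the topics fixed keeps the sketch short; a complete implementation would instead sample topic assignments and update the topic Gaussians jointly over the whole corpus.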

Keywords/Search Tags:

Text Clustering, Topic Model, Term Vector, Gaussian LDA Model

Related items

1	Sphere Topic Model Based On Word Embedding In Text Clustering Field
2	A Biterm Pseudo Document Topic Model For Short Text
3	Research On Topic Clustering Algorithm Based On Topic Models
4	Research And Implementation Of Distributed Topic Clustering Technology For Text Flow
5	Research On Short Text Topic Discovery Based On BTM Topic Model
6	Event Detection From Microblogs Based On Topic Model
7	Research On Text Clustering Algorithm And Its Application In Topic Detection
8	Research On The Topic Clustering Of Network Short Text
9	Research On Bilingual Text Clustering Based On Semantic Duality Model
10	Topic Evolution Analysis On Complaint Traffic Data