Research on Text Clustering Algorithm and Its Application in Topic Detection

Posted on: 2018-03-06 | Degree: Master | Type: Thesis
Country: China | Candidate: K Chen | Full Text: PDF
GTID: 2348330515983276 | Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of the information age, the volume of data has grown explosively, and with the progress of the Internet the spread of this massive data is also accelerating. Because the number of Internet users has grown so quickly, online public opinion now shapes public opinion in society to some extent. Guiding it correctly, ensuring the healthy development of the network, and strengthening the supervision and management of online public opinion pose enormous pressure and challenges. Topic Detection is a means of understanding network information in a timely manner and of classifying it effectively. It can be understood as a clustering of events, so its core technology is clustering analysis, and Text Clustering is the most commonly used and most important method in Topic Detection.

In recent years, researchers have paid increasing attention to clustering methods based on the finite Gaussian mixture model, which has been widely studied in various application fields. In practice, however, data are complex and the probability distribution of many data sets does not follow a Gaussian distribution, so the finite Gaussian mixture model cannot accurately fit such non-Gaussian data. Parameter estimation and model selection are also difficult in finite mixture models: choosing too many or too few mixture components causes overfitting or underfitting. The infinite mixture model avoids the model selection problem directly by assuming an infinite number of mixture components. The infinite Dirichlet mixture model is a nonparametric Bayesian model that can be understood as an effective clustering method. This thesis therefore takes Topic Detection as its research background. To address the existing problems of finite mixture models, it studies learning methods for the infinite Dirichlet mixture model for modeling non-Gaussian data and presents a variational approximate inference algorithm. Extensive experiments on the target data sets verify that the infinite Dirichlet mixture model with the variational Bayesian algorithm estimates parameters more accurately and converges faster than the finite Dirichlet mixture model, and that it can solve the parameter estimation and model selection problems of finite mixture models. The thesis also applies the variational Bayesian algorithm based on the infinite Dirichlet mixture model to Text Clustering and obtains good results. Finally, a Topic Detection system is designed and built, and the Text Clustering algorithm is applied to Topic Detection.
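
As a rough illustration of how an infinite mixture model can avoid fixing the number of components in advance, the sketch below draws mixture weights from a truncated stick-breaking construction of a Dirichlet process prior. This is an assumption for illustration only: the abstract does not say which construction or truncation the thesis uses, and the names `stick_breaking_weights`, `effective_components`, `concentration`, `truncation`, and `threshold` are hypothetical.

```python
import random


def stick_breaking_weights(concentration, truncation, seed=0):
    """Draw mixture weights from a truncated stick-breaking prior.

    Assumption: a standard Dirichlet process construction, in which each
    step breaks off a Beta(1, concentration) fraction of the remaining stick.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0
    for _ in range(truncation - 1):
        fraction = rng.betavariate(1.0, concentration)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    # The last component absorbs whatever is left so the weights sum to 1.
    weights.append(remaining)
    return weights


def effective_components(weights, threshold=0.01):
    """Count components whose weight exceeds a (hypothetical) threshold."""
    return sum(1 for w in weights if w > threshold)


weights = stick_breaking_weights(concentration=1.0, truncation=50)
print(effective_components(weights))
```

Although the truncation level allows up to 50 components, most of the mass typically falls on only a few of them. In variational treatments of such models, the number of components in use is generally inferred from the data in a similar way rather than chosen beforehand.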
Keywords/Search Tags:Topic Detection, Text Clustering, Variational Bayesian, Infinite Dirichlet Mixture Model