
Online Belief Propagation Algorithm Research Of Topic Models

Posted on: 2014-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Ye
Full Text: PDF
GTID: 2248330398964887
Subject: Computer application technology
Abstract/Summary:
Online learning is a learning paradigm in which a system continually acquires new knowledge from new samples while retaining most of what it has already learned. In today's digital information age, data volumes in every field keep growing, and so do the time and memory needed to learn from them. Online learning algorithms have therefore become an urgent need. The accuracy and speed of existing online learning algorithms for topic models are still unsatisfactory. This thesis therefore studies massive data sets and data streams, and proposes more efficient online learning algorithms for the probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) models. The main contributions are as follows:
1) Traditional offline algorithms cannot classify documents from massive data sets and data streams, because memory is insufficient and the data set is never complete. The online learning algorithms proposed in this thesis therefore first split the massive data set into a sequence of small segments. They then train each segment independently, using the parameters estimated from previous segments to compute the gradient-descent update for the current segment.
2) We propose an online belief propagation (OBP) algorithm for PLSA, based on an improved factor graph representation. PLSA is a simple model for document classification. On massive data sets and data streams, however, it cannot be trained with traditional offline algorithms. Online learning algorithms for PLSA have been proposed, but they still fall short in accuracy and speed. Experiments on four public large-scale data sets and three real-world massive data sets from Baidu show that OBP outperforms the state-of-the-art online expectation-maximization (OEM) algorithm in both time and space complexity.
3) We propose an online belief propagation (OBP) algorithm for LDA, based on an improved factor graph representation. In PLSA, the number of parameters grows linearly with the numbers of documents and words, which makes online algorithms for PLSA very complicated on massive data. This thesis therefore proposes OBP for LDA based on the improved factor graph representation, proves the convergence of OBP theoretically, and verifies its efficiency experimentally.
4) We propose online belief propagation for topic tracking. Data keep flowing in during training, so every topic keeps changing over time. The proposed method trains continually on the data stream. It identifies the currently hottest and most volatile topics, and it predicts the trend of each topic more accurately.
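The segment-by-segment scheme in items 1 and 3 above can be illustrated with a minimal pure-Python sketch. The sketch rests on two assumptions.

- It uses a common form of the belief propagation message update for LDA. Each message mu[d][w][k] is proportional to (document-topic mass + alpha) times (word-topic mass + beta), divided by (total topic mass + W*beta). Each of these masses excludes the message's own contribution.
- After a fixed number of sweeps over a segment, that segment's word-topic statistics are simply added to a global matrix before the next segment arrives.

The function name `obp_lda`, the input format, and the defaults (`alpha`, `beta`, `iters`) are hypothetical. This illustrates the idea only and is not the thesis's exact update.

```python
import random


def obp_lda(segments, num_topics, vocab_size, alpha=0.01, beta=0.01,
            iters=20, seed=0):
    """Hypothetical online BP sketch for LDA (illustrative assumptions only).

    segments: sequence of segments; each segment is a list of documents and
    each document is a dict {word_id: count} with word_id < vocab_size.
    Returns phi, a vocab_size x num_topics list of accumulated word-topic mass.
    """
    rng = random.Random(seed)
    K, W = num_topics, vocab_size
    phi = [[0.0] * K for _ in range(W)]  # statistics from earlier segments

    for batch in segments:
        # Random normalized messages mu[d][w][k], kept only for this segment.
        mu = []
        for doc in batch:
            msgs = {}
            for w in doc:
                m = [rng.random() + 1e-12 for _ in range(K)]
                s = sum(m)
                msgs[w] = [v / s for v in m]
            mu.append(msgs)

        for _ in range(iters):
            # Segment-local document-topic and word-topic statistics.
            theta = [[sum(x * mu[d][w][k] for w, x in doc.items())
                      for k in range(K)] for d, doc in enumerate(batch)]
            phi_b = [[0.0] * K for _ in range(W)]
            for d, doc in enumerate(batch):
                for w, x in doc.items():
                    for k in range(K):
                        phi_b[w][k] += x * mu[d][w][k]
            tot = [sum(phi[w][k] + phi_b[w][k] for w in range(W))
                   for k in range(K)]

            # Synchronous message update; each message excludes its own mass.
            new_mu = []
            for d, doc in enumerate(batch):
                msgs = {}
                for w, x in doc.items():
                    m = []
                    for k in range(K):
                        own = x * mu[d][w][k]
                        m.append((theta[d][k] - own + alpha)
                                 * (phi[w][k] + phi_b[w][k] - own + beta)
                                 / (tot[k] - own + W * beta))
                    s = sum(m)
                    msgs[w] = [v / s for v in m]
                new_mu.append(msgs)
            mu = new_mu

        # Fold this segment's final statistics into phi; its messages are dropped.
        for d, doc in enumerate(batch):
            for w, x in doc.items():
                for k in range(K):
                    phi[w][k] += x * mu[d][w][k]
    return phi
```

For example, `obp_lda([[{0: 3, 1: 2}, {2: 4, 3: 1}], [{0: 1, 1: 1}, {2: 2, 3: 2}]], num_topics=2, vocab_size=4)` processes the two segments in turn. Only the current segment's messages are held in memory, and earlier segments survive solely through `phi`. This is what keeps memory bounded as the stream grows.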
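Item 4 above can be pictured with simple bookkeeping. After each segment, the per-topic token mass of that segment gives each topic's current share. From these shares one can read off the hottest topics and the topics whose share moved most since the previous segment. The helper `track_topics` below is a hypothetical illustration of that bookkeeping under this assumption. It does not model how the thesis itself predicts topic trends.

```python
def track_topics(segment_topic_counts, top_n=3):
    """Hypothetical topic-tracking summary over a stream of segments.

    segment_topic_counts: one entry per processed segment, each a list of
    per-topic token mass for that segment (all totals assumed positive).
    Returns (hottest, movers): topic ids with the largest share in the latest
    segment, and topic ids whose share changed most versus the previous one.
    """
    # Normalize each segment's topic mass into topic shares.
    shares = []
    for counts in segment_topic_counts:
        total = sum(counts)
        shares.append([c / total for c in counts])

    latest = shares[-1]
    hottest = sorted(range(len(latest)), key=lambda k: latest[k],
                     reverse=True)[:top_n]
    if len(shares) < 2:
        return hottest, []  # no earlier segment to compare against

    prev = shares[-2]
    delta = [latest[k] - prev[k] for k in range(len(latest))]
    movers = sorted(range(len(delta)), key=lambda k: abs(delta[k]),
                    reverse=True)[:top_n]
    return hottest, movers
```

With per-topic masses `[10, 30, 60]` followed by `[40, 35, 25]`, topic 0 has the largest current share. Topic 2 shows the largest absolute change in share between the two segments.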
Keywords/Search Tags: Topic model, Gibbs sampling, Belief propagation, Online learning algorithm, Topic tracking