
Applications Of Convex Analysis To Probabilistic Graphical Models

Posted on: 2018-05-09    Degree: Doctor    Type: Dissertation
Country: China    Candidate: T C Hu    Full Text: PDF
GTID: 1310330512983407    Subject: Computer Science and Technology
Abstract/Summary:
The graphical model, or probabilistic graphical model (PGM), is a powerful tool combining probability theory and graph theory; it characterizes a system by conditional probabilities that represent local direct correlations. Because it is intuitive and easy to understand, the probabilistic graph is widely used in data processing. However, as the number of probability nodes increases, the overall analysis of the system becomes difficult, and inference becomes complicated and time-consuming. In this thesis, we take convex analysis as a tool, under the regularization framework, to study graphical models in three scenarios: supervised learning, semi-supervised learning, and unsupervised learning. Under the regularization framework, the formalized objective functions are simple, so we can derive efficient inference methods. Moreover, the relationship among supervised, semi-supervised, and unsupervised learning evolves naturally and can be unified in the framework provided in the thesis, and solving the optimization problem of the regularization method with an online algorithm balances the trade-off between the computational overhead and the generalization performance of the related graphical model.

In the thesis we first study multi-class classification. The kernel element of our Bayesian classifier assigns a probability to an input sample, which connects the output label with the input feature. Based on an analysis of the MLE (Maximum Likelihood Estimation)-like optimization problem, we propose efficient learning and prediction methods with low computational overhead. The empirical study also shows that the Bayesian classifier outperforms many classical classification methods in prediction accuracy and generalization ability. The proposed Bayesian classifier has the advantages of being easy to understand, having low computational complexity, predicting accurately, and so on.

We propose two methods for semi-supervised learning. Since the Bayesian classifier uses an online algorithm for learning, converges fast, and has good generalization performance, the first method takes the classifier to predict a label for each unlabeled sample and combines the two as a new sample for learning; this method captures the trade-off between exploration and exploitation. In the second method, we analyze the objective function of the Bayesian classifier, which corresponds to a constrained optimization problem in which the LogSumExp function takes the position of the objective and the labels appear in the constraints. It is then natural to derive an unlabeled-data processing framework using the LogSumExp function, which measures the consistency between the model parameters and the feature data. The LogSumExp framework can cover the k-means method; the difference is that LogSumExp takes an inner product to coordinate the model parameters and the unlabeled data, while the k-means method uses the L2 distance. Note that although we introduce LogSumExp for semi-supervised learning, it can also be used in the unsupervised setting.
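As a concrete illustration of the LogSumExp weighting just described, the following is a minimal sketch, not the thesis's actual implementation: the linear score ⟨theta_k, x⟩ per class and all function names are illustrative assumptions. The gradient of LogSumExp over the per-class scores is the softmax, which gives each class a soft weight, whereas k-means makes a hard assignment by L2 distance:

```python
import numpy as np

def logsumexp_soft_weights(theta, x):
    """Soft class weights for an unlabeled sample x.

    The gradient of LogSumExp over the scores s_k = <theta_k, x> is the
    softmax, so class k receives weight exp(s_k) / sum_j exp(s_j).
    """
    s = theta @ x              # per-class scores via the inner product
    s -= s.max()               # shift for numerical stability
    w = np.exp(s)
    return w / w.sum()

def kmeans_hard_weights(centers, x):
    """Hard assignment by L2 distance, as in k-means, for comparison."""
    d = np.linalg.norm(centers - x, axis=1)
    w = np.zeros(len(centers))
    w[np.argmin(d)] = 1.0
    return w

# Example: 3 classes, 5-dimensional features.
rng = np.random.default_rng(0)
theta = rng.normal(size=(3, 5))
x = rng.normal(size=5)
print(logsumexp_soft_weights(theta, x))   # soft weights, sum to 1
print(kmeans_hard_weights(theta, x))      # one-hot assignment
```

The soft weights can then be used to weight an unlabeled sample during online updates, which is the role the derivative of LogSumExp plays in the framework above.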
LDA (Latent Dirichlet Allocation) and HDP (Hierarchical Dirichlet Processes) are classical unsupervised models in natural language processing. Under their graphical-model descriptions, each text in the corpus introduces a large number of local nodes that need to be optimized; these nodes are given in local conditional-dependence form, lacking a simple and clear formal definition of a global optimization problem, which makes it difficult to validate their rationality. In the unsupervised learning section, we build regularized optimization problem views for the LDA and HDP models.

The main theoretical contributions of this work include:

1. The LogSumExp framework for unlabeled data processing. We analyze the objective function of the Bayesian classifier on labeled data, which has the form of the objective in the definition of the Fenchel conjugate; LogSumExp is extracted because it takes the position of the objective function, based on the connection between the Fenchel conjugate definition and the constrained optimization problem. LogSumExp is then used as the basis for unlabeled data processing: the derivative of the LogSumExp function gives each topic a different weight, guiding the learning on unlabeled data.

2. The reduction principle, derived by applying the duality principle of convex analysis to the regularized optimization problem. A regularized optimization problem with only one variable is equivalent to a dual problem involving multiple variables, and the correspondence between the variables of the two problems is established. Based on the reduction principle, we recognize that the variables associated with the sequence in a graphical model correspond to gradients of the optimization problem with only one variable (N vs. 1). The reduction principle is the basis for establishing the regularized optimization problem view of the graphical model, and it also gives us the task "define a regularized optimization problem where the gradient of the data term is consistent with the graphical model".

3. The gradient duality principle. The definition of a conjugate function corresponds to an optimization problem, and we define a map from the conjugate function's input (which we take as the parameter of the optimization problem) to the solution of that problem. Based on the Fenchel inequality and the conjugate bijection property, if the function is strictly convex we establish a bijection from the optimization problem's parameter to its solution, in which the gradients of the two conjugate functions play important roles. We call this bijection the gradient duality principle; it is the theoretical foundation for defining the regularizer of the regularized optimization problem (see the illustrative equations after this list).

4. Probability space and its dual space. The probability simplex is an important component of the graphical model, and it is necessary to handle the transformations between parameters and gradients in the regularized optimization problem (each of the two is the gradient dual of the other). By analyzing the domain structures of the two, the gradient duality principle works for the conjugate pair ⟨p, log p⟩ (with p in the probability simplex) and LogSumExp, even though they are not strictly convex. Probability space and its dual space are defined by these conjugate functions. Moreover, by analyzing the online algorithm for the LDA regularized optimization problem, an additive algebraic system is deduced for the probability space.
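The equations below are an illustrative reconstruction, based on standard convex analysis rather than a verbatim excerpt of the thesis, of the gradient duality for the conjugate pair named in contributions 3 and 4: LogSumExp and the negative entropy ⟨p, log p⟩ on the probability simplex.

```latex
% Conjugate pair: LogSumExp and negative entropy on the simplex
f(\eta) = \log\sum_{k=1}^{K} e^{\eta_k},
\qquad
f^{*}(p) = \langle p, \log p \rangle = \sum_{k=1}^{K} p_k \log p_k,
\quad p \in \Delta_{K-1}.

% Gradient duality: the gradients map parameters to solutions and back
\nabla f(\eta)_k = \frac{e^{\eta_k}}{\sum_{j} e^{\eta_j}} \in \Delta_{K-1},
\qquad
\nabla f^{*}(p)_k = \log p_k + \text{const}.

% Fenchel's inequality, tight exactly at the dual pair
f(\eta) + f^{*}(p) \;\ge\; \langle \eta, p \rangle,
\qquad \text{with equality iff } p = \nabla f(\eta).
```

Here ∇f maps a parameter vector to a point of the simplex and ∇f* maps it back up to an additive constant, which is the bijection (between the probability space and its dual) that the gradient duality principle refers to.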
Keywords/Search Tags: Probabilistic Graphical Model, Regularization, Convex Analysis, Reduction Principle, Gradient Duality, Probability Space