Research On Probabilistic Topic Model And Its Application In Multimedia Topic Discovery And Evolution

Posted on: 2018-06-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: H K Zhou
Full Text: PDF
GTID: 1318330518471023
Subject: Signal and Information Processing
Abstract/Summary:
With the development of information technology, the Internet, and database technology, the volume of data keeps growing and the problem of information overload is becoming increasingly serious. Accurately representing the extent and characteristics of users' interest in particular topics, and tracking how these topics evolve over time, is an important problem facing topic evolution researchers, particularly in an age of information explosion. Search engines provide an effective way for people to quickly retrieve useful information from masses of archived data. However, the results a search engine returns are often fragmentary and cannot reflect the evolution of a whole topic over time. With the emergence of probabilistic topic models, represented by LDA, research on topic discovery and evolution has flourished, offering a good way to discover hot topics and follow their evolution over time.

In the last decade, research on probabilistic topic models has attracted growing attention in the data mining and knowledge discovery communities, and the findings have been widely applied to text, image, and video data processing. This research has made considerable progress, but some problems remain. For example, comparative analyses of probabilistic topic models are rare; in topic discovery and evolution for scientific literature, the use of multiple information sources in structured data to find topics and track their evolution has not been explored in enough depth; work on tracing the evolution of different topics has not yet emerged; and topic models are not yet well developed for motion pattern discovery and abnormal behavior detection in traffic video.

To address these problems, this thesis first reviews the different types of probabilistic topic model. On this basis, it puts forward a novel topic model that combines words and citations, both of which are abundant in scientific literature data, and applies it to topic discovery and tracking in scientific literature. Next, a new algorithm based on a random walk model is proposed to build the evolution map of different topics. Finally, a two-layer non-parametric topic model is proposed and applied to motion pattern mining and abnormal behavior detection in traffic videos. The main work and research results of this thesis are as follows:

(1) This thesis reviews the state-of-the-art research on different types of topic model from multiple aspects. First, according to how the models handle the time variable, we group topic models into three categories: discrete-time topic models, continuous-time topic models, and online topic models. Next, the characteristics of each category are summarized, and the typical models of each category are analyzed in detail, including their modeling process, characteristics, and advantages and disadvantages. For the problem of comparative experiments on topic models, various possible methods of performance evaluation are analyzed, and two effective evaluation measures, perplexity and sKL divergence, are summarized. Typical models from the three categories are then compared experimentally on two scientific literature corpora. The experimental results confirm the analysis of the characteristics of the various models.

(2) A Citation-Content-LDA topic model is proposed, which models topic discovery and evolution using both document citation relations and document content via a probabilistic generative model. The Citation-Content-LDA model uses a two-level topic structure in which citation information defines "father topics" and text information defines sub-topics. On this basis, a topic tracking algorithm is implemented, and the model parameters are estimated with a collapsed Gibbs sampling algorithm. The validity and superiority of the proposed model are verified by comparing experimental results on two typical datasets.

(3) A topic evolution algorithm is proposed that runs in two steps: topic segmentation and calculation of topic dependency relations. To address the evolution of different topics, the topic alignment problem is solved based on the topics found by the Citation-Content-LDA model, and the time information of the topics is used for topic segmentation. A new random-walk-based algorithm for measuring the relationship between topics is proposed: drawing on the idea of the PageRank algorithm, the relationship map between topics is established as a DAG, and the probabilistic measure of the relationship between topics is obtained by a random walk traversal of the graph. Together these yield the algorithm for constructing the evolution relationship map between different topics. Experiments on two typical scientific literature datasets produce the evolution maps of different topics on both datasets.

(4) A two-layer non-parametric topic model is proposed for motion pattern recognition and abnormal behavior detection in traffic video. The two-layer non-parametric topic model can automatically determine the number of topics at each layer and can extract local topics (visual activities) and global topics (traffic patterns) from traffic video. A video anomaly detection algorithm based on the likelihood function of the two-layer model is proposed, which achieves better results than existing methods in traffic video abnormal behavior detection.
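The abstract names perplexity and sKL divergence as evaluation measures but gives no formulas. The following is a minimal sketch under assumed, commonly used definitions: perplexity as the exponential of the negative mean log-probability of held-out tokens, and sKL as the symmetric Kullback-Leibler divergence KL(p||q) + KL(q||p) between two topic distributions (some authors halve this sum). The function names `perplexity` and `skl_divergence` and the `eps` clamp are illustrative choices, not taken from the thesis.

```python
import math


def perplexity(token_probs):
    """Perplexity of held-out tokens.

    token_probs: per-token probabilities p(w) assigned by a model
    (assumed input; how a given topic model produces them is not shown).
    """
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    log_likelihood = sum(math.log(p) for p in token_probs)
    # exp of the negative average log-likelihood per token
    return math.exp(-log_likelihood / len(token_probs))


def skl_divergence(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p).

    p, q: discrete distributions over the same topics.
    eps: assumed floor that avoids log(0); it slightly perturbs
    distributions containing exact zeros.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        pi = max(pi, eps)
        qi = max(qi, eps)
        # p*log(p/q) + q*log(q/p) simplifies to (p - q)*log(p/q)
        total += (pi - qi) * math.log(pi / qi)
    return total
```

Under these definitions, a model that assigns probability 0.25 to every held-out token has a perplexity of about 4, and the sKL divergence of a distribution with itself is 0. Lower perplexity indicates a better fit to held-out text.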
Keywords/Search Tags: topic evolution, topic models, random walk, motion pattern, anomaly detection