Topic Model Based Multimedia Question Answering

Posted on: 2019-10-24
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X M Wang
GTID: 1368330575969849
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of social media, picture and video sharing websites such as Flickr and YouTube have multiplied. Meanwhile, users have become accustomed to publishing posts and following news about trending events via social platforms and news media websites such as Twitter, Sina Weibo, ABC News, and BBC News. The rapid growth of text, images, and videos on the Internet creates a "data disaster" problem for users. How to manage such a large amount of data effectively and mine useful information for users is therefore an urgent problem. At the same time, how to use the multimedia data of social media to provide users with more information is a current research trend. Furthermore, as question answering is a hot topic, how to provide users with a friendly question answering interface also needs to be addressed. Accordingly, this thesis studies multimedia question answering based on topic models. Using topic models, we mine valuable information from large-scale collected news data, and we combine text and multimedia data to provide users with richer and more comprehensive information. Finally, the information is presented through a friendly user interface. The main work and contributions of this thesis are summarized as follows:

(1) News summarization in search
Since more and more news data is generated by news websites, users are overwhelmed. In this thesis, we introduce topic models into news summarization and propose a framework for multimedia news summarization with search. We apply the widely used hierarchical Latent Dirichlet Allocation to the topic analysis of the news collection, obtaining several topics that help users browse news content quickly. The proposed framework covers the whole process of text cleaning, topic detection, and selection of a representative news document and image for each topic.

(2) Topic model based on biterms and images
Inferring topics from the massively growing data on social media sites such as Sina Weibo is an important research topic. In this thesis, we explore topic models on Sina Weibo data in order to obtain a better topic detection model. Conventional topic models learn topics from word co-occurrence within a document. Their performance is poor on Sina Weibo, however, because the posts are short texts in which word co-occurrence within each document becomes very sparse. To tackle this sparsity problem, we learn topic models via biterms. Since Sina Weibo posts contain not only text but also images and other associated data, we propose a comprehensive model that uses both text and image information, abbreviated as IBTM. By combining biterms and images, this model improves the ability to detect topics.

(3) Multimedia question answering based on multiple-source information
At present, news is associated not only with text data but also with multimedia data such as images. How to use these data effectively is a problem worthy of study. In this thesis, a multimedia question answering method based on multiple-source information is proposed. We first analyze the news query submitted by the user, and then use an improved weighted BM25 method to return the news data related to the query. We then use the classic Latent Dirichlet Allocation topic model to detect topics in the returned collection of related news and, via our proposed rule, select one representative image for each topic. This representative image can be regarded as a visual supplement to the text.
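
To make two of the building blocks named above more concrete, the following is a minimal Python sketch, not the thesis implementation. It assumes pre-tokenized input and shows (a) how biterms, that is, unordered word pairs co-occurring in one short text, can be extracted, and (b) plain Okapi BM25 scoring. The thesis describes an improved weighted BM25 whose weighting scheme is not given in this abstract, so the standard formula is used here instead. The function names `extract_biterms` and `bm25_scores` and the defaults `k1=1.5`, `b=0.75` are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) from one short text.

    Each pair is sorted so that (a, b) and (b, a) count as the same biterm.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score tokenized documents against a tokenized query with plain BM25.

    This is the standard formula, not the improved weighted variant
    mentioned in the abstract (its details are not available here).
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization penalizes documents longer than average.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

In a pipeline like the one described in (3), the documents ranked highest by such a score would form the collection passed on to topic detection. In a biterm-based model as in (2), the biterms from all short texts are pooled into one corpus-level collection, which is what eases the per-document sparsity.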
Keywords/Search Tags: Topic model, Social media, News summarization, Multimedia QA