Research On Hierarchical Topic Modeling Method For Multi-Document Summarization

Posted on: 2015-09-21; Degree: Master; Type: Thesis
Country: China; Candidate: W Heng; Full Text: PDF
GTID: 2298330467963809; Subject: Computer Science and Technology
Abstract/Summary:
Multi-document summarization provides an effective and efficient way for people to acquire and archive large amounts of information. Topic modeling has long been among the most popular approaches to it. However, most researchers adopt topic modeling algorithms with a flat structure. The specific requirements that multi-document summarization places on topic modeling, such as coverage, aspect points, generalization, and detail, make hierarchical modeling a good choice.

To learn topic hierarchies from data, Blei et al. proposed hierarchical Latent Dirichlet Allocation (hLDA), which has proven to be a powerful tool. However, one of the bottlenecks preventing its large-scale application is the lack of a quick and effective approach to modeling new data properly. Many factors contribute to this, including hyper-parameter settings, the nondeterminism of randomized algorithms, and the specific features of different corpora.

We propose a unified framework for analyzing the key factors in applying hLDA to practical hierarchical topic modeling tasks. The framework is organized around two lines of analysis, which we call the Bayes clue and the range clue. We focus mainly on three aspects of the features of the models used by hLDA, and on prior selection for multi-document tasks. We then give a series of practical and effective empirical modeling strategies and processes, and finally evaluate the modeling results with experiments on a corpus from JACM and the multi-document summarization corpus from ACL MultiLing 2013, among others.

This work was supported in part by the National Science Foundation of China (NSFC) under Grants 61202247 and 71231002.
Keywords/Search Tags: hierarchical latent Dirichlet allocation, unifying modeling analysis framework, Bayesian theory clue, empirical evaluation method