Storyline Extraction In News Articles

Posted on:2018-08-17

Degree:Master

Type:Thesis

Country:China

Candidate:H Y Xu

Full Text:PDF

GTID:2348330542951666

Subject:Computer technology

Abstract/Summary:

With the rapid development of online news sites and news apps, a tremendous number of news reports is generated, and these reports have become a main channel through which people learn about and follow domestic and international hot events. Storyline extraction studies how to extract hot events from a corpus, reveal how related events evolve over time, and represent events with structured information such as people, locations, organizations, keywords, and related topics. Research on storyline extraction in news articles therefore has strong practical significance and great application value. News articles are real-time, continuous, widely followed, and of high quality, which makes storyline extraction from them both practicable and challenging. In addition, the performance of supervised methods depends on the quantity and quality of the annotated corpus, and labeling massive amounts of data usually requires a lot of manual labor. This thesis therefore focuses on unsupervised storyline extraction in news articles. Its main contributions are summarized as follows.

1. This thesis studies storyline extraction in news articles and proposes the Dynamic Storyline Detection Model (DSDM). DSDM is an unsupervised Bayesian latent variable model that uses the storyline distributions of previous epochs as the priors of the current epoch to represent the dependency between storylines. The thesis describes the DSDM model and its parameter estimation method in detail. Two datasets are used to evaluate DSDM against baselines: Dataset I is an unannotated one-month dataset of 526,587 news articles, and Dataset II is an annotated one-week dataset of 101,654 news articles. DSDM outperforms the baselines on both datasets.

2. DSDM has three problems: the number of storylines must be set manually, its sampling complexity is high, and the keywords and topics of the extracted storylines have low precision. To address them, this thesis proposes the Dynamic Storyline Extraction Model (DSEM), which uses the Chinese Restaurant Process (CRP) to determine the number of storylines automatically, uses a Metropolis-Hastings sampler and LightLDA to reduce sampling complexity, and adds a word switch variable to improve the accuracy of keywords and topics. The thesis describes the DSEM model and its parameter estimation method in detail. To evaluate how well the model handles complicated storylines, Dataset III was constructed manually; it covers all types of storylines and contains 23,376 news articles. Compared with DSDM, accuracy increases by 5.23%, 2.50%, and 20.83% on Dataset I, Dataset II, and Dataset III, respectively.

3. To avoid setting prior knowledge by hand and to model the dependency between stories that belong to the same storyline, this thesis improves the original model from a neural network perspective and proposes the neural Dynamic Storyline Extraction Model (Neural-DSEM). Neural-DSEM assumes that a document and its title should have the same storyline distribution, and it uses two outputs of the neural network to represent the dependency between events. The thesis describes the network structure and training procedure of Neural-DSEM in detail. Compared with DSEM, accuracy increases by 2.14% and 2.78% on two of the datasets.

The thesis consists of six chapters. Chapter 1 introduces the research background and significance, the motivation, and the main research content. Chapter 2 describes related theories and existing techniques for storyline extraction in news articles. Chapter 3 presents the DSDM-based approach and its experiments. Chapter 4 presents the DSEM-based approach and its experiments. Chapter 5 presents the Neural-DSEM-based approach and its experiments. Chapter 6 summarizes the work and outlines future directions.
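DSDM is described only as using the storyline distributions of previous epochs as priors for the current epoch. As a minimal sketch, assuming a formulation in the spirit of dynamic topic models (not confirmed by the abstract), the Dirichlet pseudo-counts over words for one storyline at the current epoch can be built as a small symmetric base plus a weighted sum of that storyline's word distributions in the preceding epochs. The function `epoch_prior`, the decay weights, and the base value are hypothetical.

```python
from typing import Dict, List


def epoch_prior(
    history: List[Dict[str, float]],  # one storyline's word distributions, oldest epoch first
    vocab: List[str],
    weights: List[float],             # hypothetical pseudo-count scales, most recent epoch first
    base: float = 0.01,               # hypothetical symmetric Dirichlet base
) -> Dict[str, float]:
    """Build Dirichlet pseudo-counts for the current epoch from previous epochs."""
    prior = {w: base for w in vocab}
    # zip stops at the shorter list, so only len(weights) recent epochs contribute.
    for weight, dist in zip(weights, reversed(history)):
        for w in vocab:
            prior[w] += weight * dist.get(w, 0.0)
    return prior
```

For example, with `history = [{"flood": 0.6, "rain": 0.4}, {"flood": 0.5, "rescue": 0.5}]` and `weights=[10.0, 5.0]`, the most recent epoch contributes 10 pseudo-counts and the one before it 5, so "flood" receives 0.01 + 5.0 + 3.0 = 8.01. Larger weights make a storyline stick more closely to its past.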
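DSEM replaces the manually chosen number of storylines with a Chinese Restaurant Process. The sketch below draws storyline assignments from the CRP prior alone: item i joins an existing storyline k with probability n_k / (i + alpha) or opens a new one with probability alpha / (i + alpha), so the number of storylines grows with the data instead of being fixed in advance. In DSEM this prior would be combined with a likelihood term inside the sampler; that part is omitted, and the name `crp_assignments` is hypothetical.

```python
import random
from typing import List


def crp_assignments(n: int, alpha: float, rng: random.Random) -> List[int]:
    """Draw storyline labels for n items from a Chinese Restaurant Process prior."""
    counts: List[int] = []  # counts[k] = number of items already assigned to storyline k
    labels: List[int] = []
    for i in range(n):
        # Existing storyline k has weight counts[k]; a new one has weight alpha.
        # The counts sum to i, so the total weight is i + alpha.
        r = rng.uniform(0.0, i + alpha)
        acc = 0.0
        choice = len(counts)  # default: open a new storyline
        for k, c in enumerate(counts):
            acc += c
            if r < acc:
                choice = k
                break
        if choice == len(counts):
            counts.append(0)
        counts[choice] += 1
        labels.append(choice)
    return labels
```

A larger `alpha` opens new storylines more readily; the expected number of storylines is about alpha * ln(1 + n / alpha), i.e. it grows logarithmically with the number of items.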
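DSEM is said to combine a Metropolis-Hastings sampler with LightLDA to cut sampling cost. The abstract does not give the exact proposals, so the sketch below shows only the core mechanism LightLDA builds on: draw a candidate storyline in O(1) from a precomputed alias table (Vose's method) and correct for the mismatch between that proposal and the true conditional with an MH acceptance test. LightLDA itself alternates document and word proposals; this single-proposal independence sampler is a simplifying assumption, and all function names are hypothetical.

```python
import random
from typing import List, Tuple


def build_alias(weights: List[float]) -> Tuple[List[float], List[int]]:
    """Vose's alias method: O(n) setup for O(1) draws from a discrete distribution."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]  # scaled weights average exactly 1
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s] = scaled[s]           # column s keeps its own mass...
        alias[s] = l                  # ...and is topped up by index l
        scaled[l] += scaled[s] - 1.0  # l donated (1 - scaled[s]) of its mass
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:           # leftovers are 1.0 up to rounding error
        prob[i] = 1.0
    return prob, alias


def alias_draw(prob: List[float], alias: List[int], rng: random.Random) -> int:
    i = rng.randrange(len(prob))      # pick a column uniformly
    return i if rng.random() < prob[i] else alias[i]


def mh_step(current: int, target: List[float], proposal: List[float],
            prob: List[float], alias: List[int], rng: random.Random) -> int:
    """One independence Metropolis-Hastings step with an alias-table proposal.

    target: unnormalized conditional p(k), assumed > 0 at `current`
    proposal: positive unnormalized weights q(k) the alias table was built from
    """
    cand = alias_draw(prob, alias, rng)
    # Accept with probability min(1, p(cand) q(current) / (p(current) q(cand))).
    ratio = (target[cand] * proposal[current]) / (target[current] * proposal[cand])
    return cand if rng.random() < min(1.0, ratio) else current
```

Building the table costs O(K) for K storylines, but each draw costs O(1), so a table can be reused for many tokens even after it goes stale; the acceptance test keeps the chain's stationary distribution equal to the target.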
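Neural-DSEM assumes that a document and its title should share the same storyline distribution. One plausible way to turn that assumption into a training signal, assumed here and not taken from the thesis, is to penalize the divergence between the two softmax outputs, for example with a symmetric KL divergence. The function names are hypothetical, and plain Python is used only to keep the sketch self-contained; in a real model this would be a loss term in a neural network framework.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q); eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(doc_logits: List[float], title_logits: List[float]) -> float:
    """Symmetric KL between the document's and the title's storyline distributions."""
    p = softmax(doc_logits)
    q = softmax(title_logits)
    return 0.5 * (kl(p, q) + kl(q, p))
```

Identical document and title logits give a loss of 0, and the loss grows as the two predicted storyline distributions diverge, so minimizing it pushes the two network outputs toward the same storyline.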

Keywords/Search Tags:

Storyline extraction in news articles, Chinese Restaurant Process, Graphical model, Bayesian model, Neural Network

Related items

1	A Study On The Analytical Method Of Chinese And Vietnamese Bilingual News
2	Storyline Extraction Based On Deep Learning For News Articles
3	Crowd Scene Analysis Based On Distance Dependent Nonparametric Bayesian Model
4	Min-hash Sketch Construction Via Nonparametric Clustering
5	Chinese Multi-Document Summarization Based On Hlda Hierarchical Topic Model
6	Research On Storyline Mining Based On Weibo
7	Research On The Extraction And Sentiment Classification Of Chinese-Vietnamese Cross-language Comparable News Opinion Sentences
8	Research On Text Deep Analysis Based Storyline Generation
9	Research On News Text Classification Based On Convolutional Neural Network
10	Research On Chinese News Incident Extraction Method