
Research And Implementation Of Automatic Detection And Summarization For News Events In Microblog Based On Heterogeneous Network

Posted on: 2018-10-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y Zhang
GTID: 2348330515468699
Subject: Computer Science and Technology
Abstract/Summary:
With the development of social media, microblog platforms play an important role in spreading news. However, because of the large volume, real-time nature, and overwhelmingly unstructured form of the data, existing data-mining methods are no longer adequate for detecting and summarizing news events. In this thesis, we propose a novel event detection and summarization framework, named HRDBSCAN, which simultaneously uses text, visual, temporal, and social information to capture the essential content of tweets and overcome the shortcomings of traditional methods.

In the data preprocessing stage, a strict filtering paradigm is defined to remove meaningless tweets and pictures. In the event detection stage, a heterogeneous information network is used to model the heterogeneous information in tweets. We use late multi-modal fusion strategies to combine the heterogeneous characteristics of Twitter data. After the fusion operation, a homogeneous graph is generated to improve diversity and reduce topic segmentation. Finally, all clusters are sorted by novelty and popularity, as are the tweets within each cluster. We choose the top n sorted tweets and pictures as the summary of an event.

The main contributions are as follows:
1) To resolve the issue that traditional methods cannot fully exploit rich additional information, a dynamic heterogeneous information network is proposed to handle cross-modal information. Considering the semantic similarity and space-time proximity of news events, the AFF function is used to integrate the multi-modal features. Moreover, to preserve the critical structure, we transform the heterogeneous information network into a homogeneous graph, which is then used to detect and summarize events.
3) To improve diversity and reduce topic segmentation, at the clustering stage the HRDBSCAN algorithm uses statistical methods to merge similar clusters, and at the summarization stage a re-clustering method ensures that every sub-topic appears only once.
4) Extensive experiments on two real-world microblog datasets demonstrate the novelty and superiority of the proposed framework.
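
The pipeline described in the abstract (per-modality similarities, late fusion into a single homogeneous graph, then density-based clustering) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the thesis: the fusion here is a plain weighted average rather than the AFF function, the clustering is standard DBSCAN without the statistical cluster-merging step of HRDBSCAN, and the names `late_fusion`, `dbscan`, `min_sim`, `min_pts`, and the toy similarity values are hypothetical.

```python
from typing import Dict, List, Optional

Matrix = List[List[float]]


def late_fusion(similarities: Dict[str, Matrix], weights: Dict[str, float]) -> Matrix:
    """Fuse per-modality similarity matrices with a weighted average.

    Assumption: a weighted average stands in for the thesis's AFF function.
    The result can be read as the weighted adjacency matrix of a homogeneous graph.
    """
    total = sum(weights[m] for m in similarities)
    n = len(next(iter(similarities.values())))
    fused = [[0.0] * n for _ in range(n)]
    for modality, sim in similarities.items():
        w = weights[modality] / total
        for i in range(n):
            for j in range(n):
                fused[i][j] += w * sim[i][j]
    return fused


def dbscan(similarity: Matrix, min_sim: float, min_pts: int) -> List[int]:
    """Standard DBSCAN on a similarity matrix; returns one label per item, -1 for noise.

    Two items are neighbours when their similarity is at least min_sim.
    An item counts as its own neighbour, so min_pts includes the item itself.
    """
    n = len(similarity)
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"

    def neighbours(i: int) -> List[int]:
        return [j for j in range(n) if similarity[i][j] >= min_sim]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(reachable)
    return [label if label is not None else -1 for label in labels]


# Toy data (hypothetical): five tweets, similarities from two modalities.
text_sim = [
    [1.0, 0.8, 0.7, 0.1, 0.0],
    [0.8, 1.0, 0.6, 0.2, 0.1],
    [0.7, 0.6, 1.0, 0.1, 0.1],
    [0.1, 0.2, 0.1, 1.0, 0.9],
    [0.0, 0.1, 0.1, 0.9, 1.0],
]
time_sim = [
    [1.0, 0.9, 0.8, 0.3, 0.2],
    [0.9, 1.0, 0.9, 0.3, 0.2],
    [0.8, 0.9, 1.0, 0.4, 0.3],
    [0.3, 0.3, 0.4, 1.0, 0.9],
    [0.2, 0.2, 0.3, 0.9, 1.0],
]

fused = late_fusion({"text": text_sim, "time": time_sim}, {"text": 0.6, "time": 0.4})
labels = dbscan(fused, min_sim=0.6, min_pts=2)
print(labels)  # tweets 0-2 form one event cluster, tweets 3-4 another
```

Fusing at the similarity level keeps each modality's feature extraction independent, which is the usual motivation for late fusion. Visual and social similarities would enter the same way, as further entries in the two dictionaries.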
Keywords/Search Tags:Heterogeneous information network, Cross-modal fusion, Event detection, Event summarization