Research On Real-time Rumour Detection For Social Media Streams

Posted on: 2018-03-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y M Qin
GTID: 1368330545499888
Subject: Digital media
Abstract/Summary:
With the rapid development of Internet technologies, Social Media has become a vital part of daily life. Its popularity has turned it from a simple communication tool built on user relationships into a major channel for acquiring news. On Social Media, news sources are varied: anyone can be a news producer, communicator, and consumer. Compared with Traditional Media, Social Media is more dynamic. It breaks the traditional boundaries of news communication and distribution, and lets breaking news spread across countries, industries, communities, and so on. Because the number of Social Media users keeps growing, the data stream produced each day is very large. This sets Social Media apart from Online News Media and has made it a hot research topic.

Newswire and Social Media are the major sources of information in our time. While the topical demographics of Western media have been studied in the past, less is known about Chinese media. The complexity of the Chinese language, the variety of ethnic groups within China, and its large population make Chinese Social Media and news communication differ from those of the Western world. Given China's rising international status, studying the difference in coverage between Western and Chinese Social Media and Traditional Media is becoming an interesting and important topic.

Although Social Media has transformed the means of news communication and made real-time information sharing easy, it also provides a breeding ground for rapidly spreading rumours. The ease of sharing and communicating makes it hard to assess the credibility of a post at the time of its publication, which in turn affects users' understanding and judgment of a topic. Hence, rumour detection on Social Media is developing into an active research direction. However, rumour detection is hard because the most accurate systems operate retrospectively, recognising rumours only once they have collected repeated signals. By then the rumours might have already
spread and caused harm. Moreover, with the rapidly growing number of Social Media users, the daily data stream is becoming extremely large. We found that as the number of documents processed by First Story Detection (FSD) systems grows, their performance and accuracy face serious challenges. State-of-the-art FSD systems focus on improving the accuracy of the detection algorithm but neglect the growing space saturation. As a result, fixed spaces fill up, which causes the cumulative average novelty score to decay over time. We also explain how this novelty decay negatively affects system accuracy.

Given the above, we focus on the following three aspects.

First, we apply event detection and tracking technology to examine the information overlap and differences between Chinese and Western Online News Media and Social Media. Our experiments reveal a biased interest of China towards the West, which becomes particularly apparent when comparing the interest in celebrities.

Second, we introduce a new category of features based on textual entailment, tailored to detect rumours early on. To compensate for the absence of repeated signals, we use newswire as an additional data source. We designed a memory-based textual entailment algorithm, Kterm Entailment, that computes for each tweet its entailment score with respect to the additional data source. Information that is unconfirmed by the additional data sources, such as news articles, is considered an indication of a rumour. Additionally, we introduce pseudo feedback, which assumes that documents similar to previous rumours are more likely to be rumours themselves. Compared with other real-time approaches, the results show that novelty-based features in conjunction with pseudo feedback perform significantly better when detecting rumours instantly after their publication.

Finally, in this thesis we also explore the impact of processing unbounded data streams on First Story
Detection (FSD) accuracy. We identify the problem of novelty decay and a corresponding decay model, and we test how applying the decay model affects FSD accuracy. In particular, we study three different types of FSD algorithms: comparison-based, LSH-based, and k-term-based FSD. Our experiments reveal for the first time that the novelty scores of all three algorithms decay over time. We explain why the decay is linked to increased space saturation and why it negatively affects detection accuracy. We provide a mathematical decay model, which allows observed novelty scores to be compensated by their expected decay. Our experiments show significantly increased performance when counteracting the novelty score decay.

One of our main contributions, real-time rumour detection on Social Media, is based on state-of-the-art big data processing and applied machine learning technologies. Our real-time rumour detection technique is novel and advances this research area. It helps to catch a rumour at its publication and thus helps prevent it from spreading further and causing harm. Additionally, it provides a basis for monitoring and analysing public opinion on Social Media. Another main contribution is a mathematical model of the novelty decay pattern. Our findings are the first to reveal novelty decay in FSD systems. Our proposed decay model helps to maintain reasonable accuracy for topic detection and tracking systems working with unbounded data streams. Our research and findings help to establish a foundation for public opinion analysis in the new era of big data.
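
To make the notion of a novelty score concrete, the following is a minimal sketch of comparison-based FSD in its common nearest-neighbour form: each incoming document is compared with every document seen so far, and its novelty is one minus the highest cosine similarity found. The abstract does not give the thesis's exact formulation, so the names ComparisonFSD, term_vector, and cosine_similarity, the whitespace tokenisation, and the raw term-frequency weighting are all illustrative assumptions. The LSH-based and k-term-based variants and the decay model are not shown, because the abstract does not specify them.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a term-frequency vector (assumed: lower-cased whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ComparisonFSD:
    """Hypothetical comparison-based first story detector (illustrative only)."""

    def __init__(self):
        self.seen = []  # vectors of all previously processed documents

    def novelty(self, text):
        """Return 1 - (similarity to the nearest earlier document), then store it."""
        vec = term_vector(text)
        nearest = max(
            (cosine_similarity(vec, old) for old in self.seen),
            default=0.0,  # nothing seen yet: the document is maximally novel
        )
        self.seen.append(vec)
        return 1.0 - nearest


if __name__ == "__main__":
    detector = ComparisonFSD()
    stream = [
        "earthquake hits city",
        "earthquake hits coastal city",
        "football final tonight",
    ]
    for doc in stream:
        print(f"{detector.novelty(doc):.3f}  {doc}")
```

Because every stored document is a candidate nearest neighbour, the chance of finding a close match grows as the store fills. This is one informal way to read the link between space saturation and falling novelty scores that the abstract describes.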
Keywords: Social Media, Rumour Detection on Social Media, Text Mining, Textual Entailment, Novelty Decay