Building a test collection for significant-event detection in Arabic tweets

Posted on:2017-02-14

Degree:M.S

Type:Thesis

University:Qatar University (Qatar)

Candidate:Almerekhi, Hind Ali

GTID:2448390005964953

Subject:Computer Science

Abstract/Summary:

With the increasing popularity of microblogging services like Twitter, researchers have discovered a rich medium for tackling real-life problems like event detection. However, event detection in Twitter is often obstructed by the lack of public evaluation mechanisms such as test collections (a set of tweets, labels, and queries used to measure the effectiveness of an information retrieval system). The problem is more evident when non-English languages, e.g., Arabic, are concerned. With the recent surge of significant events in the Arab world, news agencies and decision makers rely on Twitter's microblogging service to obtain recent information on events. In this thesis, we address the problem of building a test collection of Arabic tweets (named EveTAR) for the task of event detection.

To build EveTAR, we first adopted an adequate definition of an event, which is a significant occurrence that takes place at a certain time. An occurrence is significant if there are news articles about it. We collected Arabic tweets using Twitter's streaming API. Then, we identified a set of events from the Arabic data collection using Wikipedia's current events portal. Corresponding tweets were extracted by querying the Arabic data collection with a set of manually constructed queries. To obtain relevance judgments for those tweets, we leveraged CrowdFlower's crowdsourcing platform.

Over a period of 4 weeks, we crawled over 590M tweets, from which we identified 66 events that cover 8 different categories and gathered more than 134k relevance judgments. Each event contains an average of 779 relevant tweets. Over all events, we got an average Kappa of 0.6, which is a substantially acceptable level of inter-annotator agreement. EveTAR was used to evaluate three state-of-the-art event detection algorithms. The best performing algorithms achieved 0.60 in F1 measure and 0.80 in both precision and recall.

We plan to make our test collection available for research, including event descriptions, manually crafted queries to extract potentially relevant tweets, and all judgments per tweet. EveTAR is the first Arabic test collection built from scratch for the task of event detection. Additionally, we show in our experiments that it supports other tasks like ad-hoc search.
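The agreement and effectiveness figures quoted in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's evaluation code. The abstract does not say which Kappa variant was used, so Cohen's kappa for two annotators is assumed here, and the function names `cohen_kappa` and `precision_recall_f1` are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same tweets.

    Assumption: two annotators only; the abstract does not specify
    the Kappa variant actually used.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of tweets with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1.0 - expected)


def precision_recall_f1(detected, relevant):
    """Set-based precision, recall, and F1 for detected vs. true items."""
    detected, relevant = set(detected), set(relevant)
    true_positives = len(detected & relevant)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy relevance labels (1 = relevant, 0 = not relevant), not EveTAR data.
    print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
    print(precision_recall_f1({"e1", "e2", "e3", "e4"}, {"e1", "e2", "e3", "e5"}))
```

Because F1 is the harmonic mean of precision and recall, a single run with precision and recall both at 0.80 would also have an F1 of 0.80. The 0.60 F1 and the 0.80 precision and recall reported above therefore presumably come from different algorithms or settings.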

Keywords/Search Tags:

Event, Test collection, Tweets, Arabic

Related items

1	Research On Query Expansion Of Twitter Data Information
2	Arcana: Private tweets on a public microblog platfor
3	Deep Learning for Sentiment and Emotion Detection in Multilingual Context
4	The Design And Implementation Of Hot Topic Detection System Of Tweets Based On Spark On Yarn
5	An Arabic lexicon to support information retrieval, parsing, and text generation
6	Handwritten word recognition: Application to Arabic cheque processing
7	Studies On Off-line Handwritting Arabic Characters Recognition Key Technology
8	University Library Facing Readers' Needs Research On The Content Of Wechat Tweets
9	The Marcellus Shale in Maryland and Twitter: A Mixed Methods Analysis of Tweets from November 201
10	Research And Design Of Event Synthesis Technology In Drive Test For Wireless Network