
Quality evaluation of topics identification algorithms

Posted on: 2014-03-09
Degree: M.Sc
Type: Thesis
University: Royal Military College of Canada (Canada)
Candidate: Decarie, Francois Andre Martin
Full Text: PDF
GTID: 2458390008460143
Subject: Computer Science
Abstract/Summary:
The need for effective text retrieval tools, such as search engines, is omnipresent in the corporate marketplace and the defence industry alike. The task of indexing large quantities of text from various sources, such as news and social media, is too large to be accomplished by humans alone, so automatically identifying keywords, or topics, from unstructured text is an important challenge. Extensive computational experiments were conducted using three topic identification methods: the Retrieval Activation and Decay (ReAD) algorithm, the Priming Activation Indexing (PAI) algorithm and the Term Frequency-Inverse Document Frequency (TFIDF) method. These experiments used a subset of the well-known Reuters financial dataset. Their purpose was to identify the parameters that would return higher-quality topics according to several well-known topic quality evaluation measures: F1, precision, recall and Normalized Mutual Information (NMI). Two novel evaluation measures were also proposed: Simple Match Five (SM5) and Expanded Match Five (EM5). Results were then generated using the parameters that returned high-quality topics according to the different computational measures, and an online survey with volunteer evaluators was conducted to validate these results. The parameters that yielded higher topic quality were inconsistent from one type of measurement to the next. For the chosen parameters, human evaluation found that TFIDF produced higher-quality topics than PAI, and PAI produced higher-quality topics than ReAD. Neither the proposed measures nor the established F1 measure was an adequate indicator of topic quality.

Keywords: Topics Identification, Topics Evaluation, Topics Quality.
Keywords/Search Tags:Topics, Quality, Evaluation, Identification, Experiments were conducted, Measures