Text document topical recursive clustering and automatic labeling of a hierarchy of document clusters

Posted on: 2013-02-25 | Degree: M.S. | Type: Thesis
University: University of Alberta (Canada) | Candidate: Li, Xiaoxiao | Full Text: PDF
GTID: 2458390008978274 | Subject: Computer Science
Abstract/Summary:
The overwhelming volume of textual documents currently available highlights the need for information organization and discovery. Organizing documents effectively into a hierarchy of topics and subtopics makes them easier for users to browse.

This thesis borrows community mining techniques from social network analysis to generate a hierarchy of topically coherent document clusters, and it focuses on giving those clusters descriptive labels. We propose using different centrality measures in networks of co-occurring terms to label the document clusters. We also incorporate keyphrase extraction and automatic titling into cluster labeling. The results show that the labeling method that uses KEA to extract keyphrases from the documents generates the best labels overall compared with the other methods and baselines. We also built an interactive web interface that lets users browse and examine the resulting taxonomies.
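
The idea of labeling a cluster by term centrality can be illustrated with a minimal sketch. The abstract does not say which centrality measures the thesis uses or how its term network is built, so everything here is an assumption made for illustration: terms are whitespace-separated tokens, two terms are linked when they appear in the same document, and weighted degree centrality stands in for the thesis's measures. The function name `label_cluster` and the sample documents are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def label_cluster(documents, top_k=3):
    """Return the top_k terms by weighted degree in a term co-occurrence network.

    Illustrative only: degree centrality and whitespace tokenization are
    assumptions, not the method described in the thesis.
    """
    # Edge weight = number of documents in which two terms co-occur.
    weights = defaultdict(int)
    for doc in documents:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):
            weights[(a, b)] += 1

    # Weighted degree centrality: sum of the weights of a term's edges.
    centrality = defaultdict(int)
    for (a, b), w in weights.items():
        centrality[a] += w
        centrality[b] += w

    # Rank by centrality (descending), breaking ties alphabetically.
    ranked = sorted(centrality, key=lambda t: (-centrality[t], t))
    return ranked[:top_k]


# Hypothetical cluster of three tiny documents.
cluster = [
    "document clustering hierarchy",
    "document labeling hierarchy",
    "document clustering labeling",
]
print(label_cluster(cluster, top_k=2))
```

A realistic version would at least remove stop words and might swap in other measures such as betweenness or closeness centrality; the sketch only shows how a ranking over a co-occurrence network can yield candidate labels.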
Keywords/Search Tags:Document, Labeling, Hierarchy