Graph-based Algorithms for Keyphrase Extraction in Social Text

Posted on:2015-08-19

Degree:Ph.D

Type:Dissertation

University:The George Washington University

Candidate:Al-Dhelaan, Mohammed

Full Text:PDF

GTID:1478390017993639

Subject:Computer Science

Abstract/Summary:

The sheer volume of text in the web mandates automated approaches for identifying keyphrases that distinguish documents and help recognize important topics within documents. Automatic extraction of keyphrases is accomplished by designing algorithms capable of quantifying saliency in text. Measuring saliency score of textual units has traditionally used bag-of-words (BOW) approaches where the ranking is measured without considering the context. Such approach has several limitations as in polysemy and synonymy, where it is hard to detect these natural characteristics of text without understanding the context. In contrast, graph-based approaches model the relation between textual units that alleviate the aforementioned problems. In this dissertation, we introduce a collection of novel approaches for graph-based keyphrase ranking.;First, we propose a novel random walk extension for graph-based ranking that can leverage weights on both vertices and edges, called NE-Rank. The ranking algorithm combines additional ranking methods that enhance existing graph-based approaches. Specifically, we combine a discriminative ranking approach as in tf-idf to the co-occurrence ranking in graphs. Moreover, the ranking model uses social tags, as in Twitter's hashtags, and explores leveraging them by boosting their weights for the task of keyphrase extraction in social microposts. Additionally, we propose a lexical graph expansion through social tags for keyphrase extraction. After modeling the textual content of microposts in a lexical graph, we expand the graph by finding more similar content linked by tags. We show a number of different approaches to lexical graph expansion through Twitter hashtags, and show a significant improvement over using the textual content alone.;Second, we propose a new approach for measuring saliency in short documents. We model the textual units in a hypergraph by modeling words as vertices and short documents as hyperedges, and we study a high-order co-occurrence relation that is beyond the pair-wise relation in graphs. Therefore, we propose a novel probabilistic random walk over hypergraphs that captures weights on vertices and hyperedges to rank vertices. We compare our proposed random walk with different random walk approaches for hypergraphs and show the validity of the approach. Finally, we propose a complete ranking framework for extracting keyphrases from short documents using the hypergraph proposed random walk. The ranking takes into account temporal and social attributes that are important for a dynamic genre such as Twitter.

Keywords/Search Tags:

Social, Keyphrase, Ranking, Text, Graph-based, Approaches, Random walk, Documents

Related items

1	Key Technology Research On Personalized Ranking Based On Random Walk In Social Networks
2	The Study Of Ranking And Recommendation Algorithms Based On Heterogeneous Graph
3	Research On Co-Ranking Algorithm In Web Pages Based On Random Walk
4	Research Of Social Distance-aware Bayesian Personalized Ranking Recommendation Algorithm
5	Research On Graph-based Keyphrase Extraction Method Integrating Multiple Features
6	Algorithmic Aspects of Social Network Mining
7	Research And Application Of Random Walk Based Recommendation Techniques
8	Development Of Random Walk Algorithm Acceleration System Based On FPGA
9	A Random Walk Based Win/Loss Graph Aggregation Algorithm For News Metasearch Engine
10	Heterogeneous Networks Based Random Walk Without Meta-paths For Social Recommendation