Using graphs and random walks for discovering latent semantic relationships in text

Posted on:2008-06-21

Degree:Ph.D

Type:Dissertation

University:University of Michigan

Candidate:Erkan, Gunes

Full Text:PDF

GTID:1448390005452132

Subject:Computer Science

Abstract/Summary:

We propose a graph-based representation of text collections where the nodes are textual units such as sentences or documents, and the edges represent the pairwise similarity function between these units. We show how random walks on such a graph can give us better approximations for the latent similarities between two natural language strings. We also derive algorithms based on random walk models to rank the nodes in a text similarity graph to address the text summarization problem in information retrieval. The similarity functions used in the graphs are intentionally chosen to be very simple and language-independent to make our methods as generic as possible, and to show that significant improvements can be achieved even by starting with such similarity functions. We put special emphasis on language modeling-based similarity functions since we use them for the first time on problems such as document clustering and classification, and get improved results compared to the classical similarity functions such as cosine. Our graph-based methods are applicable to a diverse set of problems including generic and focused summarization, document clustering, and text classification. The text summarization system we have developed has ranked as one of the top systems in Document Understanding Conferences over the past few years. In document clustering and classification, using language modeling functions performs consistently better than using the classical cosine measure reaching as high as 25% improvement in accuracy. Random walks on the similarity graph achieve additional significant improvements on top of this. We also revisit the nearest neighbor text classification methods and derive semi-supervised versions by using random walks that rival the state-of-the-art classification algorithms such as Suppor Vector Machines.

Keywords/Search Tags:

Random walks, Text, Using, Graph, Classification, Similarity functions, Document

Related items

1	Research On The Intelligent Random Walks On Graph For Classification Algorithms And Applications
2	Research On Large Scale Graph Analytic System For Supporting Fast Random Walks
3	Research On Document Retrieval Technology Based On Graph
4	Parallel Implementation On Document Classification And Similarity Analysis
5	Research On Semantic Similarity Computation And Applications
6	Research On Chinese Text Classification Based On Semantic Analysis
7	Research On Document Classification Method Based On Graph Modal
8	Research On Classification Algorithms Based On Enhanced Semantics And Random Walks
9	Research On Standards Conformance Testing Of Traffic Information And Its System Development
10	Text Classification Based On Graph Convolutional Neural Network With Intimacy Matrix And Text Linking