Term Relatedness from Wiki-Based Resources Using Sourced PageRank

Posted on:2011-01-21

Degree:Ph.D.

Type:Dissertation

University:The Ohio State University

Candidate:Weale, Timothy Fitzgerald

GTID:1448390002957420

Subject:Engineering

Abstract/Summary:

This dissertation concerns itself with creating a new algorithm for automatically measuring the amount of relatedness between a given pair of terms. Research into term relatedness is important because it has been empirically demonstrated that using relatedness metrics can improve the performance of tasks in Natural Language Processing and Information Retrieval by expanding the usable vocabulary. Previous relatedness metrics have used a variety of sources of semantic data to judge term relatedness, including text corpora, expertly-constructed resources and, most recently, Wikipedia and Wiktionary. The primary focus of this dissertation is the creation of a new metric for deriving term relatedness from the graph structure of Wikipedia and Wiktionary using Sourced PageRank, a modified version of the PageRank algorithm, to generate the relatedness values.

This new algorithm is compared to several existing relatedness metrics in two established task domains. The first domain measures the metric's ability to replicate human-generated relatedness values for term pairs. The second domain tests a metric's ability to select the synonym of a given term from a list of possible candidates. In both of these experiments, the Sourced PageRank-based term relatedness algorithm that uses Wiktionary as its source of semantic data is able to compete with or exceed the performance of existing state-of-the-art algorithms in these task domains.

Additionally, the different emphases of Wikipedia and Wiktionary are covered as part of this dissertation. This is an area that has not been emphasized in past work with Wiki-based relatedness metrics. We find that Wikipedia is a source of information on proper names and their real-world referents, including corporations, events and people. Wiktionary has more information on common words that almost everyone knows. Each Wiki-based resource has its own strength and must be matched with the needs of the task in order to yield maximum benefits.

Finally, we explore how to use additional information found in Wikipedia and Wiktionary as metadata for graph manipulation. While we achieve mixed results, the investigation opens another area of research for graph-based relatedness metrics that use Wikipedia or Wiktionary as the source of semantic data.
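
The abstract names Sourced PageRank only as "a modified version of the PageRank algorithm" and does not define it. As a rough illustration of how a PageRank variant seeded at one term could produce relatedness values and drive the synonym-selection task, the sketch below uses personalized PageRank in which all teleport mass returns to the source term. This substitution is an assumption, not the dissertation's method. The function names `sourced_pagerank`, `relatedness` and `pick_synonym`, the damping value, the scoring rule and the toy link graph are all hypothetical.

```python
def sourced_pagerank(graph, source, damping=0.85, iterations=50):
    """Personalized PageRank seeded at `source` (an assumed stand-in
    for Sourced PageRank, which the abstract does not define).

    `graph` maps each term to the list of terms it links to.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes.add(source)

    # All rank mass starts on the source term.
    rank = {node: 0.0 for node in nodes}
    rank[source] = 1.0

    for _ in range(iterations):
        nxt = {node: 0.0 for node in nodes}
        for node in nodes:
            out_links = graph.get(node, [])
            if out_links:
                share = damping * rank[node] / len(out_links)
                for target in out_links:
                    nxt[target] += share
            else:
                # Dangling node: return its mass to the source.
                nxt[source] += damping * rank[node]
        # Teleport step: the remaining mass jumps back to the source.
        nxt[source] += 1.0 - damping
        rank = nxt
    return rank


def relatedness(graph, term_a, term_b):
    """Hypothetical scoring rule: rank mass that reaches term_b from term_a."""
    return sourced_pagerank(graph, term_a).get(term_b, 0.0)


def pick_synonym(graph, term, candidates):
    """Synonym selection: choose the candidate most related to `term`."""
    return max(candidates, key=lambda c: relatedness(graph, term, c))


# Toy link graph (invented for illustration, not Wiktionary data).
toy_graph = {
    "car": ["automobile", "vehicle"],
    "automobile": ["car", "vehicle"],
    "vehicle": ["car"],
    "banana": ["fruit"],
    "fruit": ["banana"],
}

ranks = sourced_pagerank(toy_graph, "car")
choice = pick_synonym(toy_graph, "car", ["automobile", "banana", "fruit"])
print(choice)
```

In this toy graph "banana" and "fruit" are unreachable from "car", so they receive no rank mass and "automobile" is selected. Replicating human relatedness judgments, the first task domain, would instead compare such scores against human ratings for a set of term pairs.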

Related items

1	Research Of Semantic Relatedness Measure Based On Wikipedia Structure
2	Research And Implementation On Computing Semantic Relatedness Using Chinese Wikipedia
3	Research On Concept And Short Text Semantic Relatedness Calculation Method
4	Extracting Structured Information From The Chinese Wikipedia And Measuring Relatedness Between Words
5	Semantic Relatedness Algorithm Design Between Named Entities Based On Linked Open Data
6	A Semantic-Wiki Knowledge Base System Based On Knowledge Elements
7	Wikipedia Based Conceptual Graph Model And Its Application
8	Research And Implementation Of The Knowledge Search System Based On Wikipedia
9	Enabling Entity Retrieval by Exploiting Wikipedia as a Semantic Knowledge Source
10	Research On Semantic Enhancing Relational Similarity Measurement