Extracting Semantic Information from Wikipedia Using Human Computation and Dimensionality Reduction

Posted on: 2011-12-04
Degree: M.Sc
Type: Thesis
University: McGill University (Canada)
Candidate: West, Robert
Full Text: PDF
GTID: 2448390002957315
Subject: Computer Science
Abstract/Summary:
Semantic background knowledge is crucial for many intelligent applications. A classical way to represent such knowledge is through semantic networks. Wikipedia's hyperlink graph can be considered a primitive semantic network, since the links it contains usually correspond to semantic relationships between the articles they connect. However, Wikipedia is rather noisy in this function. We propose Wikispeedia, an online human-computation game that can effectively filter this noise, furnishing data that can be leveraged to define a robust measure of semantic relatedness between concepts. While the resulting measure is very precise, it has the limitation of being sparse, i.e., undefined for many pairs of concepts. Therefore, we develop algorithms based on principal component analysis to increase coverage to the set of all pairs of Wikipedia concepts. These methods can also be generalized to other sparse measures of semantic relatedness, which we demonstrate by applying our approach to the Wikipedia adjacency matrix. Building on the same techniques, we finally propose an algorithm for finding missing hyperlinks in Wikipedia, which results in increased human usability.
Keywords/Search Tags: Semantic, Wikipedia
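
The abstract describes using principal component analysis to extend a sparse relatedness measure to all pairs of concepts, and applying the same idea to the Wikipedia adjacency matrix. The following is a minimal sketch of that general idea only, not the thesis's actual algorithm: it projects the rows of a toy adjacency matrix onto their top principal components and scores every pair by cosine similarity in the reduced space. The function name `pca_relatedness`, the helper `clique`, the 8-node toy graph, and the choice of cosine similarity are all assumptions made for illustration.

```python
import numpy as np


def pca_relatedness(adjacency, n_components):
    """Score all row pairs by cosine similarity in a PCA-reduced space.

    Illustrative only: a generic PCA projection, not the method from the thesis.
    """
    X = np.asarray(adjacency, dtype=float)
    X = X - X.mean(axis=0)                        # center each column
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :n_components] * s[:n_components]    # principal component scores
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                       # avoid division by zero
    Zn = Z / norms
    return Zn @ Zn.T                              # dense: defined for every pair


def clique(n):
    """Adjacency matrix of a fully linked group of n nodes (no self-links)."""
    return np.ones((n, n)) - np.eye(n)


# Hypothetical toy graph: two groups of four mutually linked "articles".
A = np.zeros((8, 8))
A[:4, :4] = clique(4)
A[4:, 4:] = clique(4)
A[0, 3] = A[3, 0] = 0.0   # articles 0 and 3 share neighbors but are not linked

relatedness = pca_relatedness(A, n_components=1)
print(relatedness[0, 3], relatedness[0, 4])
```

In this toy graph, articles 0 and 3 have no direct link, yet the reduced representation scores them as more related than articles 0 and 4, which sit in different groups. With a single component the scores are coarse; keeping more components would give finer-grained values, and an unlinked pair that scores highly in this way is the kind of candidate a missing-link finder could surface.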