| Wikification, the process of linking terms in a plain-text document to the Wikipedia articles that represent their correct meanings, can be thought of as a generalized Word Sense Disambiguation problem: it disambiguates multi-word expressions (MWEs) in addition to single words. Existing wikification techniques either model the context of a given term and the Wikipedia article as bags of words and then compute the similarity between context and sense, or compute global constraints among Wikipedia concepts from the link graph or link distributions. The first method does not achieve good results because an MWE can have a very different meaning from its constituent words, which are themselves ambiguous. The second method does not produce high accuracy because the link structure or link distribution alone is often biased or incomplete, since Wikipedia pages are often sparsely linked. In this thesis, we present a simple but powerful framework for sense disambiguation that uses co-occurrences of Wikipedia links in the Wikipedia corpus. We propose an iterative method that enriches sparsely linked articles by adding more links, and then use the resulting link co-occurrence matrix to disambiguate an input document with a sliding-window algorithm. Our prototype system achieves 89.97% precision and 76.43% recall on average across three benchmark data sets and compares favorably against four state-of-the-art wikification techniques. |
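
The idea of disambiguating with a link co-occurrence matrix and a sliding window can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names (`build_cooccurrence`, `cooc_count`, `disambiguate`), the article-level co-occurrence counts, the summed scoring rule, and the `window` parameter are all assumptions made for illustration, and the iterative link-enrichment step is left out.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(articles):
    """Count how often two link targets occur in the same article.

    `articles` is a list of lists of link targets (hypothetical input format).
    """
    cooc = Counter()
    for links in articles:
        # Sorting makes every pair key (a, b) satisfy a < b.
        for a, b in combinations(sorted(set(links)), 2):
            cooc[(a, b)] += 1
    return cooc


def cooc_count(cooc, a, b):
    """Look up a symmetric co-occurrence count (0 if the pair was never seen)."""
    return cooc[(a, b) if a <= b else (b, a)]


def disambiguate(mentions, candidates, cooc, window=2):
    """Pick one sense per mention using its neighbours inside a sliding window.

    Assumed scoring rule: a sense's score is the sum of its co-occurrence
    counts with every candidate sense of the neighbouring mentions.
    """
    result = []
    for i, term in enumerate(mentions):
        lo, hi = max(0, i - window), min(len(mentions), i + window + 1)
        neighbours = [mentions[j] for j in range(lo, hi) if j != i]

        def score(sense):
            return sum(
                cooc_count(cooc, sense, other)
                for n in neighbours
                for other in candidates.get(n, [])
            )

        senses = candidates.get(term, [])
        result.append(max(senses, key=score) if senses else None)
    return result
```

With a toy corpus in which `Java_(programming_language)` co-occurs with `Compiler` and `Java` co-occurs with `Indonesia`, the mention "java" would resolve to the programming language when it appears next to "compiler" and to the island when it appears next to "indonesia". A real system would need normalized scores and a much larger candidate dictionary.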