Research And Implementation On Computing Semantic Relatedness Using Chinese Wikipedia

Posted on:2012-12-19

Degree:Master

Type:Thesis

Country:China

Candidate:X Wang

Full Text:PDF

GTID:2218330362460221

Subject:Computer Science and Technology

Abstract/Summary:


Computing semantic relatedness is one of the most important problems in natural language processing (NLP) and plays a critical role in many NLP applications, such as information retrieval, text classification, word sense disambiguation, and example-based machine translation. Because of the particular characteristics of the Chinese language, among other reasons, research on computing semantic relatedness for Chinese lags well behind that for English. Advancing this research is therefore of great value for improving related NLP technology.

This thesis mainly studies algorithms for computing semantic relatedness using the links and taxonomy of Chinese Wikipedia. First, it introduces the research background and related methods for computing semantic relatedness. Second, it applies algorithms designed for tree-structured taxonomies such as WordNet to Chinese Wikipedia. Because the Wikipedia taxonomy is a directed acyclic graph rather than a tree, we propose a multi-path semantic relatedness algorithm. Third, it applies the WLM (Wikipedia Link-based Measure) algorithm to Chinese Wikipedia and proposes the WLT (Wikipedia Links and Taxonomy based measure) algorithm, which uses both Wikipedia links and the taxonomy. We also combine the taxonomy-based algorithms with WLM or WLT. The experimental results show that the combined algorithms outperform algorithms based only on Wikipedia links or only on the Wikipedia taxonomy. Finally, the Wikipedia-based semantic relatedness algorithms are used in the YHPODS system in two ways: topic keyword association and semantic-based classification.

In addition, we build a manually evaluated test collection named Words-240 to evaluate the accuracy of semantic relatedness algorithms. Because of the large amount of data in Wikipedia, we propose several methods to improve the efficiency of the algorithms, such as using a memory cache and a file cache, optimizing the database tables, and building a database connection pool. With these measures, the running time of the algorithm is reduced by a factor of dozens.
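
The abstract names WLM but does not state its formula. As a rough illustration only, the sketch below follows the commonly cited formulation of the Wikipedia Link-based Measure by Milne and Witten, which scores two articles by the overlap of the sets of articles that link to them. The function name `wlm_relatedness`, its parameters, and the toy data are assumptions made for this example. They are not taken from the thesis, and the thesis's own implementation for Chinese Wikipedia may differ.

```python
import math


def wlm_relatedness(inlinks_a, inlinks_b, total_articles):
    """Link-based relatedness in [0, 1] from two sets of incoming links.

    inlinks_a / inlinks_b: IDs of the articles that link to article a / b.
    total_articles: number of articles in the whole wiki (assumed known).
    This is an illustrative sketch, not the thesis's code.
    """
    a, b = set(inlinks_a), set(inlinks_b)
    common = a & b
    if not common:
        # No shared incoming links: treat the two articles as unrelated.
        return 0.0

    larger = max(len(a), len(b))
    smaller = min(len(a), len(b))
    denominator = math.log(total_articles) - math.log(smaller)
    if denominator <= 0:
        # Degenerate case: every article links to both a and b.
        return 1.0

    # Normalized distance: 0 when the link sets coincide, larger as they diverge.
    distance = (math.log(larger) - math.log(len(common))) / denominator
    # Convert the distance to a relatedness score and clamp it at zero.
    return max(0.0, 1.0 - distance)


if __name__ == "__main__":
    # Toy data: article IDs are arbitrary integers chosen for the example.
    links_to_a = {1, 2, 3, 4}
    links_to_b = {3, 4, 5, 6}
    print(wlm_relatedness(links_to_a, links_to_b, total_articles=1000))
```

In this form, identical link sets score 1, disjoint link sets score 0, and partial overlap falls in between. A measure of this kind only needs link counts, which is one reason caching link sets in memory, as the abstract mentions, can pay off when many word pairs are scored.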

Keywords/Search Tags:


Related items

1	Mining Semantic Knowledge From Chinese Wikipedia
2	Text Semantic Similarity Algorithm Based On Transformer
3	Investigation Of Categorical Semantic Information Processing In The Brain And Natural Language Processing Models
4	Semantic Annotation For Documents In Professional Domain Based On NLP
5	Study On Concept Semantic Similarity Measure Based On Ontology
6	Chinese Words Semantic Similarity Measure Research Based On Common Sense Knowledge Base
7	The Research Of Law Support System Based On Semantic Computing
8	Crowdsourcing For Synonyms Proofreading And Acquisition In Chinese Large-scale Semantic Knowledge Base
9	A Study On Neural Network-based Natural Language Semantic Representation
10	The Representation Of Chinese Semantic Knowledge And Its Application In The Chinese-English MT System