Research Of Semantic Relatedness Measure Based On Wikipedia Structure

Posted on:2013-03-29

Degree:Master

Type:Thesis

Country:China

Candidate:C C Sun


GTID:2298330467478167

Subject:Computer application technology

Abstract/Summary:


With the rapid development of Web 2.0, a large amount of Web information is being produced and spread. People want to obtain the information that matters to them from computers quickly, and they expect computers to mine information automatically and intelligently and to understand and process natural language well. Semantic relatedness between words and phrases is essential to such applications. As a fundamental research topic, semantic relatedness is widely used in information retrieval, spelling checking, text classification, text clustering, artificial intelligence, and natural language processing applications such as word sense disambiguation, automatic summarization, question answering, and machine translation.

Judging the semantic relatedness between words is a difficult and complicated task for computers: it requires knowledge of many real-world entities, their concepts and the relationships between them, as well as common sense and domain-specific knowledge. Some researchers compute semantic relatedness through statistical analysis of large corpora, while others rely on knowledge bases and lexical structures such as taxonomies and thesauri. Both approaches are limited by their background knowledge: the former is poorly structured and imprecise, and the latter is limited in scalability and scope.

Wikipedia is an excellent semantic knowledge base. It consists of the article reference network and the category tree, two network-like structures that hold a large amount of explicit, well-structured semantic knowledge. To compute the semantic relatedness between words or phrases, we first map the target words to wiki-concepts, which are defined in Chapter 3; we then compute the semantic relatedness between the wiki-concepts to obtain the semantic relatedness between the target words.

The main contributions and innovations of the thesis are as follows:

1) We introduce the background, current state, and shortcomings of research on computing semantic relatedness. We state the definition of semantic relatedness and its evaluation measures, and we review traditional semantic relatedness algorithms and analyze their advantages and disadvantages.

2) We propose a simple semantic relatedness algorithm, RelArtNetSimple, based on the Wikipedia article reference network and the Jaccard coefficient. We then assign weights to wiki-concept nodes and links and divide the wiki-concepts into layers. This leads to a new semantic relatedness algorithm, RelArtNet, based on weighted, hierarchically divided wiki-concepts in the Wikipedia article reference network.

3) We propose one semantic relatedness algorithm based on the content of the category tree and another based on its structure. Combining the advantages of both, we obtain a new algorithm, RelCatTree, based on the Wikipedia category tree.

4) We evaluate the semantic relatedness algorithms by the correlation between human judgments and algorithm results, measured with the Spearman coefficient. Three popular test sets are used: Miller and Charles (1991, 30 pairs), Rubenstein and Goodenough (1965, 65 pairs), and WordSim-353 (Finkelstein et al., 2002, 353 pairs). The experimental results show that the proposed WSR algorithm has good complexity.
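
The abstract names two standard building blocks: a Jaccard coefficient over the article reference network, and a Spearman coefficient for comparing algorithm scores with human judgments. The sketch below illustrates only these two generic techniques; it is not the thesis's RelArtNetSimple or RelArtNet, whose exact definitions are not given here. The toy link graph `NEIGHBORS`, the word pairs, the `human` ratings, and all function names are hypothetical, and treating an article's linked articles as its neighbor set is an assumption.

```python
# Hypothetical toy data: each article mapped to the set of articles it is
# linked with. A real system would extract these sets from a Wikipedia dump.
NEIGHBORS = {
    "car": {"vehicle", "engine", "road", "wheel"},
    "automobile": {"vehicle", "engine", "road", "factory"},
    "journey": {"travel", "road", "map"},
    "banana": {"fruit", "plant"},
}


def jaccard_relatedness(a, b, neighbors):
    """Jaccard coefficient |N(a) & N(b)| / |N(a) | N(b)| of two neighbor sets."""
    na, nb = neighbors.get(a, set()), neighbors.get(b, set())
    union = na | nb
    if not union:
        return 0.0  # neither article has any links
    return len(na & nb) / len(union)


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for constant input; 0 by convention here
    return cov / (sx * sy)


# Hypothetical word pairs and made-up human ratings, for illustration only.
pairs = [("car", "automobile"), ("car", "journey"), ("car", "banana")]
human = [3.9, 1.2, 0.1]
scores = [jaccard_relatedness(a, b, NEIGHBORS) for a, b in pairs]
print(scores, spearman(scores, human))
```

On a real benchmark, `pairs` and `human` would come from one of the three test sets listed above, and `scores` from the algorithm being evaluated. Spearman compares rankings only, so the scale of the scores does not matter.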


Related items

1	Research And Implementation On Computing Semantic Relatedness Using Chinese Wikipedia
2	Research Towards Web Classification Based On Wikipedia Category Network And URL Pattern Tree
3	Research On Concept And Short Text Semantic Relatedness Calculation Method
4	Extracting Structured Information From The Chinese Wikipedia And Measuring Relatedness Between Words
5	Wikipedia-based Semantic Comparison
6	Term Relatedness from Wiki-Based Resources Using Sourced PageRank
7	Wikipedia Based Conceptual Graph Model And Its Application
8	Research And Implementation Of The Knowledge Search System Based On Wikipedia
9	Parse Tree Based Neural Networks For Semantic Relatedness
10	Research On Semantic Enhancing Relational Similarity Measurement